Re: when is a disk non-fresh?

2008-02-10 Thread David Greaves
Dexter Filmore wrote:
 On Friday 08 February 2008 00:22:36 Neil Brown wrote:
 On Thursday February 7, [EMAIL PROTECTED] wrote:
 On Tuesday 05 February 2008 03:02:00 Neil Brown wrote:
 On Monday February 4, [EMAIL PROTECTED] wrote:
 Seems the other topic wasn't quite clear...
 not necessarily.  sometimes it helps to repeat your question.  there
 is a lot of noise on the internet and sometimes important things get
 missed... :-)

 Occasionally a disk is kicked for being non-fresh - what does this
 mean and what causes it?
 The 'event' count is too small.
 Every event that happens on an array causes the event count to be
 incremented.
 An 'event' here is any atomic action? Like write byte there or calc
 XOR?
 An 'event' is
- switch from clean to dirty
- switch from dirty to clean
- a device fails
- a spare finishes recovery
 things like that.
 
 Is there a glossary that explains dirty and such in detail?

Not yet.

http://linux-raid.osdl.org/index.php?title=Glossary

David
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: when is a disk non-fresh?

2008-02-08 Thread Dexter Filmore
On Friday 08 February 2008 00:22:36 Neil Brown wrote:
 On Thursday February 7, [EMAIL PROTECTED] wrote:
  On Tuesday 05 February 2008 03:02:00 Neil Brown wrote:
   On Monday February 4, [EMAIL PROTECTED] wrote:
Seems the other topic wasn't quite clear...
  
   not necessarily.  sometimes it helps to repeat your question.  there
   is a lot of noise on the internet and sometimes important things get
   missed... :-)
  
Occasionally a disk is kicked for being non-fresh - what does this
mean and what causes it?
  
   The 'event' count is too small.
   Every event that happens on an array causes the event count to be
   incremented.
 
  An 'event' here is any atomic action? Like write byte there or calc
  XOR?

 An 'event' is
- switch from clean to dirty
- switch from dirty to clean
- a device fails
- a spare finishes recovery
 things like that.

Is there a glossary that explains dirty and such in detail?


   If the event counts on different devices differ by more than 1, then
   the smaller number is 'non-fresh'.
  
   You need to look to the kernel logs of when the array was previously
   shut down to figure out why it is now non-fresh.
 
  The kernel logs show absolutely nothing. Log's fine, next time I boot up,
  one disk is kicked, I got no clue why, badblocks is fine, smartctl is
  fine, self test fine, dmesg and /var/log/messages show nothing apart
  from that news that the disk was kicked and mdadm -E doesn't say anything
  suspicious either.

 Can you get mdadm -E on all devices *before* attempting to assemble
 the array?


Yes, can do. But now the array is in sync again, guess you want an -E scan 
when it's degraded?


  Question: what events occurred on the 3 other disks that didn't occur on
  the last? It only happens after reboots, not while the machine is up so
  the closest assumption is that the array is not properly shut down
  somehow during system shutdown - only I wouldn't know why.

 Yes, most likely is that the array didn't shut down properly.

I noticed that *after* stopping the array I get some message on the console 
about SCSI caches, but it disappears too quickly to read and doesn't turn up 
in the logs. I will try to film it, though I issue sync anyway before 
stopping the array.


  Box is Slackware 11.0, 11 doesn't come with raid script of its own so I
  hacked them into the boot scripts myself and carefully watched that
  everything accessing the array is down before mdadm --stop --scan is
  issued. No NFS, no Samba, no other funny daemons, disks are synced and so
  on.
 
  I could write some failsafe into it by checking if the event count is the
  same on all disks before --stop, but even if it wasn't, I really wouldn't
  know what to do about it.
 
  (btw mdadm -E gives me: Events : 0.1149316 - what's with the 0. ?)

 The events count is a 64bit number and for historical reasons it is
 printed as 2 32bit numbers.  I agree this is ugly.

 NeilBrown



-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+++ L+++ E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h++ r* y?
--END GEEK CODE BLOCK--

http://www.vorratsdatenspeicherung.de


Re: when is a disk non-fresh?

2008-02-07 Thread Dexter Filmore
On Tuesday 05 February 2008 03:02:00 Neil Brown wrote:
 On Monday February 4, [EMAIL PROTECTED] wrote:
  Seems the other topic wasn't quite clear...

 not necessarily.  sometimes it helps to repeat your question.  there
 is a lot of noise on the internet and sometimes important things get
 missed... :-)

  Occasionally a disk is kicked for being non-fresh - what does this mean
  and what causes it?

 The 'event' count is too small.
 Every event that happens on an array causes the event count to be
 incremented.

An 'event' here is any atomic action? Like write byte there or calc XOR?


 If the event counts on different devices differ by more than 1, then
 the smaller number is 'non-fresh'.

 You need to look to the kernel logs of when the array was previously
 shut down to figure out why it is now non-fresh.

The kernel logs show absolutely nothing. Log's fine, next time I boot up, one 
disk is kicked, I got no clue why, badblocks is fine, smartctl is fine, self 
test fine, dmesg and /var/log/messages show nothing apart from that news that 
the disk was kicked and mdadm -E doesn't say anything suspicious either.

Question: what events occurred on the 3 other disks that didn't occur on the 
last? It only happens after reboots, not while the machine is up so the 
closest assumption is that the array is not properly shut down somehow during 
system shutdown - only I wouldn't know why.
Box is Slackware 11.0, 11 doesn't come with raid script of its own so I hacked 
them into the boot scripts myself and carefully watched that everything 
accessing the array is down before mdadm --stop --scan is issued.
No NFS, no Samba, no other funny daemons, disks are synced and so on.

I could write some failsafe into it by checking if the event count is the same 
on all disks before --stop, but even if it wasn't, I really wouldn't know 
what to do about it.
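
A rough sketch of such a pre-stop check, in Python. It assumes the only thing needed from the `mdadm -E` output is the `Events :` line quoted in this thread, and that the value may be printed either as `high.low` or as a single integer. The helper names and the device paths in the usage note are hypothetical; `examine()` needs root and a real member device, so it is kept separate from the parsing.

```python
import re
import subprocess

# Matches a line such as "         Events : 0.1149316" (or a plain integer).
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)(?:\.(\d+))?\s*$", re.MULTILINE)


def events_from_examine(text: str) -> int:
    """Extract the event count from the text printed by `mdadm -E <device>`."""
    match = EVENTS_RE.search(text)
    if match is None:
        raise ValueError("no 'Events :' line found")
    first, second = match.groups()
    if second is None:
        # Assumption: a value without a dot is already the full count.
        return int(first)
    # Assumption: 'high.low' is the upper and lower 32 bits of the count.
    return (int(first) << 32) | int(second)


def examine(device: str) -> str:
    """Run `mdadm -E` on one member device and return its output."""
    result = subprocess.run(
        ["mdadm", "-E", device], capture_output=True, text=True, check=True
    )
    return result.stdout


def counts_agree(counts: dict[str, int]) -> bool:
    """True when every device reports the same event count."""
    return len(set(counts.values())) <= 1
```

From a shutdown script this could be used as `counts = {d: events_from_examine(examine(d)) for d in devices}`, logging the whole dictionary when `counts_agree(counts)` is false. It does not repair anything; it only leaves a record of which device fell behind.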

(btw mdadm -E gives me: Events : 0.1149316 - what's with the 0. ?)

Dex





Re: when is a disk non-fresh?

2008-02-07 Thread Neil Brown
On Thursday February 7, [EMAIL PROTECTED] wrote:
 On Tuesday 05 February 2008 03:02:00 Neil Brown wrote:
  On Monday February 4, [EMAIL PROTECTED] wrote:
   Seems the other topic wasn't quite clear...
 
  not necessarily.  sometimes it helps to repeat your question.  there
  is a lot of noise on the internet and sometimes important things get
  missed... :-)
 
   Occasionally a disk is kicked for being non-fresh - what does this mean
   and what causes it?
 
  The 'event' count is too small.
  Every event that happens on an array causes the event count to be
  incremented.
 
 An 'event' here is any atomic action? Like write byte there or calc XOR?

An 'event' is
   - switch from clean to dirty
   - switch from dirty to clean
   - a device fails
   - a spare finishes recovery
things like that.
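
To make the distinction concrete, here is a toy model in Python. It is not the md implementation; the class and method names are invented for illustration, and it covers only the four kinds of event listed above. The point it shows is that ordinary writes to an already-dirty array do not bump the counter, only state transitions do.

```python
class ToyEventCounter:
    """Toy model: the counter moves on state changes, not on every write."""

    def __init__(self) -> None:
        self.events = 0
        self.dirty = False

    def write(self) -> None:
        # Only the first write to a clean array is a clean -> dirty switch.
        if not self.dirty:
            self.dirty = True
            self.events += 1

    def mark_clean(self) -> None:
        # dirty -> clean switch, e.g. after writes have stopped.
        if self.dirty:
            self.dirty = False
            self.events += 1

    def device_failed(self) -> None:
        self.events += 1

    def recovery_finished(self) -> None:
        self.events += 1


array = ToyEventCounter()
for _ in range(1000):
    array.write()        # 1000 writes, but only one clean -> dirty switch
array.mark_clean()
print(array.events)      # 2
```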

 
 
  If the event counts on different devices differ by more than 1, then
  the smaller number is 'non-fresh'.
 
  You need to look to the kernel logs of when the array was previously
  shut down to figure out why it is now non-fresh.
 
 The kernel logs show absolutely nothing. Log's fine, next time I boot up, one 
 disk is kicked, I got no clue why, badblocks is fine, smartctl is fine, self 
 test fine, dmesg and /var/log/messages show nothing apart from that news that 
 the disk was kicked and mdadm -E doesn't say anything suspicious either.

Can you get mdadm -E on all devices *before* attempting to assemble
the array?

 
 Question: what events occurred on the 3 other disks that didn't occur on the 
 last? It only happens after reboots, not while the machine is up so the 
 closest assumption is that the array is not properly shut down somehow during 
 system shutdown - only I wouldn't know why.

Yes, most likely is that the array didn't shut down properly.

 Box is Slackware 11.0, 11 doesn't come with raid script of its own so I hacked
 them into the boot scripts myself and carefully watched that everything
 accessing the array is down before mdadm --stop --scan is issued.
 No NFS, no Samba, no other funny daemons, disks are synced and so on.
 
 I could write some failsafe into it by checking if the event count is the same
 on all disks before --stop, but even if it wasn't, I really wouldn't know
 what to do about it.
 
 (btw mdadm -E gives me: Events : 0.1149316 - what's with the 0. ?)
 

The events count is a 64bit number and for historical reasons it is
printed as 2 32bit numbers.  I agree this is ugly.
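
As an illustration of that split, a small Python sketch. It assumes the part before the dot is the upper 32 bits and the part after it the lower 32 bits; the function names are made up for this example.

```python
def split_events(events: int) -> tuple[int, int]:
    """Split a 64-bit event count into (upper 32 bits, lower 32 bits)."""
    if not 0 <= events < 2**64:
        raise ValueError("event count must fit in 64 bits")
    return events >> 32, events & 0xFFFFFFFF


def format_events(events: int) -> str:
    """Render the count the way it appears in the report above."""
    high, low = split_events(events)
    return f"{high}.{low}"


print(format_events(1149316))  # 0.1149316
```

So the leading `0.` just means the upper half is still zero, which it will be until the count passes 2**32.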

NeilBrown


Re: when is a disk non-fresh?

2008-02-04 Thread Neil Brown
On Monday February 4, [EMAIL PROTECTED] wrote:
 Seems the other topic wasn't quite clear...

not necessarily.  sometimes it helps to repeat your question.  there
is a lot of noise on the internet and sometimes important things get
missed... :-)

 Occasionally a disk is kicked for being non-fresh - what does this mean and 
 what causes it?

The 'event' count is too small.  
Every event that happens on an array causes the event count to be
incremented.
If the event counts on different devices differ by more than 1, then
the smaller number is 'non-fresh'.
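
Read literally, that rule can be sketched in a few lines of Python. This is a simplified reading of the sentence above, not the kernel's code, and the device names are hypothetical.

```python
def non_fresh(events: dict[str, int]) -> list[str]:
    """Return devices whose event count trails the newest by more than 1."""
    if not events:
        return []
    newest = max(events.values())
    return sorted(dev for dev, count in events.items() if newest - count > 1)


counts = {"sda1": 1149316, "sdb1": 1149316, "sdc1": 1149315, "sdd1": 1149310}
print(non_fresh(counts))  # ['sdd1']
```

Here sdc1 is one event behind and is still accepted, while sdd1 is six behind and would be the one reported as non-fresh.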

You need to look to the kernel logs of when the array was previously
shut down to figure out why it is now non-fresh.

NeilBrown


 
 Dex
 
 
 