Re: [CentOS] Race condition with mdadm at boot [still mystifying]

2011-03-12 Thread compdoc
 On the particular Supermicro motherboard I'm using, there is a very
 long delay (10 or 15 sec) between power-on and initiation of visible
 BIOS activity, so all disk drives have ample time to spin up and stabilize.


Yeah, I have used Supermicro in the past and they had the same long pause
when you turn them on. Good boards, except I had one die recently.

How many drives are there in total, and what is the wattage of the PSU?

Also, is the controller's firmware up to date?




___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Race condition with mdadm at boot [still mystifying]

2011-03-11 Thread Chuck Munro

On 03/11/2011 09:00 AM, Les Mikesell wrote:

 On 3/10/11 9:25 PM, Chuck Munro wrote:

   However, on close examination of dmesg, I found something very
   interesting.  There were missing 'bind<sd??>' statements for one or the
   other hot spare drive (or sometimes both).  These drives are connected
   to the last PHYs in each SATA controller ... in other words they are the
   last devices probed by the driver for a particular controller.  It would
   appear that the drivers are bailing out before managing to enumerate all
   of the partitions on the last drive in a group, and missing partitions
   occur quite randomly.
 
   So it may or may not be a timing issue between the WD Caviar Black
   drives and both the LSI and Marvell SAS/SATA controller chips.

 I've seen some weirdness in powering up 6 or more SATA drives but never
 completely pinned down whether it was the controller, drive cage, or
 particular drives causing the problem.  But I think my symptom was
 completely failing to detect some drives when certain combinations of
 disks were installed although each would work individually.  Do you have
 any options about whether they power up immediately or wait until accessed?

That's a good question, one I have experimented with.  I don't have any 
choice as to when the drives are spun up (only on bootup), but I did try 
a controller card which pre-spun and checked the identification of the 
drives before handing off to the BIOS for bootup.  That didn't help.

On the particular Supermicro motherboard I'm using, there is a very long 
delay (10 or 15 sec) between power-on and initiation of visible BIOS 
activity, so all disk drives have ample time to spin up and stabilize. 
The drives' SMART data shows that the average spin-up time is well 
within the BIOS startup delay.  Each drive activity indicator shows that 
they are always probed by the kernel's scsi scan process.

I have since tried a couple of other tricks I found by Googling around 
... setting the kernel parameters 'rootdelay=xx' and 
'scsi_mod.scan=sync'.  These had no effect on the problem.  For some 
unfathomable reason, the last drives in each group of drives have one or 
more random partitions missing, with no 'bind' statement in dmesg. 
Other partitions on those drives are bound normally.  This has been 
tested with at least two known-good replacement drives, with the same 
random results.  On two occasions today, everything worked perfectly, 
but that was unusual.
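
For anyone who wants to check for the same symptom, here is a minimal
sketch of how the missing 'bind' lines could be spotted automatically.
It assumes the md driver logs each bound partition as "md: bind<sdXN>"
in dmesg.  The partition names and the sample log text below are made up
for illustration and are not taken from the actual system.

```python
import re

# Assumed format of the md driver's message, e.g. "md: bind<sdb1>".
BIND_RE = re.compile(r"md:\s*bind<(\w+)>")


def bound_partitions(dmesg_text):
    """Return the set of partition names that md reported binding."""
    return set(BIND_RE.findall(dmesg_text))


def missing_partitions(dmesg_text, expected):
    """Return the expected partitions that have no bind line, sorted by name."""
    return sorted(set(expected) - bound_partitions(dmesg_text))


# Hypothetical sample, NOT real output from the machine in this thread.
SAMPLE_DMESG = """\
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdb2>
"""

# Hypothetical list of partitions that should belong to arrays.
EXPECTED = ["sdb1", "sdb2", "sdc1", "sdc2"]

if __name__ == "__main__":
    print("missing:", missing_partitions(SAMPLE_DMESG, EXPECTED))
```

To use it on a real boot, save the output of 'dmesg' to a file right
after startup, read that file in place of SAMPLE_DMESG, and list the
partitions you expect to be array members in EXPECTED.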

A friend of mine suggested an ugly hack - connect two 'dummy' unused old 
SATA drives to the last port of each controller (I'm using only 6 of 8 
on each).  I wonder if one of those $15 IDE-to-SATA converters would do 
the job (without a drive attached)?  Foolish thought  :-/

Chuck


Re: [CentOS] Race condition with mdadm at boot [still mystifying]

2011-03-10 Thread Les Mikesell
On 3/10/11 9:25 PM, Chuck Munro wrote:

 However, on close examination of dmesg, I found something very
 interesting.  There were missing 'bind<sd??>' statements for one or the
 other hot spare drive (or sometimes both).  These drives are connected
 to the last PHYs in each SATA controller ... in other words they are the
 last devices probed by the driver for a particular controller.  It would
 appear that the drivers are bailing out before managing to enumerate all
 of the partitions on the last drive in a group, and missing partitions
 occur quite randomly.

 So it may or may not be a timing issue between the WD Caviar Black
 drives and both the LSI and Marvell SAS/SATA controller chips.

I've seen some weirdness in powering up 6 or more SATA drives but never 
completely pinned down whether it was the controller, drive cage, or particular 
drives causing the problem.  But I think my symptom was completely failing to 
detect some drives when certain combinations of disks were installed although 
each would work individually.  Do you have any options about whether they power 
up immediately or wait until accessed?

-- 
   Les Mikesell
lesmikes...@gmail.com