On Monday March 19, [EMAIL PROTECTED] wrote:
> Hi,
> 
> I have a RAID setup, 3 Compaq 4Gb drives running off an Adaptec 2940UW.
> Kernel 2.2.18 with RAID-patches etc.
> 
> I have been trying out various options, doing some stress-testing etc.,
> and I have now arrived at the following situation that I cannot explain:
> 
> when running the 3 drives in a RAID5 config, one of the drives (always the
> same one) will always fail during heavy IO or during a resync phase. It
> appears to produce one IO error (judging from messages in the log), upon
> which it is promptly removed from the array.
> I can then hotremove the failing drive, then hotadd it - and resync starts, 
> and quite often completes. This scenario is consistently repeatable.

During the initial resync phase, the data blocks are read and the
parity blocks are written -- across all drives.
During a rebuild-after-failure, data and parity are read from the
"good" drives and data-or-parity is written to the spare drive.

This could lead to different patterns of concurrent access.  In
particular, during the resync that you say often completes, the
questionable drive is only being written to.  During the resync that
usually fails, the questionable drive is often being read concurrently
with other drives.
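
To make the two access patterns concrete, here is a minimal Python
sketch. It is an illustration only, not md driver code: the function
names and the simple parity rotation in parity_drive() are assumptions
made for the example, and the real on-disk layout may differ. It counts
per-drive reads and writes for an initial resync and for a rebuild onto
a re-added drive.

```python
from collections import Counter

def parity_drive(stripe, n_drives):
    # Assumed rotation, for illustration only; the real layout may differ.
    return (n_drives - 1) - (stripe % n_drives)

def initial_resync(n_drives, n_stripes):
    """Initial resync: data blocks are read, the parity block is written."""
    ops = {d: Counter() for d in range(n_drives)}
    for s in range(n_stripes):
        p = parity_drive(s, n_drives)
        for d in range(n_drives):
            ops[d]["write" if d == p else "read"] += 1
    return ops

def rebuild(n_drives, n_stripes, target):
    """Rebuild: the good drives are read, the target drive is only written."""
    ops = {d: Counter() for d in range(n_drives)}
    for s in range(n_stripes):
        for d in range(n_drives):
            ops[d]["write" if d == target else "read"] += 1
    return ops

if __name__ == "__main__":
    for name, ops in (("initial resync", initial_resync(3, 6)),
                      ("rebuild onto drive 2", rebuild(3, 6, target=2))):
        print(name)
        for drive, counts in ops.items():
            print(f"  drive {drive}: {dict(counts)}")
```

In this model every drive sees a mix of reads and writes during the
initial resync, while during the rebuild the re-added drive is never
read at all.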

> 
> So, it would seem that this one drive has a hardware problem. So I ran badblocks
> with write-check on it, couple of times - came out 100% clean.
> I then built a RAID0 array instead - and started driving lots of IO on it - 
> it's still running - not a problem. Filled up the array, still no probs.
> 
> So, except when the drive is in a RAID5 config, it seems ok. 

Well, raid5 would do about 30% more IO when writing.  It certainly
sounds odd, but it could be some combinatorial thing..

> 
> Any suggestions? I would like to confirm whether or not the
> drive has a problem.

Try re-arranging the drives on the scsi chain.  If the questionable
one is currently furthest from the host-adapter, make it closest.  See
if that has any effect.
It could well be cabling, or terminators or something.  Or it could be
the drive.

NeilBrown

> 
> thanks,
> Per Jessen
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [EMAIL PROTECTED]