My problem seemed to arise before raidsetfaulty could even be
applied (if I understand its function), thus:
(1) I removed a disk to simulate disk failure. The RAID did not
notice. (Even doing reads may not make it notice, presumably because
of buffering.)
(2) I did a dd if=/dev/md0 reading far beyond the range of any
previous IO to force it to notice. This correctly marked the gone
disk with an (F) in /proc/mdstat, but it also triggered attempts to
rebuild the array (even though a disk was missing) and left my dd
command in an uninterruptible sleep.
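For the record, the forced read in step (2) looked roughly like this
(device name, offset, and sizes are illustrative, not the exact values
I used):

```shell
# Read a region of the array well past anything previously accessed,
# so md cannot satisfy the read from buffered data and must touch
# the missing disk. /dev/md0 and the skip offset are illustrative.
dd if=/dev/md0 of=/dev/null bs=1M skip=2000 count=10

# Check whether the gone disk is now marked (F).
cat /proc/mdstat
```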
So the problem is that the disk loss is invisible until low-level IO
has already been clobbered. The RAID layer seemed to be behaving
correctly when I followed the raidhotremove / scsi remove / scsi add /
raidhotadd recipe, but the only result was more and more processes
stuck in uninterruptible sleep.
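For anyone following along, the recipe I mean is the usual raidtools
one, roughly as below. The device names and the SCSI address
(host 0, channel 0, id 1, lun 0) are placeholders for whatever your
failed drive actually is:

```shell
# Mark the drive faulty (raidsetfaulty) and pull it out of the array.
raidsetfaulty /dev/md0 /dev/sdb1
raidhotremove /dev/md0 /dev/sdb1

# Detach and re-attach the drive at the SCSI layer
# (host/channel/id/lun values are illustrative).
echo "scsi remove-single-device 0 0 1 0" > /proc/scsi/scsi
echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi

# Re-add the replacement disk; md should begin reconstruction.
raidhotadd /dev/md0 /dev/sdb1
cat /proc/mdstat
```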
Has anyone run into a similar sequence of problems, or found a way
around them?
Larry
At 10:41 PM 9/15/99 -0400, James Manning wrote:
>[ Wednesday, September 15, 1999 ] Lawrence Dickson wrote:
>> raidhotremove seems to THINK it can work without unmounting
>> the raid array fs... same with the echo to /proc/scsi/scsi ...
>> it's really all just syncing code, isn't it, guys?
>
>I've been curious what raidsetfaulty would do (if anything) to help
>make sure the md is in the "correct" mode wrt that failed drive,
>allowing a "cleaner" raidhotremove, subsequent swap, hotadd, etc
>
>James
>--
>Miscellaneous Engineer --- IBM Netfinity Performance Development
>