My problem seemed to arise before raidsetfaulty could even be
applied (if I understand its function), thus:
(1) I removed a disk to simulate disk failure. The RAID did not
notice. (Even doing reads may not make it notice, presumably because
of buffering.)
(2) I did a dd if=/dev/md0 reading far beyond the range of any
previous IO to force it to notice. This correctly marked the gone
disk with an (F) in /proc/mdstat, but it also triggered attempts to
rebuild the array (even though a disk was missing) and left my dd
command in an uninterruptible sleep.
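For the record, the forced read in step (2) looked roughly like this
(device name, offset, and sizes are illustrative, not the exact values
I used):

```shell
# Read a region of the array well past anything previously accessed,
# so md cannot satisfy the read from buffered data and must touch
# the missing disk. /dev/md0 and the skip offset are illustrative.
dd if=/dev/md0 of=/dev/null bs=1M skip=2000 count=10

# Check whether the gone disk is now marked (F).
cat /proc/mdstat
```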
So the problem is that the disk loss is invisible until low-level IO
has already been clobbered. The RAID layer seemed to be behaving
correctly when I followed the raidhotremove / scsi remove / scsi add /
raidhotadd recipe, but the only result was more and more processes
stuck in uninterruptible sleep.
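For anyone following along, the recipe I mean is the usual raidtools
one, roughly as below. The device names and the SCSI address
(host 0, channel 0, id 1, lun 0) are placeholders for whatever your
failed drive actually is:

```shell
# Mark the drive faulty (raidsetfaulty) and pull it out of the array.
raidsetfaulty /dev/md0 /dev/sdb1
raidhotremove /dev/md0 /dev/sdb1

# Detach and re-attach the drive at the SCSI layer
# (host/channel/id/lun values are illustrative).
echo "scsi remove-single-device 0 0 1 0" > /proc/scsi/scsi
echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi

# Re-add the replacement disk; md should begin reconstruction.
raidhotadd /dev/md0 /dev/sdb1
cat /proc/mdstat
```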
Has anyone run into a similar sequence of problems, or found a way
around them?
Larry
At 10:41 PM 9/15/99 -0400, James Manning wrote:
>[ Wednesday, September 15, 1999 ] Lawrence Dickson wrote:
>> raidhotremove seems to THINK it can work without unmounting
>> the raid array fs... same with the echo to /proc/scsi/scsi ...
>> it's really all just syncing code, isn't it, guys?
>
>I've been curious what raidsetfaulty would do (if anything) to help
>make sure the md is in the "correct" mode wrt that failed drive,
>allowing a "cleaner" raidhotremove, subsequent swap, hotadd, etc
>
>James
>--
>Miscellaneous Engineer --- IBM Netfinity Performance Development
>