[Wishart, Aaron M. (James Tower)]
> I have a raid5 file system consisting of 8, 9-gig quantum scsi drives (scsi
> id 0-6, 8). The drive with the scsi id of 1 failed. I replaced the drive
> and ran "raidhotadd /dev/scb /dev/md0" It appeared to run so I left for the
> weekend. When I came in this morning the syslogd was using 75% of the cpu
> and outputting "kernel: raid5: md0: unrecoverable error I/O error for block
> #####" from some kind of loop it apparently failed around 4:00am Saturday (
> I started the restore at about 4:00 Friday afternoon).
- you'd typically run "raidhotadd /dev/md0 /dev/sdb1" instead -- md device
first, then the new partition. After replacing the disk, make sure it came
back as sdb (check the kernel log), fdisk it to create a partition of type
fd (not strictly necessary, but almost always a good idea), and then run
the raidhotadd on that partition.
- After the raidhotadd, you'd check /proc/mdstat to confirm the array
is reconstructing onto the new drive (partition, really).
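The steps above can be sketched as a short script. This is a minimal sketch,
assuming the replacement came back as sdb; the device names and the mdstat
keywords are assumptions, so check them against your own system:

```shell
#!/bin/sh
# Recovery sketch for a failed raid5 member (device names are assumptions):
#   1. Partition the replacement disk with one partition of type fd
#      (Linux raid autodetect):      fdisk /dev/sdb
#   2. Hot-add it to the array (md device first, then the partition):
#      raidhotadd /dev/md0 /dev/sdb1
#   3. Confirm reconstruction actually started:

check_resync() {
    # $1: path to an mdstat-format file (normally /proc/mdstat)
    if grep -q 'resync\|recovery' "$1"; then
        echo "reconstruction in progress"
    else
        echo "no reconstruction running"
    fi
}

# Usage: check_resync /proc/mdstat
```

Steps 1 and 2 need root and real hardware, so only the mdstat check is shown
as runnable code.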
Aside from those two points (which I don't think are really the issue, but
worth clarifying), there's the possibility that another drive returned an
error before the resync completed. It may have been only a soft error -- the
raid code doesn't really differentiate, and can get quite picky even when
the underlying drive successfully remapped the sector. Resyncs also seem to
take much longer than they should, but maybe that's just me: I can mirror
entire drives in 20 minutes, yet resyncs seem to take over a dozen hours.
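If you want to check whether a second drive threw errors during the resync,
one quick test is to count I/O-error lines per device in the kernel log. A
rough sketch -- the log path and the exact message wording are assumptions,
so match the grep patterns to what your syslog actually prints:

```shell
#!/bin/sh
# Count I/O-error lines per sd device in a syslog-format file.
# The "I/O error ... sdX" message shape is an assumption; adjust to taste.
count_io_errors() {
    # $1: path to a syslog-format file (e.g. /var/log/messages)
    grep 'I/O error' "$1" \
        | grep -o 'sd[a-z]*' \
        | sort | uniq -c | sort -rn
}

# Usage: count_io_errors /var/log/messages
# The device at the top of the output is the one logging the most errors.
```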
Good luck,
James