Dear friends,
while evaluating the Linux RAID code for our site, I had some trouble with a
failing disk and RAID-1 devices. I used kernel 2.2.10 with the
raid0145-19990724-2.2.10.gz patch applied and raidtools-19990724-0.90.tar.gz
on a Red Hat 6.0 system.
Here is my setup, as shown by the output of a few commands:
--------------------------8<-----------------8<---------------------------
[root@test /root]# fdisk /dev/sda
Command (m for help): p
Disk /dev/sda: 255 heads, 63 sectors, 527 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1       255   2048256   fd  Unknown
/dev/sda2           256       259     32130   83  Linux
/dev/sda3           260       277    144585   fd  Unknown
/dev/sda4           278       527   2008125    5  Extended
/dev/sda5           278       405   1028128+  fd  Unknown
/dev/sda6           406       527    979933+  fd  Unknown
Command (m for help): q
[root@test /root]# fdisk /dev/sdb
Command (m for help): p
Disk /dev/sdb: 255 heads, 63 sectors, 527 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1       255   2048256   fd  Unknown
/dev/sdb2           256       259     32130   83  Linux
/dev/sdb3           260       277    144585   fd  Unknown
/dev/sdb4           278       527   2008125    5  Extended
/dev/sdb5           278       405   1028128+  fd  Unknown
/dev/sdb6           406       527    979933+  fd  Unknown
Command (m for help): q
[root@test /root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[0] sda1[1] 2048192 blocks [2/2] [UU]
md1 : active raid1 sdb3[0] sda3[1] 144512 blocks [2/2] [UU]
md2 : active raid1 sdb5[0] sda5[1] 1028032 blocks [2/2] [UU]
md3 : active raid1 sdb6[0] sda6[1] 979840 blocks [2/2] [UU]
unused devices: <none>
[root@test /root]# mount
/dev/md0 on / type ext2 (rw)
none on /proc type proc (rw)
/dev/sdb2 on /boot type ext2 (rw)
/dev/sda2 on /bootb type ext2 (rw)
/dev/md3 on /home type ext2 (rw)
/dev/md2 on /var type ext2 (rw)
none on /dev/pts type devpts (rw,mode=0622)
[root@test /root]#
--------------------------8<-----------------8<---------------------------
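The block counts in the fdisk and /proc/mdstat output above are consistent with each other. A minimal Python sketch to check the arithmetic, assuming 512-byte sectors, 1 KB blocks, the 255-head/63-sector geometry fdisk reports, and the md 0.90 persistent-superblock rule (each member is rounded down to a multiple of 64 KB and the last 64 KB is reserved for the superblock). The helper names `partition_kb` and `md_raid1_kb` are hypothetical.

```python
# Sketch: reproduce the fdisk "Blocks" column and the /proc/mdstat sizes.
# Assumptions: 512-byte sectors, 1 KB blocks, 255 heads x 63 sectors,
# and the md 0.90 rule of reserving the last 64 KB of each member.

SECTORS_PER_TRACK = 63
SECTORS_PER_CYL = 255 * SECTORS_PER_TRACK  # 16065, as fdisk reports
MD_RESERVED_KB = 64


def partition_kb(start_cyl: int, end_cyl: int, skip_track: bool) -> float:
    """Size in 1 KB blocks of a partition covering cylinders start..end.

    skip_track is True when the partition begins one track into its first
    cylinder: sda1 (after the MBR track) and the logical partitions sda5
    and sda6 (after their partition-table track).
    """
    sectors = (end_cyl - start_cyl + 1) * SECTORS_PER_CYL
    if skip_track:
        sectors -= SECTORS_PER_TRACK
    return sectors / 2  # two 512-byte sectors per 1 KB block


def md_raid1_kb(member_kb: int) -> int:
    """RAID-1 size for a member: round down to 64 KB, then drop 64 KB."""
    return (member_kb // MD_RESERVED_KB) * MD_RESERVED_KB - MD_RESERVED_KB


# (name, start cylinder, end cylinder, skips first track), taken from fdisk
LAYOUT = [
    ("sda1", 1, 255, True),
    ("sda3", 260, 277, False),
    ("sda5", 278, 405, True),
    ("sda6", 406, 527, True),
]

if __name__ == "__main__":
    for name, start, end, skip in LAYOUT:
        kb = partition_kb(start, end, skip)
        print(f"{name}: partition {kb} KB, md size {md_raid1_kb(int(kb))} KB")
```

A trailing `+` in fdisk's Blocks column corresponds to the `.5` here: the partition has an odd number of sectors.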
On startup, the kernel says something like this:
--------------------------8<-----------------8<---------------------------
(scsi0) <Adaptec AHA-294X Ultra SCSI host adapter> found at PCI 10/0
(scsi0) Wide Channel, SCSI ID=7, 16/255 SCBs
(scsi0) Downloading sequencer code... 413 instructions downloaded
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.17/3.2.4
<Adaptec AHA-294X Ultra SCSI host adapter>
scsi : 1 host.
(scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
Vendor: IBM Model: DCAS-34330W Rev: S61A
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
(scsi0:0:1:0) Synchronous at 40.0 Mbyte/sec, offset 8.
Vendor: IBM Model: DCAS-34330W Rev: S61A
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
(scsi0:0:6:0) Synchronous at 10.0 Mbyte/sec, offset 8.
Vendor: TOSHIBA Model: CD-ROM XM-5701TA Rev: 3136
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0
scsi : detected 1 SCSI cdrom 2 SCSI disks total.
--------------------------8<-----------------8<---------------------------
So far there are no problems; the mirrors seem to be running.
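One way to confirm that the mirrors are complete, before and after pulling a disk, is to compare the `[total/working]` counts in /proc/mdstat. A minimal Python sketch, assuming the one-line-per-array format shown above; `degraded_arrays` is a hypothetical helper name, and the degraded sample line is illustrative, not captured output.

```python
import re
from typing import List

# Sketch: report arrays whose [total/working] counts differ in /proc/mdstat.
# Assumes one line per array, as in the 2.2.10 + raid0145 output above.
MD_LINE = re.compile(
    r"^(?P<name>md\d+)\s*:.*?\d+ blocks"
    r".*\[(?P<total>\d+)/(?P<working>\d+)\]\s*\[[U_]+\]"
)


def degraded_arrays(mdstat_text: str) -> List[str]:
    """Return the names of arrays with fewer working than configured disks."""
    degraded = []
    for line in mdstat_text.splitlines():
        m = MD_LINE.match(line.strip())
        if m and int(m.group("working")) < int(m.group("total")):
            degraded.append(m.group("name"))
    return degraded


HEALTHY = """\
md0 : active raid1 sdb1[0] sda1[1] 2048192 blocks [2/2] [UU]
md3 : active raid1 sdb6[0] sda6[1] 979840 blocks [2/2] [UU]
"""

# Illustrative only: what md3 might look like after losing sdb6.
DEGRADED = """\
md0 : active raid1 sdb1[0] sda1[1] 2048192 blocks [2/2] [UU]
md3 : active raid1 sda6[1] 979840 blocks [2/1] [_U]
"""

if __name__ == "__main__":
    print(degraded_arrays(HEALTHY))
    print(degraded_arrays(DEGRADED))
```

On the running system, the same function can be fed the contents of /proc/mdstat instead of the sample strings.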
Now I start a "cp -a /usr/lib /home/" or something similar, so that the disks
are busy, and then I pull the power from one of the disks (not the one
terminating the bus), because I want to test whether my system will survive a
disk failure. I then get the following errors (many of them, and they don't stop):
-------------------8<---------------8<----------------------------------
SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 26030000
scsidisk I/O error: dev 08:16, sector 1654798
md: recovery thread got woken up ...
md2: no spare disk to reconstruct array! -- continuing in degraded mode
md3: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
scsi0 channel 0 : resetting for second half of retries.
SCSI bus is being reset for host 0 channel 0.
(scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
-------------------8<---------------8<----------------------------------
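The two numbers in the first error lines can be decoded by hand. A minimal Python sketch, assuming (a) the return code is printed in hex and follows the 2.2 `<scsi/scsi.h>` layout (driver byte, host byte, message byte, status byte, from most to least significant) and (b) the device is printed by `kdevname()` as hex major:minor, with 16 minors per disk on SCSI disk major 8. Under those assumptions `26030000` reads as a host timeout (`DID_TIME_OUT`) with `DRIVER_TIMEOUT | SUGGEST_ABORT`, and `08:16` is sdb6, the sdb half of md3 (/home), which is where the cp was writing. The names `decode_result` and `sd_name` are hypothetical.

```python
# Sketch: decode "return code = 26030000" and "dev 08:16" from the log.
# Assumes the 2.2 <scsi/scsi.h> result layout and kdevname()'s hex output.

HOST_BYTE = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
}
DRIVER_STATUS = {  # low nibble of the driver byte
    0x0: "DRIVER_OK", 0x1: "DRIVER_BUSY", 0x2: "DRIVER_SOFT",
    0x3: "DRIVER_MEDIA", 0x4: "DRIVER_ERROR", 0x5: "DRIVER_INVALID",
    0x6: "DRIVER_TIMEOUT", 0x7: "DRIVER_HARD", 0x8: "DRIVER_SENSE",
}
SUGGEST = {  # high nibble of the driver byte
    0x00: "none", 0x10: "SUGGEST_RETRY", 0x20: "SUGGEST_ABORT",
    0x30: "SUGGEST_REMAP", 0x40: "SUGGEST_DIE", 0x80: "SUGGEST_SENSE",
}


def decode_result(code_hex: str) -> dict:
    """Split a hex SCSI result into its status/message/host/driver bytes."""
    result = int(code_hex, 16)
    driver = (result >> 24) & 0xFF
    return {
        "status": result & 0xFF,
        "message": (result >> 8) & 0xFF,
        "host": HOST_BYTE.get((result >> 16) & 0xFF, "unknown"),
        "driver": DRIVER_STATUS.get(driver & 0x0F, "unknown"),
        "suggest": SUGGEST.get(driver & 0xF0, "unknown"),
    }


def sd_name(kdev: str) -> str:
    """Map hex 'major:minor' on SCSI disk major 8 to a name like 'sdb6'."""
    major, minor = (int(part, 16) for part in kdev.split(":"))
    if major != 8:
        raise ValueError("only SCSI disk major 8 (sda..sdp) is handled")
    disk, partition = divmod(minor, 16)  # 16 minors per disk
    name = "sd" + chr(ord("a") + disk)
    return name + str(partition) if partition else name


if __name__ == "__main__":
    print(decode_result("26030000"))
    print(sd_name("08:16"))
```

sdb is the disk at SCSI id 1 in the boot messages, which matches the "id 1" in the same error line.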
The shell running the cp command hangs after that. From other consoles I
can still run "dmesg", but "ps aux" does not work.
Perhaps someone on this list can tell me what I am doing wrong. Is there a
conceptual error in my setup? Are there known bugs in the mirroring code
that would cause the behavior shown above?
Thanks a lot in advance,
Daniel Seiler