Problems with raid1 -- failed disk is not removed from md

Daniel Seiler Tue, 10 Aug 1999 06:17:55 -0700
Dear friends,

While evaluating the Linux RAID code for our site, I had some trouble with a
failing disk and raid1 devices. I used kernel 2.2.10 with the
raid0145-19990724-2.2.10.gz patch applied and raidtools-19990724-0.90.tar.gz
on a Red Hat 6.0 system.

My setup, described through the output of some commands:

--------------------------8<-----------------8<---------------------------

[root@test /root]# fdisk /dev/sda

Command (m for help): p

Disk /dev/sda: 255 heads, 63 sectors, 527 cylinders       
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1       255   2048256   fd  Unknown
/dev/sda2           256       259     32130   83  Linux
/dev/sda3           260       277    144585   fd  Unknown
/dev/sda4           278       527   2008125    5  Extended
/dev/sda5           278       405   1028128+  fd  Unknown
/dev/sda6           406       527    979933+  fd  Unknown

Command (m for help): q

[root@test /root]# fdisk /dev/sdb

Command (m for help): p

Disk /dev/sdb: 255 heads, 63 sectors, 527 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1       255   2048256   fd  Unknown
/dev/sdb2           256       259     32130   83  Linux
/dev/sdb3           260       277    144585   fd  Unknown
/dev/sdb4           278       527   2008125    5  Extended
/dev/sdb5           278       405   1028128+  fd  Unknown
/dev/sdb6           406       527    979933+  fd  Unknown

Command (m for help): q

[root@test /root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[0] sda1[1] 2048192 blocks [2/2] [UU]
md1 : active raid1 sdb3[0] sda3[1] 144512 blocks [2/2] [UU]
md2 : active raid1 sdb5[0] sda5[1] 1028032 blocks [2/2] [UU]
md3 : active raid1 sdb6[0] sda6[1] 979840 blocks [2/2] [UU]
unused devices: <none>
[root@test /root]#  mount
/dev/md0 on / type ext2 (rw)
none on /proc type proc (rw)
/dev/sdb2 on /boot type ext2 (rw)
/dev/sda2 on /bootb type ext2 (rw)
/dev/md3 on /home type ext2 (rw)
/dev/md2 on /var type ext2 (rw)
none on /dev/pts type devpts (rw,mode=0622)
[root@test /root]#

--------------------------8<-----------------8<---------------------------
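For anyone who wants to check the array state from a script rather than by eye, here is a minimal sketch that parses the one-line-per-array /proc/mdstat format shown above. It assumes the "[n/m]" field means configured disks / working disks and that "_" in the last field marks a missing mirror half; parse_mdstat and the MDSTAT sample are names made up for this illustration, not part of raidtools.

```python
import re

# Sample copied from the /proc/mdstat output above.
MDSTAT = """\
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[0] sda1[1] 2048192 blocks [2/2] [UU]
md1 : active raid1 sdb3[0] sda3[1] 144512 blocks [2/2] [UU]
md2 : active raid1 sdb5[0] sda5[1] 1028032 blocks [2/2] [UU]
md3 : active raid1 sdb6[0] sda6[1] 979840 blocks [2/2] [UU]
unused devices: <none>
"""

# One array per line: name, level, member list, size, [total/working], [flags]
LINE = re.compile(
    r"^(md\d+) : active (\S+) (.*?) (\d+) blocks \[(\d+)/(\d+)\] \[([U_]+)\]$"
)

def parse_mdstat(text):
    """Return {array name: info dict} for every active md line in text."""
    arrays = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # header, read_ahead and "unused devices" lines
        name, level, members, blocks, total, working, flags = m.groups()
        arrays[name] = {
            "level": level,
            # member names without the "[slot]" suffix
            "members": re.findall(r"(\w+)\[\d+\]", members),
            "blocks": int(blocks),
            # assumption: fewer working disks than configured, or a "_", means degraded
            "degraded": int(working) < int(total) or "_" in flags,
        }
    return arrays

if __name__ == "__main__":
    for name, info in sorted(parse_mdstat(MDSTAT).items()):
        state = "DEGRADED" if info["degraded"] else "ok"
        print(name, info["level"], ",".join(info["members"]), state)
```

On a live system the text would come from reading /proc/mdstat instead of the embedded sample; newer kernels print a multi-line format that this regular expression does not cover.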

On startup, the kernel says something like this:

--------------------------8<-----------------8<---------------------------

(scsi0) <Adaptec AHA-294X Ultra SCSI host adapter> found at PCI 10/0
(scsi0) Wide Channel, SCSI ID=7, 16/255 SCBs
(scsi0) Downloading sequencer code... 413 instructions downloaded
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.17/3.2.4
       <Adaptec AHA-294X Ultra SCSI host adapter>
scsi : 1 host.
(scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
  Vendor: IBM       Model: DCAS-34330W       Rev: S61A
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
(scsi0:0:1:0) Synchronous at 40.0 Mbyte/sec, offset 8.
  Vendor: IBM       Model: DCAS-34330W       Rev: S61A
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
(scsi0:0:6:0) Synchronous at 10.0 Mbyte/sec, offset 8.
  Vendor: TOSHIBA   Model: CD-ROM XM-5701TA  Rev: 3136
  Type:   CD-ROM                             ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0
scsi : detected 1 SCSI cdrom 2 SCSI disks total.

--------------------------8<-----------------8<---------------------------
No problem so far; the mirrors seem to run.
Now I start a "cp -a /usr/lib /home/" or something similar to keep the disks
busy, and then I pull the power from one of the disks (not the one
terminating the bus), because I would like to test whether my system will
survive a disk failure. I then get the following errors (many of them, and
they don't stop):
-------------------8<---------------8<----------------------------------
SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 26030000
scsidisk I/O error: dev 08:16, sector 1654798
md: recovery thread got woken up ...
md2: no spare disk to reconstruct array! -- continuing in degraded mode
md3: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
scsi0 channel 0 : resetting for second half of retries.
SCSI bus is being reset for host 0 channel 0.
(scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
-------------------8<---------------8<----------------------------------
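To see which partition the "scsidisk I/O error: dev 08:16" line refers to, the device number can be decoded by hand. The sketch below assumes the kernel prints major:minor as two hex fields, that major 8 is the SCSI disk driver, and that each disk owns 16 minors (whole disk plus partitions); decode_sd_dev is a helper name invented for this example.

```python
SD_MAJOR = 8            # block major number of the SCSI disk (sd) driver
MINORS_PER_DISK = 16    # minor 0 of each group is the whole disk, 1..15 partitions

def decode_sd_dev(dev):
    """Turn a 'major:minor' string (both fields hex) into an sd device name."""
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != SD_MAJOR:
        raise ValueError(f"not a SCSI disk major: {major}")
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "sd" + chr(ord("a") + disk)   # disk 0 -> sda, disk 1 -> sdb, ...
    return name if part == 0 else f"{name}{part}"

print(decode_sd_dev("08:16"))  # sdb6
```

Under those assumptions the failing device is sdb6, which is one half of md3 (/home), the target of the cp. That fits the rest of the log: the error is reported for id 1 (sdb), and sector 1654798 lies inside the roughly 1.96 million sectors of sdb6.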

After that, the shell running the cp command hangs. From other consoles I
can still run "dmesg", but I can't run "ps aux".

Perhaps someone on this list can tell me what I am doing wrong. Is there some
conceptual error in my setup? Are there known bugs in the mirror code
which would cause the behavior seen above?

Thanks a lot in advance,
                        Daniel Seiler