Hi again,
I did some further experiments with the configuration described below. I
added a second SCSI controller (Adaptec 3940 UW) and put one SCSI disk on
it. After pulling the power from one disk, the system fails in the same
way as it did with a single controller: the SCSI bus is reset, the cp
command hangs, and there is no way to reboot the system.
Perhaps somebody can describe a RAID1 hardware configuration that is
proven to work, so I can change my hardware to something more functional.
Thanks again,
daniel
On Mon, 9 Aug 1999, Daniel Seiler wrote:
> Dear friends,
>
> While evaluating the Linux RAID code for our site, I had some trouble with a
> failing disk and RAID1 devices. I used kernel 2.2.10 with the
> raid0145-19990724-2.2.10.gz patch applied and raidtools-19990724-0.90.tar.gz
> on a Red Hat 6.0 system.
>
> My setup, described through the output of some commands:
>
> --------------------------8<-----------------8<---------------------------
>
> [root@test /root]# fdisk /dev/sda
>
> Command (m for help): p
>
> Disk /dev/sda: 255 heads, 63 sectors, 527 cylinders
> Units = cylinders of 16065 * 512 bytes
>
>    Device Boot    Start       End    Blocks   Id  System
> /dev/sda1             1       255   2048256   fd  Unknown
> /dev/sda2           256       259     32130   83  Linux
> /dev/sda3           260       277    144585   fd  Unknown
> /dev/sda4           278       527   2008125    5  Extended
> /dev/sda5           278       405   1028128+  fd  Unknown
> /dev/sda6           406       527    979933+  fd  Unknown
>
> Command (m for help): q
>
> [root@test /root]# fdisk /dev/sdb
>
> Command (m for help): p
>
> Disk /dev/sdb: 255 heads, 63 sectors, 527 cylinders
> Units = cylinders of 16065 * 512 bytes
>
>    Device Boot    Start       End    Blocks   Id  System
> /dev/sdb1             1       255   2048256   fd  Unknown
> /dev/sdb2           256       259     32130   83  Linux
> /dev/sdb3           260       277    144585   fd  Unknown
> /dev/sdb4           278       527   2008125    5  Extended
> /dev/sdb5           278       405   1028128+  fd  Unknown
> /dev/sdb6           406       527    979933+  fd  Unknown
>
> Command (m for help): q
>
> [root@test /root]# cat /proc/mdstat
> Personalities : [raid1]
> read_ahead 1024 sectors
> md0 : active raid1 sdb1[0] sda1[1] 2048192 blocks [2/2] [UU]
> md1 : active raid1 sdb3[0] sda3[1] 144512 blocks [2/2] [UU]
> md2 : active raid1 sdb5[0] sda5[1] 1028032 blocks [2/2] [UU]
> md3 : active raid1 sdb6[0] sda6[1] 979840 blocks [2/2] [UU]
> unused devices: <none>
> [root@test /root]# mount
> /dev/md0 on / type ext2 (rw)
> none on /proc type proc (rw)
> /dev/sdb2 on /boot type ext2 (rw)
> /dev/sda2 on /bootb type ext2 (rw)
> /dev/md3 on /home type ext2 (rw)
> /dev/md2 on /var type ext2 (rw)
> none on /dev/pts type devpts (rw,mode=0622)
> [root@test /root]#
>
> --------------------------8<-----------------8<---------------------------
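A small sketch for checking the mirrors from a script rather than by eye.
It parses the one-line-per-array /proc/mdstat layout shown above and lists
arrays that have fewer active members than configured. The function names
and the regular expression are my own, and the layout is an assumption
taken from this output only; other versions of the raid patch may print it
differently.

```python
import re

# Sample copied from the /proc/mdstat output quoted above.
SAMPLE = """\
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[0] sda1[1] 2048192 blocks [2/2] [UU]
md1 : active raid1 sdb3[0] sda3[1] 144512 blocks [2/2] [UU]
md2 : active raid1 sdb5[0] sda5[1] 1028032 blocks [2/2] [UU]
md3 : active raid1 sdb6[0] sda6[1] 979840 blocks [2/2] [UU]
unused devices: <none>
"""

# Assumed layout: "<name> : active <level> <members> <n> blocks [total/up] [flags]"
ARRAY_LINE = re.compile(
    r"^(?P<name>md\d+) : active (?P<level>\S+) (?P<members>.*?) "
    r"(?P<blocks>\d+) blocks \[(?P<total>\d+)/(?P<up>\d+)\] \[(?P<flags>[U_]+)\]"
)


def parse_mdstat(text):
    """Return {array name: details} for every line matching the assumed layout."""
    arrays = {}
    for line in text.splitlines():
        match = ARRAY_LINE.match(line.strip())
        if match is None:
            continue  # header, read_ahead and "unused devices" lines
        arrays[match.group("name")] = {
            "level": match.group("level"),
            "members": re.findall(r"(\w+)\[\d+\]", match.group("members")),
            "blocks": int(match.group("blocks")),
            "total": int(match.group("total")),
            "up": int(match.group("up")),
            "flags": match.group("flags"),
        }
    return arrays


def degraded(arrays):
    """Names of arrays with fewer active members than configured."""
    return sorted(
        name
        for name, info in arrays.items()
        if info["up"] < info["total"] or "_" in info["flags"]
    )


if __name__ == "__main__":
    print(degraded(parse_mdstat(SAMPLE)))
```

On a live system one would pass the contents of /proc/mdstat instead of
SAMPLE. For the healthy state shown above, degraded() returns an empty list.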
>
> On startup, the kernel says something like this:
>
> --------------------------8<-----------------8<---------------------------
>
> (scsi0) <Adaptec AHA-294X Ultra SCSI host adapter> found at PCI 10/0
> (scsi0) Wide Channel, SCSI ID=7, 16/255 SCBs
> (scsi0) Downloading sequencer code... 413 instructions downloaded
> scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.17/3.2.4
> <Adaptec AHA-294X Ultra SCSI host adapter>
> scsi : 1 host.
> (scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
> Vendor: IBM Model: DCAS-34330W Rev: S61A
> Type: Direct-Access ANSI SCSI revision: 02
> Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> (scsi0:0:1:0) Synchronous at 40.0 Mbyte/sec, offset 8.
> Vendor: IBM Model: DCAS-34330W Rev: S61A
> Type: Direct-Access ANSI SCSI revision: 02
> Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
> (scsi0:0:6:0) Synchronous at 10.0 Mbyte/sec, offset 8.
> Vendor: TOSHIBA Model: CD-ROM XM-5701TA Rev: 3136
> Type: CD-ROM ANSI SCSI revision: 02
> Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0
> scsi : detected 1 SCSI cdrom 2 SCSI disks total.
>
> --------------------------8<-----------------8<---------------------------
> No problem so far, the mirrors seem to run.
> Now I start a "cp -a /usr/lib /home/" or something similar, so that the
> disks are busy, and then I pull the power from one of the disks (not the one
> terminating the bus), because I would like to test whether my system will
> survive a disk failure. Then I get the following errors (many, and they
> don't stop):
> -------------------8<---------------8<----------------------------------
> SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 26030000
> scsidisk I/O error: dev 08:16, sector 1654798
> md: recovery thread got woken up ...
> md2: no spare disk to reconstruct array! -- continuing in degraded mode
> md3: no spare disk to reconstruct array! -- continuing in degraded mode
> md: recovery thread finished ...
> scsi0 channel 0 : resetting for second half of retries.
> SCSI bus is being reset for host 0 channel 0.
> (scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
> -------------------8<---------------8<----------------------------------
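For reading the first two error lines, a rough sketch follows. It assumes
that the return code is printed in hex and packs the driver, host, message
and status bytes from high to low, and that "dev 08:16" is major:minor in
hex with 16 minors per SCSI disk. Both are my reading of the 2.2 sources,
not something confirmed in this thread, and the helper names are made up
for illustration.

```python
def split_scsi_result(code_hex):
    """Split a hex return code into its four bytes (assumed layout, see above)."""
    value = int(code_hex, 16)
    return {
        "driver": (value >> 24) & 0xFF,
        "host": (value >> 16) & 0xFF,
        "msg": (value >> 8) & 0xFF,
        "status": value & 0xFF,
    }


def split_dev(dev):
    """Turn 'MM:mm' (assumed to be hex) into integer (major, minor)."""
    major_text, minor_text = dev.split(":")
    return int(major_text, 16), int(minor_text, 16)


def sd_name(major, minor):
    """Map a SCSI disk device number to a name, assuming major 8 and
    16 minors per disk (minor 0 of each group is the whole disk)."""
    if major != 8:
        raise ValueError("not the SCSI disk major assumed here")
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if partition == 0 else name + str(partition)


if __name__ == "__main__":
    print(split_scsi_result("26030000"))
    print(sd_name(*split_dev("08:16")))
```

Under those assumptions, 26030000 splits into driver byte 0x26, host byte
0x03 and zero message and status bytes, and dev 08:16 maps to sdb6, the
half of md3 that /home sits on. If I recall scsi.h correctly, host byte
0x03 is DID_TIME_OUT, which would fit a disk that has lost power.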
>
> The shell with the cp-command hangs after that, and from other consoles I
> can do "dmesg", but I can't do "ps aux".
>
> Perhaps someone on this list can tell me what I am doing wrong. Is there
> some conceptual error in my setup? Are there known bugs in the mirror code
> which would cause the behavior seen above?
>
> Thanks a lot in advance,
> Daniel Seiler