Hi. I am trying to set up a RAID array with a bunch of surplus
SCSI disks. The disks are Seagate ST15150N fast/narrow 4GB disks out of an
older HP array. The problem that I am having is that the disks are
formatted with 520-byte sectors. I would like to use the scsiinfo program
with the SCSI generic devices to change the sector size to 512 bytes and
then low-level format the disks so that they can be used with the Linux
SCSI disk driver. Unfortunately, the Linux kernel crashes after it scans
the SCSI bus with the 520-byte-sector disks on it.

System info: I have several identical systems with Intel PR440FX dual PPro
motherboards and on-board ultra/wide Adaptec SCSI controllers. The systems
boot and run off hard disks connected to the Adaptec SCSI bus.

For the RAID array, I am trying to use a dual-channel Mylex FlashPoint DL
PCI SCSI controller (BT-932).

At first, the machine would hang instantly after the 520-byte sector SCSI
disks were recognized. Eventually, I was able to get it to print out an
oops by compiling a non-SMP 2.2.10 kernel with the BusLogic driver
statically linked:

        http://www-personal.engin.umich.edu/~wingc/scsicrash/ksymoops.txt

Trying to load the BusLogic driver after the machine has booted normally
(thereby scanning the SCSI bus with the 520-byte sector drives on it)
yields the following result:

        http://www-personal.engin.umich.edu/~wingc/scsicrash/2.2-crash.txt

And then the machine dies. If I load the BusLogic module with the machine
in runlevel 3 and other processes running, the machine dies completely: I
cannot switch VTs or scroll back with shift-pgup/pgdn.

I have tried the following combinations of hardware and software, all
with identical results (the machine locks solid, with no oops):

        Linux 2.2.5 SMP as patched in Red Hat 6.0 (i386), BusLogic
                compiled as module.
                520 byte sector disks attached to BT-932 SCSI controller.

        Linux 2.2.5 SMP as patched in Red Hat 6.0 (i386), aic7xxx
                compiled as module.
                520 byte sector disks attached to Adaptec controller
                integrated on Intel motherboard.

        Linux 2.2.10 SMP vanilla from kernel.org, BusLogic compiled as
                module.
                520 byte sector disks attached to BT-932 SCSI controller.

        Linux 2.2.10 SMP vanilla from kernel.org, BusLogic compiled into
                kernel statically.
                520 byte sector disks attached to BT-932 SCSI controller.

I also tried it with Linux 2.0.36 as patched in Red Hat 5.2 (uniprocessor)
with the BT-932 controller and BusLogic compiled as a module. It still
crashes, but a little more information is printed to the screen:

        http://www-personal.engin.umich.edu/~wingc/scsicrash/2.0-crash.txt

If the machine is not in runlevel 3 when the SCSI driver initializes (i.e.,
if I've brought the machine to single-user mode and unmounted everything
possible, or if the SCSI driver is compiled into the kernel and
initializes as part of the normal kernel initialization), then the machine
only mostly dies: I can change VTs and scroll back, but otherwise the
machine is dead.

I played around with drivers/scsi/sd.c a little bit and found the code
where disks with sector sizes other than 256, 512, 1024, or 2048 bytes are
automatically removed from the list of valid SCSI disks. Based on the
output of ksymoops, I'm guessing that somehow there is I/O still waiting
to be completed on the disk after it has been 'deleted'. It seems that
something is calling do_sd_request() (from an interrupt bottom half?), but
this fails because sd_init_onedisk() has cleared the pointer in
rscsi_disks[].device.
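To make the suspected failure mode concrete, here is a rough user-space
model of the rejection path, written in C. It is not the real sd.c code:
struct disk_slot, sector_size_supported() and init_one_disk() are
hypothetical names, and the accepted set (256, 512, 1024 and 2048 bytes)
is an assumption based on the check described above. What it illustrates
is that clearing the slot's device pointer does nothing about requests
that are already queued for that disk.

```c
#include <stddef.h>

/* Hypothetical model: the sector sizes assumed to be accepted by sd.c. */
static int sector_size_supported(unsigned int sector_size)
{
    switch (sector_size) {
    case 256:
    case 512:
    case 1024:
    case 2048:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical stand-in for one rscsi_disks[] entry; not the kernel's struct. */
struct disk_slot {
    void *device;              /* NULL once the disk has been "deleted" */
    unsigned long capacity;    /* in sectors */
    unsigned int sector_size;  /* as reported by READ CAPACITY */
};

/* Rejection path as described in the text: the device pointer is cleared,
 * but any request already queued for this disk is left untouched, so a
 * later request handler can still find work for a NULL device. */
static int init_one_disk(struct disk_slot *d)
{
    if (!sector_size_supported(d->sector_size)) {
        d->capacity = 0;
        d->device = NULL;
        return -1;
    }
    return 0;
}
```

A 520-byte disk makes init_one_disk() return -1 and leaves device == NULL,
which is the state a later request handler would trip over.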

I added a small assertion to do_sd_request() and found that, indeed,
do_sd_request() was finding a NULL pointer in rscsi_disks[].device:

        http://www-personal.engin.umich.edu/~wingc/scsicrash/patch1.txt

I got "do_sd_request(): disk 0 device was NULL" as expected, but the
machine still locked up, presumably because the patch didn't dequeue the
I/O but instead prevented it from ever completing.
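A toy queue model (again hypothetical C, not the kernel's request API)
shows why a guard that only returns early cannot help: the request stays
at the head of the queue, so the handler keeps seeing it. Failing the
request with an error, so that it actually leaves the queue, lets the
queue drain. Whether doing that from do_sd_request() is safe in 2.2 is an
open question this model does not answer.

```c
#include <stddef.h>

/* Hypothetical request and queue types; not the 2.2 kernel's structures. */
struct request {
    int disk;               /* index into the devices[] table */
    struct request *next;
};

struct queue {
    struct request *head;   /* request currently being serviced */
};

/* Remove the request at the head of the queue. In this model it does not
 * matter whether the I/O "succeeded" or was failed with an error. */
static void finish_current(struct queue *q)
{
    if (q->head != NULL)
        q->head = q->head->next;
}

/* Service the queue at most max_passes times. devices[] plays the role of
 * rscsi_disks[].device. When a request targets a NULL device:
 *   fail_request == 0: bail out like a bare NULL check; nothing is dequeued
 *   fail_request != 0: fail the request so it leaves the queue
 * Returns the number of passes made. */
static int handle_queue(struct queue *q, void *devices[], int fail_request,
                        int max_passes)
{
    int passes = 0;

    while (q->head != NULL && passes < max_passes) {
        passes++;
        if (devices[q->head->disk] == NULL) {
            if (fail_request)
                finish_current(q);   /* error out the I/O and move on */
            continue;                /* otherwise the same request returns */
        }
        finish_current(q);           /* pretend the I/O completed normally */
    }
    return passes;
}
```

With a single request for a NULL device, handle_queue(&q, devs, 0, 100)
uses all 100 passes and leaves the request queued; with fail_request set,
it returns after one pass with an empty queue.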

Next, I tried rearranging the code in sd_init_onedisk(), thinking that
perhaps this might let the I/O complete:

        http://www-personal.engin.umich.edu/~wingc/scsicrash/patch2.txt

With this patch, the bootup gets further:

        http://www-personal.engin.umich.edu/~wingc/scsicrash/test-crash-2.2.txt

and then hangs. So it seems that even though sd_init_onedisk() claims to
have deleted the disk device, the device still sticks around somewhere,
because it is still being scanned by check_partition() in
drivers/block/genhd.c.


I have checked DejaNews as well as the last 3 years' archives of this
mailing list, and while this problem has been mentioned a few times, I
have not seen any responses or solutions.

Ideally, I would like to be able to use the scsiinfo program with the SCSI
generic devices to try to change the sector size back to 512 bytes and
then do a low-level format on the drives. Is it possible to fix the SCSI
subsystem so that I can at least access the disks through the SCSI generic
devices, even though they cannot be used as SCSI disk devices?
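For the SCSI generic route, the commands involved would presumably be a
MODE SELECT(6) whose block descriptor asks for 512-byte blocks, followed
by a FORMAT UNIT. The sketch below only builds the command and parameter
bytes, following the SCSI-2 layouts as I understand them. The function
names are hypothetical, sending the bytes through /dev/sgN is not shown,
and whether the ST15150N accepts this exact sequence is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a MODE SELECT(6) CDB plus a 12-byte parameter
 * list (4-byte mode parameter header + one 8-byte block descriptor) that
 * requests a logical block length of block_len. Returns the list length. */
static size_t build_mode_select_block_len(uint8_t cdb[6], uint8_t param[12],
                                          uint32_t block_len)
{
    memset(cdb, 0, 6);
    memset(param, 0, 12);

    /* Mode parameter header: only the block descriptor length is set. */
    param[3] = 8;

    /* Block descriptor at offset 4: density code 0, number of blocks 0,
     * reserved byte, then the 3-byte big-endian block length. */
    param[9]  = (block_len >> 16) & 0xff;
    param[10] = (block_len >> 8) & 0xff;
    param[11] = block_len & 0xff;

    cdb[0] = 0x15;   /* MODE SELECT(6) */
    cdb[1] = 0x10;   /* PF=1 (page format), SP=0 (don't save pages) */
    cdb[4] = 12;     /* parameter list length */
    return 12;
}

/* Hypothetical helper: FORMAT UNIT with FmtData=0, i.e. no defect list is
 * sent and the drive formats using its defaults and current block length. */
static void build_format_unit(uint8_t cdb[6])
{
    memset(cdb, 0, 6);
    cdb[0] = 0x04;   /* FORMAT UNIT */
}
```

A FORMAT UNIT on a 4GB disk can run for a long time, so whatever tool sends
it needs a generous command timeout.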

I'm not familiar enough with what's going on in the kernel to see the
obvious problem or solution. I'm willing to work on debugging and fixing
the problem, but I thought I'd send this message out in case someone might
have a clue to toss my way.


Thank you very much,
Chris Wing
