On Thu, 14 Jan 1999, Ola Sigurdson wrote:
> Don't rely on software RAID on the Compaqs!!!!
>
> Unplugging active disks causes a 100 % repeatable kernel crash (total
> lockup).
> As far as I can tell it's caused by bugs in the NCR driver
>
> This is with kernel 2.0.36 & Compaq Proliant 1600.
I would suggest the following tests:
1 - Upgrade to driver sym53c8xx-1.0a.
ftp://ftp.tux.org/roudier/896/
sym53c8xx-1.0.tar.gz (full sources, to be moved to
linux/drivers/scsi)
+
sym53c8xx-1.0-to-1.0a.patch.gz (kernel patch)
2 - Kill everything that may prevent kernel messages from being printed
to the console if the kernel becomes unable to perform disk I/O
(killing syslogd and klogd should be enough).
3 - Run something that writes to a file system and/or a partition
without using RAID, then turn off the disk.
4 - Wait long enough for the SCSI timeouts to have a chance to occur.
The timeout is 20 seconds on 2.0.35/36 kernels, but for the tests
you can decrease it by changing the SD_TIMEOUT define in
drivers/scsi/scsi.c to something like 5 seconds (5*HZ).
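
For step 3, any program that keeps writing and forcing the data to disk
will do. Here is a minimal sketch in C, under the assumption that
rewriting one block in a loop with fsync() is enough load for the test.
The function name write_load and its parameters are made up for this
example; they do not come from the driver or the kernel.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any driver or kernel tree).
 * Rewrites the first `block_size` bytes of `path` `count` times and
 * calls fsync() after every write, so each pass has to reach the disk.
 * Errors are reported on stderr with a timestamp.
 * Returns the number of passes completed before the first error,
 * or -1 if the file cannot be opened.
 */
long write_load(const char *path, size_t block_size, long count)
{
    char buf[4096];
    long done = 0;
    int fd;

    /* Clamp the block size to the local buffer. */
    if (block_size == 0 || block_size > sizeof(buf))
        block_size = sizeof(buf);
    memset(buf, 0xA5, sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }

    while (done < count) {
        ssize_t n;

        /* Always rewrite the same block so the file does not grow. */
        if (lseek(fd, 0, SEEK_SET) < 0) {
            fprintf(stderr, "[%ld] lseek failed after %ld passes: %s\n",
                    (long)time(NULL), done, strerror(errno));
            break;
        }
        n = write(fd, buf, block_size);
        if (n != (ssize_t)block_size) {
            fprintf(stderr, "[%ld] write failed after %ld passes: %s\n",
                    (long)time(NULL), done,
                    n < 0 ? strerror(errno) : "short write");
            break;
        }
        /* Force the data out so the I/O really goes to the device. */
        if (fsync(fd) < 0) {
            fprintf(stderr, "[%ld] fsync failed after %ld passes: %s\n",
                    (long)time(NULL), done, strerror(errno));
            break;
        }
        done++;
    }

    close(fd);
    return done;
}
```

Call it from a small main() with a path on the disk under test and a
count large enough to keep it running past the timeout, then turn off
the disk while it runs. If the kernel reports the I/O error back, the
function prints a message and returns early; if nothing at all appears,
that is the hard lockup case.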
Let me know if the system locks up hard under such a crash test (I mean
that no kernel messages related to SCSI timeouts, resets and I/O errors
are displayed on the console).
If it does not, repeat the same tests:
5 - Using a stock driver (preferably some version > 3.1d)
6 - Using RAID.
Let me know. Thanks.
Regards,
Gerard.