I've run into a major problem with a development system here
(K6-III400/128MB). It has an Initio-9100UW which has a Quantum Atlas IV
(ID0) on the internal connector and a Seagate ST423451WEXT (ID10) on the
external connector. Everything is terminated properly (term jumper on the
Quantum, terminator on the other connector for the Seagate's enclosure;
all devices recognized quickly by the card). The card sees everything
just fine, as does Linux.

After connecting the drives, I did some quick testing with dd and found
everything working fine. However, I found a problem:

This works (the two writes run one after the other):

dd if=/dev/zero of=/quantum_internal_mountpoint/temp.dat bs=1024 count=10240;
dd if=/dev/zero of=/seagate_external_mountpoint/temp.dat bs=1024 count=10240

This will cause the system to die (the two writes run at the same time):

dd if=/dev/zero of=/quantum_internal_mountpoint/temp.dat bs=1024 count=10240 &
dd if=/dev/zero of=/seagate_external_mountpoint/temp.dat bs=1024 count=10240

(Note: I discovered this while starting a few copies of a log analysis
program for a bunch of client sites, which causes disk activity on the
internal drive (temp files) and the external drive (where several
gigabytes of log files are stored). Simultaneous access to both drives
appears to be the problem, and the dd commands above are the fastest way
to reproduce it.)
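
For repeated testing, the concurrent case can be wrapped in a small shell
function. This is only a sketch: the function name concurrent_write is made
up for illustration, and it assumes both mount points are writable. It
starts one dd per drive in the background, waits for both, and succeeds
only if both writes finished cleanly.

```shell
# concurrent_write DIR1 DIR2
# Writes a 10 MB temp.dat into each directory at the same time.
# (Hypothetical helper; same dd parameters as the commands above.)
concurrent_write() {
    # Start both writers in the background so their I/O overlaps.
    dd if=/dev/zero of="$1/temp.dat" bs=1024 count=10240 2>/dev/null &
    pid1=$!
    dd if=/dev/zero of="$2/temp.dat" bs=1024 count=10240 2>/dev/null &
    pid2=$!

    # Collect each writer's exit status separately.
    wait "$pid1"; rc1=$?
    wait "$pid2"; rc2=$?

    # Succeed only if both dd processes exited with status 0.
    [ "$rc1" -eq 0 ] && [ "$rc2" -eq 0 ]
}
```

On this system it would be called as
concurrent_write /quantum_internal_mountpoint /seagate_external_mountpoint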

The failure starts with something like this:

scsi : aborting command due to timeout : pid 44135, scsi0, channel 0, id 0, lun 0 Write (6) 18 ca b7 08 00
scsi : aborting command due to timeout : pid 44136, scsi0, channel 0, id 10, lun 0 Write (6) 00 67 07 08 00

The next thing that happens is a bus reset which fails, leading to "trying
harder" and a complete hang. (No response on the console, drops off the
network, etc.)

I've tried:
    2.2.14 - from Red Hat 6.2
    2.2.16 - Red Hat update RPM
    2.2.16 - built from kernel source

    2.4.0-test4 - built from source

As expected, there's no discernible difference when the Initio driver is
compiled into the kernel instead of being loaded as a module.

2.4.0-test4 seems to be more resistant to the crash: it takes a little
longer, but it still goes down.

I'm going to see whether I can a) throttle back the disk activity of that
log analysis program and b) make sure that all of the temp files it uses
are on the same drive as the logs, as single-drive I/O doesn't seem to
trigger the problem.
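
A rough sketch of what that workaround might look like follows. The helper
name run_serial is made up for illustration, and the approach rests on two
untested assumptions: that the analysis program honors the TMPDIR
environment variable for its temp files, and that running one copy at a
time throttles the I/O enough. The function runs a command once per site,
strictly one after another, with TMPDIR pointed at a directory of my
choosing (here, one on the same drive as the logs).

```shell
# run_serial TMPBASE CMD SITE...
# Runs CMD once per SITE, one at a time, with TMPDIR set to TMPBASE.
# (Hypothetical helper; only useful if CMD actually consults TMPDIR.)
run_serial() {
    tmpbase=$1
    cmd=$2
    shift 2

    # Make sure the temp directory exists before any job starts.
    mkdir -p "$tmpbase" || return 1

    for site in "$@"; do
        # No '&' here: each job must finish before the next one begins.
        TMPDIR="$tmpbase" "$cmd" "$site" || return 1
    done
}
```

For example: run_serial /seagate_external_mountpoint/tmp ./analyze_logs site1 site2
where ./analyze_logs and the site names are placeholders for the real
program and its arguments.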

Any ideas on a better solution?

