Can anyone help me here? I've got an AHA1542 with a Quantum XP32150W
as the only device on the chain. I keep getting errors like the following:
scsi : aborting command due to timeout : pid 198, scsi0, channel 0, id 0, lun 0
Write (10) 00 00 27 88 b0 00 00 76 00
scsi : aborting command due to timeout : pid 198, scsi0, channel 0, id 0, lun 0
Write (10) 00 00 27 88 b0 00 00 76 00
It looks like your drive failed to respond to a write request within the Linux
disk IO timeout period.
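For what it's worth, the bytes in that log line can be decoded by hand. Here is
a rough Python sketch, assuming the kernel prints the nine CDB bytes that follow
the opcode and that they use the standard 10-byte WRITE layout (flags, 4-byte
big-endian block address, group byte, 2-byte big-endian transfer length,
control). The name decode_write10 is made up for illustration.

```python
def decode_write10(tail_hex):
    """Decode the nine bytes that follow the opcode of a 10-byte WRITE CDB.

    Assumption: the log line shows CDB bytes 1..9 (the opcode itself is
    printed as the text "Write (10)" and is not part of the hex dump).
    """
    b = bytes.fromhex(tail_hex)
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(b[1:5], "big")     # CDB bytes 2-5: logical block address
    blocks = int.from_bytes(b[6:8], "big")  # CDB bytes 7-8: transfer length in blocks
    return lba, blocks


lba, blocks = decode_write10("00 00 27 88 b0 00 00 76 00")
print(lba, blocks)  # 2590896 118
```

If those assumptions hold, the command that keeps timing out is a 118-block
write starting at block 2590896.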
SCSI host 0 abort (pid 198) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
Sent BUS DEVICE RESET to target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0
It looks like Linux is resetting the SCSI bus and the SCSI disk.
aha1542_intr_handle: Unexpected interrupt
tarstat=0, hastat=0 idlun=10 ccb#=5
aha1542_intr_handle: Unexpected interrupt
tarstat=0, hastat=0 idlun=10 ccb#=7
It looks like the Adaptec SCSI card finally tried to return the disk
requests and reported that the target status (the disk) and the host adapter
status (the Adaptec) were OK, although I wouldn't always trust what an Adaptec
controller tells me.
Too bad your logs don't give timing information. From the data given (and my
limited knowledge of how Linux disk drivers work) I'd say that Linux's timeout
on IO requests may be too short, causing a race condition between when the
status of the async disk IO is returned and when Linux goes into reset mode.
(Linux times out and sends the bus and device reset, which should cause this
disk to forget about any IOs, but after all of that's over, the Adaptec tries
to say the IOs completed successfully.)
Last time I was writing SCSI disk drivers (about 1992), newer disks were capable
of queuing 64 commands. That number could be up substantially now. If it is,
say, 256 and an average IO takes 10ms to process, the queued IOs could take
about 2.5 seconds to process, which may be bumping into Linux's SCSI device
driver timeout. Just a theory...
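To put rough numbers on that theory, here is a back-of-the-envelope sketch in
Python. It assumes the drive works through its queue one command at a time at a
fixed average service time, which real drives don't strictly do, and the
2-second timeout is a made-up placeholder, not a value taken from the Linux
source.

```python
def worst_case_wait_s(queue_depth, avg_io_ms):
    """Seconds the last command in a full queue waits, assuming strictly
    serial service at a fixed average time per IO (a simplification)."""
    return queue_depth * avg_io_ms / 1000.0


def exceeds_timeout(queue_depth, avg_io_ms, timeout_s):
    """True if the last queued command would outlive the driver timeout."""
    return worst_case_wait_s(queue_depth, avg_io_ms) > timeout_s


HYPOTHETICAL_TIMEOUT_S = 2.0  # placeholder only; not the real Linux value

for depth in (64, 256):
    wait = worst_case_wait_s(depth, avg_io_ms=10)
    late = exceeds_timeout(depth, 10, HYPOTHETICAL_TIMEOUT_S)
    print(f"depth={depth}: {wait:.2f}s, exceeds timeout: {late}")
```

With these guesses a 64-deep queue drains in well under a second, while a
256-deep queue takes about 2.5 seconds and would trip a 2-second timeout.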
Then again this whole problem may be as simple as a bad SCSI cable or
termination (very common problem and usually intermittent) or disk controller
or host adapter brains going south.
I don't have the source for the SCSI disk controller. If I get a chance, I'll
download some source and try to look through it to see what the timeouts are
and if they're easily configured.
Good Luck,
Al Youngwerth
[EMAIL PROTECTED]