Oh, god, I just found an awful bug in the new queueing code.  
After a medium error, we get back sense data that identifies the bad
sector.  When we process this, we go ahead and do completion for the good
sectors leading up to it, and then treat that one sector as bad (all in
terms of marking buffers uptodate/not uptodate). In theory this should be
fine.

        The problem is that when we finish off the good sectors in
scsi_end_request(), this inadvertently causes the remainder to be
requeued.  At the same time, scsi_io_completion() goes to work on the bad
sector, marks it as bad, and finally queues the same command a
second time to the low-level driver (while the same command is probably
still active there).

        All sorts of bad things happen, as you might imagine.  I was
getting hangs because the busy count for the host was off and the error
handler thread wouldn't start, I was seeing panics/oopses because the
scatter-gather list was corrupt, and I was seeing other erratic
behavior I couldn't categorize.

        The correct behavior is to first mark the buffers for the good
sectors as uptodate, mark the buffers for the bad sector as not uptodate,
and then queue the remainder of the command (only once :-).
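
        To make the intended ordering concrete, here is a rough sketch in
plain C.  It is a toy model, not the mid-layer code: struct toy_cmd,
toy_dispatch(), toy_end_sectors() and toy_medium_error() are names made up
for illustration and do not correspond to real kernel functions.  The only
point it tries to show is that completing sectors never requeues anything
by itself, and that there is exactly one place where the remainder goes
back to the low-level driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SECTORS 16

/* Toy stand-in for a SCSI command (hypothetical, not the kernel's struct). */
struct toy_cmd {
    size_t first;                /* first sector not yet completed */
    size_t count;                /* sectors remaining, starting at 'first' */
    bool uptodate[MAX_SECTORS];  /* per-sector buffer state */
    int dispatches;              /* times handed to the low-level driver */
    bool in_flight;              /* true while the driver owns the command */
};

/* Hypothetical: hand the command to the low-level driver. */
static void toy_dispatch(struct toy_cmd *cmd)
{
    /* The same command must never be active in the driver twice. */
    assert(!cmd->in_flight);
    cmd->in_flight = true;
    cmd->dispatches++;
}

/* Complete up to n sectors from the front with the given status.
 * Deliberately does NOT requeue the remainder. */
static void toy_end_sectors(struct toy_cmd *cmd, size_t n, bool ok)
{
    while (n > 0 && cmd->count > 0) {
        cmd->uptodate[cmd->first] = ok;
        cmd->first++;
        cmd->count--;
        n--;
    }
}

/* Handle a medium error whose sense data points at bad_sector. */
static void toy_medium_error(struct toy_cmd *cmd, size_t bad_sector)
{
    /* The bad sector must lie inside the outstanding range. */
    assert(bad_sector >= cmd->first);
    assert(bad_sector < cmd->first + cmd->count);

    cmd->in_flight = false;                 /* driver has returned the command */

    toy_end_sectors(cmd, bad_sector - cmd->first, true);  /* good sectors */
    toy_end_sectors(cmd, 1, false);                       /* the bad sector */

    if (cmd->count > 0)
        toy_dispatch(cmd);                  /* the single requeue point */
}
```

        In this model, a command covering sectors 0-7 that fails on sector
3 ends up with sectors 0-2 uptodate, sector 3 not uptodate, and sectors 4-7
sent to the driver again exactly once.  The assert in toy_dispatch() is the
kind of check that would have tripped on the double queueing described
above.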

        This wasn't something that I just broke on my machine.  I have
been working on an accumulated patchset against 2.3.35 and was torturing
error handling when I ran across this one.  If the patch for this turns
out to be relatively simple (as I suspect it will be), I will include it.

        Simon - this could easily explain the bug you were seeing.  In
your case, you were having someone hold onto a lock (probably
io_request_lock) indefinitely - it is conceivable that having the same
command queued multiple times got the aic7xxx driver horribly confused.

-Eric

"The world was a library, and its books were the stones, leaves,
 brooks, grass, and the birds of the earth.   We learned to do what only
 a student of nature ever learns, and that was to feel beauty."
                        Chief Luther Standing Bear - Teton Sioux



