Re: wait after scsi_reset

Kurt Garloff Sat, 5 Feb 2000 09:10:39 -0800
On Wed, Feb 02, 2000 at 10:00:40PM -0500, Eric Youngdale wrote:
> >And new commands?
> 
>     There should be no new commands issued either.  That was one of the
> whole points in redoing the error handling code - any attempt to correct a
> condition while commands were still coming in was bound to cause further
> problems.

OK.

> >I'd prefer to avoid this. I'll have to unlock, but this always risks a
> >deadlock, if sb. else grabs the io_lock from an IRQ handler. (If it's
> >properly implemented, it won't give a problem, I believe.)
> 
>     If there are issues here, we need to work through them, I think.  The
> error handler thread is not running from the context of an interrupt
> handler, so from the point of view of the mid-level it should be safe.

Good.

> Given that there are no new commands being issued, the driver itself
> shouldn't be re-entered via queuecommand, and given that a single instance
> of the error handler thread calls the other functions you needn't worry
> about re-entrancy in the error handling functions.  About the only place
> where you could become re-entrant is through the interrupt service routine
> for the driver itself.

Thinking about it, it can be safe, yes!
I still don't like to udelay for a couple of seconds. It's a waste of CPU
cycles. udelay is smth which should only be used up to a few milliseconds,
IMHO.

>     You could theoretically become re-entrant through one of the trivial
> functions like the biosparam interface, but that isn't likely to cause any
> problems.
> 
> >Anyway, what happens, if I detect a RESET on the bus (issued by another
> >initiator or a device)? I also want to prevent the issueing of commands for
> >a second or so, then.
> 
>     This is a possibility that I guess I never really considered.

I can tell you that it happens regularily in my box. The reason for this is
simple: I have three controllers on one chain for testing purposes.

> Yes, a
> reset can originate from a number of different places, the question is
> exactly how this is relevant to the rest of the system.   There is support
> in there for keeping track of resets that originate from the mid-level
> itself.  This has a couple of consequences - if the reset originates in the
> mid-level we set a flag that says we are expecting a
> CHECK_CONDITION/UNIT_ATTENTION, and when we get it, we automatically retry
> the command.  Later on we re-lock the door for removable media.

Good.

>     If the reset originates elsewhere, then the
> CHECK_CONDITION/UNIT_ATTENTION will arrive unexpectedly, and the unit
> attention will be passed up for further processing, where it is assumed that
> a media change took place.

Which will result in buffers to be flushed, which is nothing catastrophic.
Oh wait, what about dirty buffers, that still need to be written to disk?

> >So, I imagine a solution, where I just don't allow commands being queued
> >with queuecommand in that period of time.
> >Is returning with 1 fine for new EH. Will it resent the commands later
> then?
> >Or is it better calling scsi_done with DID_RESET for all new commands?
> >Should the latter also work with old EH?
> 
>     Returning 1 from the queuecommand function will work to a point, but the
> mid-level won't start queueing new commands until one of the existing
> commands that is on the bus finishes.  If the bus was otherwise idle and a
> reset was detected, then there would be no mechanism to restart.  

Can't you set a timer for this? I would like the mid-level code make sure
that commands in its queue never starve. If no outstanding commands are
there, it's its responsability to set a timer to try again after some time
(a second or so), IMHO. This can happen in normal operation, too, I think.
The driver is not ready to accept a command (for whatever reason) and
returns 1 therefore. What does the midlayer do, if there no more outstanding
commands?

> With the old error handling code the return value of queuecommand is not
> significant (i.e. it is ignored).

Yes.

>     Returning with DID_RESET is possible, I guess.  Offhand I am not sure
> what would happen if you were to do this.  I would need to investigate if
> you were interested.

I would like you to look into this.

What about ordering of the commands? We need to maintain it in presence of
writes and reads to the same block. Are the commands aborted with DID_RESET
issued again in the same order to the driver? The ones returned to mid-level
from queuecommand with exit 1 (new EH)? If no, exception handling is very
unsafe! if yes, the driver then could send all commands in its queues back
to the mid-level with DID_RESET, such that all could be resend in the
right order.
(This could still be unsafe when using tagged command queueing, though, but
 also the devices need to monitor the data integrity with Tagged Cmd
 Queueing, so that this might not even be an issue.)

>     For what it is worth, I have been toying with the idea of providing a
> simple API function that low-level drivers could use to disable further
> commands from being queued, and obviously to provide a second one to
> re-enable them.  It sounds like such a thing would be just what you would
> need to solve this problem.
> 
>     I suppose that a 3rd API function would also be handy where you could
> report the detection of a reset that originated from elsewhere.  This would
> make it possible for the mid-level to treat it properly.

Yes, I agree with both suggestions.

More importantly, however, I would like you to make sure the ML queues don't
suffer starvation by adding timers for the last cmnd sitting in the queue
and being rejected by the low-level driver. This would make returning of 1
from queuecommand() safe.

Regards,
-- 
Kurt Garloff  <[EMAIL PROTECTED]>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE GmbH, Nuernberg, FRG                               SCSI, Security
PGP signature
Re: wait after scsi_reset

Reply via email to