Patrick Mansfield wrote:
On Sat, Jan 29, 2005 at 10:44:41AM -0600, James Bottomley wrote:

On Fri, 2005-01-28 at 21:46 -0800, Andrew Vasquez wrote:

Returning back DID_IMM_RETRY for these 'transport' related conditions
would of course help in this issue -- but at the same time bring with it
several side-effects which may not be desirable.

So, beyond this particular circumstance, what would be considered a
'proper' return status for this type of event?

Well, the correct return, since this is a condition from the storage, is simply the check condition and the sense code (rather than having the driver interpret it).


But the transport hit a failure, not the storage device.

I thought Andrew hit this sequence:

        - pull / replace cable

        - IO resumes but gets NOT_READY (the device could be logging back
          into the fibre or such)

        - an FC transport problem is hit, DID_BUS_BUSY is returned, but
          scmd->retries has already been exhausted by the NOT_READY

Did I misread something?
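
The sequence above can be sketched as a toy model. This is only an illustration, assuming a single retry counter shared by all retryable results; the names Command, complete, run and the budget of 5 are hypothetical and do not reproduce the mid-layer's actual code.

```python
from dataclasses import dataclass


@dataclass
class Command:
    allowed: int = 5   # hypothetical retry budget
    retries: int = 0   # one counter shared by every retryable result


def complete(cmd, result):
    """Decide what happens to the command after one completion."""
    if result == "GOOD":
        return "done"
    # Device-side and transport-side failures draw on the same budget.
    cmd.retries += 1
    return "retry" if cmd.retries <= cmd.allowed else "fail"


def run(results, allowed=5):
    """Feed a list of completion results to a single command."""
    cmd = Command(allowed=allowed)
    for result in results:
        outcome = complete(cmd, result)
        if outcome != "retry":
            return outcome
    return "retry"


# Cable pull: NOT_READY uses up the budget, so the later transport
# busy result fails the command even though the next attempt would succeed.
after_cable_pull = run(["NOT_READY"] * 5 + ["TRANSPORT_BUSY", "GOOD"])

# The same transport hiccup on a fresh command is simply retried.
fresh_command = run(["TRANSPORT_BUSY", "GOOD"])
```

In this model the transport error itself is harmless; the command fails only because the earlier NOT_READY results already consumed the shared budget.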

Patrick,
I was also thinking of commenting on this. It depends on
where the failure is:
   a) between the device server (target) and a logical
      unit (lu)
   b) in the service delivery subsystem between the
      initiator (port) and the target (port).

James's explanation covers case a) (i.e. the device server
should construct appropriate sense data and a SCSI status
in response to the current and future SCSI commands).
In case b) the response is transport dependent.
For example, in the case of SAS there are two further
situations:
   1) the failure occurs on a direct connect between the
      initiator (port) and the target (port) [e.g. between
      an HBA port and a target port on a disk].
      Then a low level state machine (phy/link layer) on
      the HBA will notice the problem.
   2) the failure occurs between an expander and an end
      device (e.g. a tape drive). Then the expander issues
      a BROADCAST(CHANGE) link layer primitive which the
      initiator(s) will receive. In response to this the
      initiator(s) should do another discovery process
      to find the new topology (via SMP).

Also both of these situations are detected in real time
(more or less), not when the next command is issued.
New SCSI commands will fail relatively quickly when
the SAS HBA fails to open a connection to the target.
SCSI commands "in flight" to an affected target should
trigger connection timeouts in the initiator.
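
The cases above can be summarised as a small lookup. This is a rough sketch only: the enum and function names are made up for illustration, and the strings just restate the description above rather than any driver interface.

```python
from enum import Enum


class FailureSite(Enum):
    TARGET_TO_LU = "a"          # between device server and logical unit
    SAS_DIRECT_CONNECT = "b1"   # HBA port wired straight to a target port
    SAS_BEHIND_EXPANDER = "b2"  # between an expander and an end device


def expected_handling(site):
    """Return (who notices the failure, what should happen next)."""
    if site is FailureSite.TARGET_TO_LU:
        return ("device server",
                "SCSI status plus sense data on current and future commands")
    if site is FailureSite.SAS_DIRECT_CONNECT:
        return ("HBA phy/link layer state machine",
                "new commands fail to open a connection; "
                "in-flight commands hit connection timeouts")
    if site is FailureSite.SAS_BEHIND_EXPANDER:
        return ("expander, via BROADCAST(CHANGE)",
                "initiators rerun discovery over SMP to learn the new topology")
    raise ValueError(f"unknown failure site: {site!r}")


for site in FailureSite:
    who, what = expected_handling(site)
    print(f"{site.name}: detected by {who}; {what}")
```

Only the first case yields sense data from the storage itself; the other two are detected by the transport, independent of when the next command is issued.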

Doug Gilbert