srp: Avoid endless SCSI error handling loop

David Dillow Fri, 14 Dec 2012 07:55:55 -0800

On Fri, 2012-12-14 at 16:38 +0100, Bart Van Assche wrote:
> If a SCSI command times out it is passed to the SCSI error
> handler. The SCSI error handler will try to abort the command
> that timed out. If aborting failed a device reset will be
> attempted. If the device reset fails too a host reset will
> be attempted. If the host reset also fails the whole procedure
> will be repeated.
> 
> Since srp_abort() and srp_reset_device() fail for a QP in the
> error state and since srp_reset_host() fails after host removal
> has started an endless loop will be triggered.
> 
> Hence modify the SCSI error handling functions in ib_srp as
> follows:
> - Abort SCSI commands properly even if the QP is in the error
>   state.
> - Make srp_reset_host() reset SCSI requests even if host
>   removal has already started or if reconnecting fails.
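
To make the loop described above concrete, here is a toy model of the escalation ladder (abort, then device reset, then host reset, then start over). It is only a sketch under stated assumptions: struct toy_target, the toy_try_*() helpers and toy_escalate() are hypothetical names invented for illustration, not ib_srp or SCSI midlayer code, and the failure conditions simply mirror the description quoted above.

```c
#include <stdbool.h>

/* Hypothetical toy state; these flags are not kernel structures. */
struct toy_target {
	bool qp_in_error;     /* the QP has entered the error state */
	bool removal_started; /* host removal is already in progress */
};

/* Per the description: abort and device reset fail for a QP in error. */
static bool toy_try_abort(const struct toy_target *t)
{
	return !t->qp_in_error;
}

static bool toy_try_reset_device(const struct toy_target *t)
{
	return !t->qp_in_error;
}

/* Per the description: host reset fails once host removal has started. */
static bool toy_try_reset_host(const struct toy_target *t)
{
	return !t->removal_started;
}

/*
 * Walk the ladder at most max_rounds times. Returns the round in which
 * some step succeeded, or -1 if every step failed in every round. The
 * bound exists only so that the toy terminates.
 */
static int toy_escalate(const struct toy_target *t, int max_rounds)
{
	for (int round = 1; round <= max_rounds; round++) {
		if (toy_try_abort(t) || toy_try_reset_device(t) ||
		    toy_try_reset_host(t))
			return round;
	}
	return -1;
}
```

With both flags set, no step can ever succeed, so toy_escalate() returns -1 for any bound; that is the endless loop in miniature.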


This is much more than your original patch that Alex claimed fixed his
issues; are you not merging two separate issues?

Also, there's no reason to invoke srp_send_tsk_mgmt() if we're not
connected or the QP is in error -- for those cases, it makes sense to
just abort the command directly. Similarly, we should probably be
checking the return value of srp_send_tsk_mgmt() and failing the abort
if it reports an error -- or checking qp_in_error/connected again and
directly aborting if we have problems.

No?
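
Roughly the control flow I have in mind, as a self-contained toy rather than a patch. Everything here is an assumption made for illustration: struct toy_conn, its fields, toy_send_tsk_mgmt(), toy_abort_locally() and toy_eh_abort() are hypothetical stand-ins, not the real ib_srp functions or their signatures.

```c
#include <stdbool.h>

enum toy_result { TOY_SUCCESS, TOY_FAILED };

/* Hypothetical connection state; not the real struct srp_target_port. */
struct toy_conn {
	bool connected;
	bool qp_in_error;
	bool tsk_mgmt_ok;    /* would a task management request succeed? */
	bool fail_breaks_qp; /* does a failed request leave the QP in error? */
	int tsk_mgmt_sent;   /* task management requests issued */
	int reqs_aborted;    /* commands completed locally as aborted */
};

/* Stand-in for sending an abort task management request. */
static bool toy_send_tsk_mgmt(struct toy_conn *c)
{
	c->tsk_mgmt_sent++;
	if (!c->tsk_mgmt_ok && c->fail_breaks_qp)
		c->qp_in_error = true;
	return c->tsk_mgmt_ok;
}

/* Stand-in for finishing the command locally without asking the target. */
static void toy_abort_locally(struct toy_conn *c)
{
	c->reqs_aborted++;
}

static enum toy_result toy_eh_abort(struct toy_conn *c)
{
	/* Nothing can be sent on a dead connection: abort directly. */
	if (!c->connected || c->qp_in_error) {
		toy_abort_locally(c);
		return TOY_SUCCESS;
	}

	if (!toy_send_tsk_mgmt(c)) {
		/* The request failed; re-check the connection state. */
		if (!c->connected || c->qp_in_error) {
			toy_abort_locally(c);
			return TOY_SUCCESS;
		}
		/* Still connected: report failure so handling escalates. */
		return TOY_FAILED;
	}

	/* The target acknowledged the abort; finish the command. */
	toy_abort_locally(c);
	return TOY_SUCCESS;
}
```

The point of the early check is that a dead connection never costs us a task management round trip, and the point of the re-check is that a request which fails because the QP just went into error still ends with the command aborted instead of bouncing back into the error handler.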


Re: [PATCH v2 2/2] IB/srp: Avoid endless SCSI error handling loop