On Fri, 2012-12-14 at 16:38 +0100, Bart Van Assche wrote:
> If a SCSI command times out it is passed to the SCSI error
> handler. The SCSI error handler will try to abort the command
> that timed out. If aborting failed a device reset will be
> attempted. If the device reset fails too a host reset will
> be attempted. If the host reset also fails the whole procedure
> will be repeated.
>
> Since srp_abort() and srp_reset_device() fail for a QP in the
> error state and since srp_reset_host() fails after host removal
> has started an endless loop will be triggered.
>
> Hence modify the SCSI error handling functions in ib_srp as
> follows:
> - Abort SCSI commands properly even if the QP is in the error
>   state.
> - Make srp_reset_host() reset SCSI requests even if host
>   removal has already started or if reconnecting fails.

This is much more than your original patch that Alex claimed fixed his
issues; are you not merging two separate issues?

Also, there's no reason to invoke srp_send_tsk_mgmt() if we're not
connected or the QP is in error -- for those cases, it makes sense to
just abort the command directly.

Similarly, we should probably be checking the status of
srp_send_tsk_mgmt() and failing the abort when it returns an error --
or checking qp_in_error/connected again and directly aborting if we
have problems. No?
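
To make the suggested abort flow concrete, here is a rough user-space
sketch of it. This is not the ib_srp code: every type and function below
(model_target, model_req, model_send_tsk_mgmt(), model_abort(), the
qp_dies_on_send flag) is a hypothetical stand-in, and the real driver
would also need locking and request bookkeeping that is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names exist in ib_srp. */
enum eh_result { EH_SUCCESS, EH_FAILED };

struct model_target {
	bool connected;        /* stand-in for the "connected" state */
	bool qp_in_error;      /* stand-in for the "QP in error" state */
	int  tsk_mgmt_status;  /* simulated task management result, 0 = ok */
	bool qp_dies_on_send;  /* simulate the QP failing during the send */
	int  tsk_mgmt_calls;   /* how often task management was attempted */
};

struct model_req {
	bool completed;        /* request was finished and handed back */
};

/* True when there is no working connection to send anything on. */
static bool model_link_down(const struct model_target *t)
{
	return !t->connected || t->qp_in_error;
}

/* Simulated task management request (stand-in for an abort task). */
static int model_send_tsk_mgmt(struct model_target *t)
{
	t->tsk_mgmt_calls++;
	if (t->qp_dies_on_send)
		t->qp_in_error = true;
	return t->tsk_mgmt_status;
}

/* Finish the request locally, without talking to the target. */
static void model_finish_req(struct model_req *r)
{
	r->completed = true;
}

static enum eh_result model_abort(struct model_target *t,
				  struct model_req *r)
{
	assert(t != NULL && r != NULL);

	/* Not connected or QP in error: abort the command directly. */
	if (model_link_down(t)) {
		model_finish_req(r);
		return EH_SUCCESS;
	}

	/* Otherwise ask the target, and check the result. */
	if (model_send_tsk_mgmt(t) != 0) {
		/* The link may have failed meanwhile; check again. */
		if (model_link_down(t)) {
			model_finish_req(r);
			return EH_SUCCESS;
		}
		return EH_FAILED; /* let the error handler escalate */
	}

	model_finish_req(r);
	return EH_SUCCESS;
}
```

With this shape the abort only fails when the connection is healthy and
the target refuses the request, so the error handler escalates in that
case alone and a dead QP does not send it around the loop again.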
