Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline
On 02/18/13 09:11, Sagi Grimberg wrote: On 2/18/2013 6:06 AM, David Dillow wrote: On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote: diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 8a7eb9f..b34752d 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -734,6 +734,7 @@ static int srp_reconnect_target(struct srp_target_port *target) scsi_target_unblock(shost-shost_gendev, ret == 0 ? SDEV_RUNNING : SDEV_TRANSPORT_OFFLINE); +target-transport_offline = ret != 0; Minor nit, that line is hard to read; I keep thinking it needs parens around the conditional... Perhaps target-transport_offline = !!ret; or target-transport_offline = ret; gcc should do the right conversion since we're assigning to a bool. Or, Vu, does this solve the issue you've seen? I may have time to test later this week, but not before. This indeed solve scsi_host removal issues. Vu is on vacation, I'll perform some more failover tests... Hello Sagi, Since no further feedback was posted on the list I assume that means that all tests passed ? Bart. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline
On 2/24/2013 10:09 AM, Bart Van Assche wrote: On 02/18/13 09:11, Sagi Grimberg wrote: On 2/18/2013 6:06 AM, David Dillow wrote: On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote: diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 8a7eb9f..b34752d 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -734,6 +734,7 @@ static int srp_reconnect_target(struct srp_target_port *target) scsi_target_unblock(shost-shost_gendev, ret == 0 ? SDEV_RUNNING : SDEV_TRANSPORT_OFFLINE); +target-transport_offline = ret != 0; Minor nit, that line is hard to read; I keep thinking it needs parens around the conditional... Perhaps target-transport_offline = !!ret; or target-transport_offline = ret; gcc should do the right conversion since we're assigning to a bool. Or, Vu, does this solve the issue you've seen? I may have time to test later this week, but not before. This indeed solve scsi_host removal issues. Vu is on vacation, I'll perform some more failover tests... Hello Sagi, Since no further feedback was posted on the list I assume that means that all tests passed ? Bart. Hey Bart, Sorry for the delay I was just about to reply... From my end, the related patchset seems solve the scsi_host removal issue and prevents the SCSI error handling loop. Generally our tests passed, I still have some issue with long-term failover test but I'm not sure its SRP (perhaps might origin in IB layer). So ack from me... -Sagi -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline
On Sun, Feb 24, 2013 at 10:59 AM, Sagi Grimberg sa...@mellanox.com wrote: On 2/24/2013 10:09 AM, Bart Van Assche wrote: On 02/18/13 09:11, Sagi Grimberg wrote: On 2/18/2013 6:06 AM, David Dillow wrote: Or, Vu, does this solve the issue you've seen? I may have time to test later this week, but not before. This indeed solve scsi_host removal issues. Vu is on vacation, I'll perform some more failover tests... Hello Sagi, Since no further feedback was posted on the list I assume that means that all tests passed ? Hey Bart, Sorry for the delay I was just about to reply... From my end, the related patchset seems solve the scsi_host removal issue and prevents the SCSI error handling loop. Generally our tests passed, I still have some issue with long-term failover test but I'm not sure its SRP (perhaps might origin in IB layer). So ack from me... OK Dave/Roland - sounds like we did managed to get real progress here... lets get these fix in for 3.9 and 3.8-stable too, Or. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline
On 02/18/13 05:06, David Dillow wrote: On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote: diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 8a7eb9f..b34752d 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -734,6 +734,7 @@ static int srp_reconnect_target(struct srp_target_port *target) scsi_target_unblock(shost-shost_gendev, ret == 0 ? SDEV_RUNNING : SDEV_TRANSPORT_OFFLINE); + target-transport_offline = ret != 0; Minor nit, that line is hard to read; I keep thinking it needs parens around the conditional... Perhaps target-transport_offline = !!ret; or target-transport_offline = ret; gcc should do the right conversion since we're assigning to a bool. Personally I prefer changing the assignment into the first alternative since it's more explicit than the second alternative - it doesn't require the person who's reading the source code that the transport_offline variable has been declared as bool. Bart. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline
On 2/18/2013 6:06 AM, David Dillow wrote: On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote: diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 8a7eb9f..b34752d 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -734,6 +734,7 @@ static int srp_reconnect_target(struct srp_target_port *target) scsi_target_unblock(shost-shost_gendev, ret == 0 ? SDEV_RUNNING : SDEV_TRANSPORT_OFFLINE); + target-transport_offline = ret != 0; Minor nit, that line is hard to read; I keep thinking it needs parens around the conditional... Perhaps target-transport_offline = !!ret; or target-transport_offline = ret; gcc should do the right conversion since we're assigning to a bool. Or, Vu, does this solve the issue you've seen? I may have time to test later this week, but not before. Hey David, This indeed solve scsi_host removal issues. Vu is on vacation, I'll perform some more failover tests... -Sagi -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline
On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote: diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 8a7eb9f..b34752d 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -734,6 +734,7 @@ static int srp_reconnect_target(struct srp_target_port *target) scsi_target_unblock(shost-shost_gendev, ret == 0 ? SDEV_RUNNING : SDEV_TRANSPORT_OFFLINE); + target-transport_offline = ret != 0; Minor nit, that line is hard to read; I keep thinking it needs parens around the conditional... Perhaps target-transport_offline = !!ret; or target-transport_offline = ret; gcc should do the right conversion since we're assigning to a bool. Or, Vu, does this solve the issue you've seen? I may have time to test later this week, but not before. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] IB/srp: Fail I/O requests if the transport is offline
If an SRP target is no longer reachable and srp_reset_host() fails to reconnect then ib_srp will invoke scsi_remove_host(). That function will invoke __scsi_remove_device() for each LUN. And that last function will change the device state from SDEV_TRANSPORT_OFFLINE into SDEV_CANCEL. Certain user space software, e.g. older versions of multipathd, continue queueing I/O to SCSI devices that are in the SDEV_CANCEL state. If these I/O requests are submitted as SG_IO that means that the REQ_PREEMPT flag will be set and hence that these requests will be passed to srp_queuecommand(). These requests will time out. If new requests are queued fast enough from user space these active requests will prevent __scsi_remove_device() to finish. Avoid this by failing I/O requests in the SDEV_CANCEL state if the transport is offline. Introduce a new variable to keep track of the transport state instead of failing requests if (!target-connected || target-qp_in_error) such that the SCSI error handler has a chance to retry commands after a transport layer failure occurred. Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: David Dillow d...@thedillows.org Cc: Or Gerlitz ogerl...@mellanox.com Cc: Vu Pham v...@mellanox.com --- drivers/infiniband/ulp/srp/ib_srp.c |7 +++ drivers/infiniband/ulp/srp/ib_srp.h |1 + 2 files changed, 8 insertions(+) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 8a7eb9f..b34752d 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -734,6 +734,7 @@ static int srp_reconnect_target(struct srp_target_port *target) scsi_target_unblock(shost-shost_gendev, ret == 0 ? SDEV_RUNNING : SDEV_TRANSPORT_OFFLINE); + target-transport_offline = ret != 0; if (ret) goto err; @@ -1353,6 +1354,12 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) unsigned long flags; int len; + if (unlikely(target-transport_offline)) { + scmnd-result = DID_NO_CONNECT 16; + scmnd-scsi_done(scmnd); + return 0; + } + spin_lock_irqsave(target-lock, flags); iu = __srp_get_tx_iu(target, SRP_IU_CMD); if (!iu) diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index de2d0b3..66fbedd 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -140,6 +140,7 @@ struct srp_target_port { unsigned intcmd_sg_cnt; unsigned intindirect_size; boolallow_ext_sg; + booltransport_offline; /* Everything above this point is used in the hot path of * command processing. Try to keep them packed into cachelines. -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html