Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

Bart Van Assche Sat, 15 Jun 2013 02:53:19 -0700

On 06/14/13 19:59, Vu Pham wrote:

On 06/13/13 21:43, Vu Pham wrote:

+/**
+ * srp_tmo_valid() - check timeout combination validity
+ *
+ * If no fast I/O fail timeout has been configured then the device
loss timeout
+ * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail
timeout has
+ * been configured then it must be below the device loss timeout.
+ */
+int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
+{
+    return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
+        dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+        || (0 <= fast_io_fail_tmo &&
+            (dev_loss_tmo < 0 ||
+             (fast_io_fail_tmo < dev_loss_tmo &&
+              dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(srp_tmo_valid);

fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with
negative value
dev_loss_tmo is off, one cannot turn off fast_io_fail_tmo with
negative value


OK, will update the documentation such that it correctly refers to
"off" instead of a negative value and I will also mention that
dev_loss_tmo can now be disabled.

It's not only the documentation but also the code logic, you cannot turn
dev_loss_tmo off if fast_io_fail_tmo already turned off and vice versa
with the return statement above.

Does this mean that you think it would be useful to disable both thefast_io_fail and the dev_loss mechanisms, and hence rely on the user toremove remote ports that have disappeared and on the SCSI commandtimeout to detect path failures ? I'll start testing this to see whetherthat combination does not trigger any adverse behavior.

If rport's state is already SRP_RPORT_BLOCKED, I don't think we need
to do extra block with scsi_block_requests()


Please keep in mind that srp_reconnect_rport() can be called from two
different contexts: that function can not only be called from inside
the SRP transport layer but also from inside the SCSI error handler
(see also the srp_reset_device() modifications in a later patch in
this series). If this function is invoked from the context of the SCSI
error handler the chance is high that the SCSI device will have
another state than SDEV_BLOCK. Hence the scsi_block_requests() call in
this function.

Yes, srp_reconnect_rport() can be called from two contexts; however, it
deals with same rport & rport's state.
I'm thinking something like this:

    if (rport->state != SRP_RPORT_BLOCKED) {
     scsi_block_requests(shost);

Sorry but I'm afraid that that approach would still allow the user tounblock one or more SCSI devices via sysfs during thei->f->reconnect(rport) call, something we do not want.

I think that we can use only the pair
scsi_block_requests()/scsi_unblock_requests() unless the advantage of
multipathd recognizing the SDEV_BLOCK is noticeable.

I think the advantage of multipathd recognizing the SDEV_BLOCK statebefore the fast_io_fail_tmo timer has expired is important. Multipathddoes not queue I/O to paths that are in the SDEV_BLOCK state so settingthat state helps I/O to fail over more quickly, especially for largevalues of fast_io_fail_tmo.


Hope this helps,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

Reply via email to