On 06/19/13 15:44, Jack Wang wrote:
+               /*
+                * It can occur that after fast_io_fail_tmo expired and before
+                * dev_loss_tmo expired that the SCSI error handler has
+                * offlined one or more devices. scsi_target_unblock() doesn't
+                * change the state of these devices into running, so do that
+                * explicitly.
+                */
+               spin_lock_irq(shost->host_lock);
+               __shost_for_each_device(sdev, shost)
+                       if (sdev->sdev_state == SDEV_OFFLINE)
+                               sdev->sdev_state = SDEV_RUNNING;
+               spin_unlock_irq(shost->host_lock);

Do you have a test case to verify this behaviour?

Hello Jack,

This is what I came up with after analyzing why a so-called "port flapping" test failed. The concept of that test is simple: use ibportstate to disable and re-enable the appropriate IB port on the switch at random intervals, and check whether I/O resumes once the path has stayed operational long enough.

When such a test runs for a few days with random intervals between a few seconds and a few minutes, sooner or later scsi_try_host_reset() succeeds while scsi_eh_test_devices() fails. That causes the SCSI error handler to offline devices, hence the above code, which changes the offline state back into running after a reconnect succeeds. I'm not proud of that code, but I couldn't find a better solution. Maybe it won't be necessary anymore once we switch to Hannes' new SCSI error handler.

Bart.