Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime

Ewan Milne Thu, 17 Oct 2013 07:29:33 -0700

On Wed, 2013-10-16 at 19:22 +0000, James Bottomley wrote:
> What about instead:
> 
> static int scsi_host_eh_past_deadline(struct Scsi_Host *shost, int percent) {
>       if (!shost->last_reset || !shost->eh_deadline)
>               return 0;
> 
>       if (time_before(jiffies,
>                       shost->last_reset + shost->eh_deadline * percent/100))
>               return 0;
> 
>       return 1;
> }
> 
> which allows us to have
> 
> if (scsi_host_eh_past_deadline(shost, 50)) {
> 
> in scsi_eh_abort_cmds()
> 
> if (scsi_host_eh_past_deadline(shost, 66)) {
> 
> in scsi_eh_bus_device_reset()
> 
> say 83 in target reset, and 100 in bus reset.
> 
> Thus ensuring we at least get a crack at the reset chain?
> 
> James
>
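To see what the proposed percentages mean in wall-clock terms,
here is a userspace sketch of the quoted check. It assumes plain
seconds instead of jiffies and an ordinary comparison instead of
time_before(), so there is no wraparound handling. struct eh_host,
eh_past_deadline(), eh_step_allowed() and the step enum are
hypothetical names for illustration, not kernel API.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two Scsi_Host fields the check
 * reads. Times are seconds, not jiffies (assumption).
 */
struct eh_host {
	unsigned long last_reset;  /* EH start time; 0 means "not set" */
	unsigned long eh_deadline; /* EH budget; 0 means "no deadline" */
};

/* True once 'percent' of the EH budget has elapsed at 'now'. */
static bool eh_past_deadline(const struct eh_host *h, unsigned long now,
			     unsigned int percent)
{
	if (!h->last_reset || !h->eh_deadline)
		return false;
	/*
	 * Multiply before dividing, as in the quoted code, so integer
	 * division does not truncate the budget to zero. A plain '>='
	 * replaces !time_before() here (no jiffies wraparound).
	 */
	return now >= h->last_reset + h->eh_deadline * percent / 100;
}

/* Per-step thresholds from the proposal (hypothetical enum). */
enum eh_step { EH_ABORT, EH_DEVICE_RESET, EH_TARGET_RESET, EH_BUS_RESET };
static const unsigned int eh_step_percent[] = { 50, 66, 83, 100 };

/* True if EH would still attempt 'step' at time 'now'. */
static bool eh_step_allowed(const struct eh_host *h, unsigned long now,
			    enum eh_step step)
{
	return !eh_past_deadline(h, now, eh_step_percent[step]);
}
```

With last_reset = 1000 and a 60 s budget, aborts stop at 1030,
device resets at 1039 (66% of 60 truncates to 39), target resets
at 1049, and the bus reset at 1060; an eh_deadline of 0 disables
every cut-off.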


Well, conceptually that seems like a good idea: if time is
limited, it is probably wiser to spend it on higher-level
recovery than on timing out while dealing with individual
devices. But we didn't do any testing on how long the bus
device reset, target reset, and bus reset steps take. The host
reset took about 10 seconds for lpfc, and the maximum time was
(command timeout) + (eh_deadline) + (host reset time).
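As a worked example of that bound, here is a trivial helper. The
name eh_max_io_time() is hypothetical, and the sketch assumes, as
the bound does, that EH escalates straight to a host reset once
eh_deadline has passed.

```c
/*
 * Hypothetical helper: upper bound (seconds) on how long an I/O can
 * stay outstanding under the original eh_deadline patch. The command
 * first has to time out, EH then runs for at most eh_deadline, and
 * the host reset that follows takes host_reset_time.
 */
static unsigned int eh_max_io_time(unsigned int cmd_timeout,
				   unsigned int eh_deadline,
				   unsigned int host_reset_time)
{
	return cmd_timeout + eh_deadline + host_reset_time;
}
```

For example, with the usual 30 s sd command timeout, an
illustrative eh_deadline of 60 s, and the roughly 10 s lpfc host
reset, eh_max_io_time(30, 60, 10) gives 100 s.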

However...

With this enhancement, the maximum time could be much longer
if, for example, we start a bus reset right before eh_deadline
expires, because drivers like lpfc iterate over the targets and
send a target reset to each one, each with its own timeout.
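A rough model of that overshoot: the sketch below assumes every
target is unresponsive, so each per-target reset runs to its full
timeout, and that the deadline is only checked before the bus
reset starts, not between targets. bus_reset_done_at() and the
numbers used with it are hypothetical, not lpfc's actual code or
timeout values.

```c
/*
 * Hypothetical model of a bus reset that, like the lpfc behaviour
 * described above, sends a target reset to each target in turn and
 * waits up to tmf_timeout for each one. Returns the EH elapsed time
 * (seconds) at which the bus reset finishes, i.e. when the host
 * reset can begin.
 */
static unsigned int bus_reset_done_at(unsigned int start,
				      unsigned int n_targets,
				      unsigned int tmf_timeout)
{
	unsigned int t = start;

	/* No deadline check inside the loop (assumption). */
	for (unsigned int i = 0; i < n_targets; i++)
		t += tmf_timeout; /* target reset i runs to its timeout */
	return t;
}
```

With a 60 s eh_deadline, a bus reset started at 59 s against 8
unresponsive targets with an illustrative 20 s timeout each ends
at 219 s, 159 s past the deadline, and the host reset still has
to follow.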

The original problem that prompted this change was a target
that became inaccessible. Nothing the EH did was ever going to
do anything except time out until the host reset was performed;
at that point the FC login would fail, and the HBA would start
failing commands immediately instead of letting them time out.

I guess the main thing is that there should be some way to
explain to people what value to use for eh_deadline in order
for I/O to complete within a specified amount of time (e.g. before
some other node in a cluster shoots us because we are unresponsive).
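One way to phrase that guidance is to invert the bound for a given
time budget. The sketch below is a hypothetical helper, not an
existing tool. It subtracts the command timeout and host reset
time from the budget, plus an allowance (max_step_time) for a
recovery step that may already be running when the deadline
expires; that allowance is an assumption the caller must estimate.

```c
/*
 * Hypothetical helper: largest eh_deadline (seconds) that keeps the
 * worst case within 'budget'. max_step_time covers a reset step that
 * may still be running when eh_deadline expires (0 if EH goes
 * straight to the host reset). Returns -1 if no positive
 * eh_deadline fits the budget.
 */
static long eh_deadline_for_budget(long budget, long cmd_timeout,
				   long host_reset_time, long max_step_time)
{
	long d = budget - cmd_timeout - host_reset_time - max_step_time;

	return d > 0 ? d : -1;
}
```

For an illustrative 120 s cluster timeout, a 30 s command timeout
and a 10 s host reset, this gives 80 s with no in-flight step
allowance, or 40 s if a step may overrun by up to 40 s.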

-Ewan




--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html