On 04/16/2013 10:26 PM, [email protected] wrote:
> We discussed James's concern. We intergated James's patch and generated
> this updated patch.
> 
> Fix scsi_send_eh_cmnd to check the return code of queuecommand when
> sending commands and retry for a bit if the LLDD returns a busy response.
> This fixes an issue seen with the ipr driver where an ipr initiated reset
> immediately following an eh_host_reset caused EH initiated commands to fail,
> resulting in devices being taken offline. This patch resolves the issue.
> 
> 
> Signed-off-by: Wen Xiong <[email protected]>
> Signed-off-by: Brian King <[email protected]>
> ---
>  drivers/scsi/scsi_error.c |   34 +++++++++++++++++++++++++---------
>  1 file changed, 25 insertions(+), 9 deletions(-)
> 
> Index: b/drivers/scsi/scsi_error.c
> ===================================================================
> --- a/drivers/scsi/scsi_error.c       2013-04-16 12:56:16.617857960 -0500
> +++ b/drivers/scsi/scsi_error.c       2013-04-16 12:57:00.279108838 -0500
> @@ -791,18 +791,21 @@ static int scsi_send_eh_cmnd(struct scsi
>       struct scsi_device *sdev = scmd->device;
>       struct Scsi_Host *shost = sdev->host;
>       DECLARE_COMPLETION_ONSTACK(done);
> -     unsigned long timeleft;
> +     unsigned long timeleft = timeout;
>       struct scsi_eh_save ses;
> +     const int stall_for = min(HZ/10, 1);
>       int rtn;
>  
>       scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
> +retry:
>       shost->eh_action = &done;
>  
>       scsi_log_send(scmd);
>       scmd->scsi_done = scsi_eh_done;
> -     shost->hostt->queuecommand(shost, scmd);
> +     rtn = shost->hostt->queuecommand(shost, scmd);
>  
> -     timeleft = wait_for_completion_timeout(&done, timeout);
> +     if (!rtn)
> +             timeleft = wait_for_completion_timeout(&done, timeout);
>  
>       shost->eh_action = NULL;
>  
Hmm. This seems to be a generic bug fix here; it is perfectly
ok for queuecommand() to return a non-zero value without calling
->scsi_done. At which point ->complete is pointless to try as no
completion ever would be invoked.

Mind separating that out as a separate patch?

> @@ -819,10 +822,19 @@ static int scsi_send_eh_cmnd(struct scsi
>        * about this command.
>        */
>       if (timeleft) {
> -             rtn = scsi_eh_completed_normally(scmd);
> -             SCSI_LOG_ERROR_RECOVERY(3,
> -                     printk("%s: scsi_eh_completed_normally %x\n",
> -                            __func__, rtn));
> +             switch (rtn) {
> +             case 0:
> +                     rtn = scsi_eh_completed_normally(scmd);
> +                     SCSI_LOG_ERROR_RECOVERY(3,
> +                             printk("%s: scsi_eh_completed_normally %x\n",
> +                                    __func__, rtn));
> +                     break;
> +             case FAILED:
> +                     break;
> +             default:
> +                     rtn = ADD_TO_MLQUEUE;
> +                     break;
> +             }
>  
>               switch (rtn) {
>               case SUCCESS:
Bzzt.
'FAILED' is a valid response for scsi_eh_completed_normally, but not
for ->queuecommand.

> @@ -831,8 +843,12 @@ static int scsi_send_eh_cmnd(struct scsi
>               case TARGET_ERROR:
>                       break;
>               case ADD_TO_MLQUEUE:
> -                     rtn = NEEDS_RETRY;
> -                     break;
> +                     if (timeleft > stall_for) {
> +                             timeout = timeleft - stall_for;
> +                             msleep(stall_for);
> +                             goto retry;
> +                     }
> +                     /* fall through */
>               default:
>                       rtn = FAILED;
>                       break;
> 
We're already calling 'wait_for_completion_timeout'.
So normally the 'msleep' wouldn't be necessary.
It's only required for the case when ->queuecommand
returns non-zero.
So I'd rather see to have the msleep into section
where the return command from queuecommand is evaluated;
here we'll actually decrease responsiveness as
we're always waiting for X seconds, even if the command
would've been completed during that time.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
[email protected]                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to