On 04/16/2013 10:26 PM, [email protected] wrote:
> We discussed James's concern. We intergated James's patch and generated
> this updated patch.
>
> Fix scsi_send_eh_cmnd to check the return code of queuecommand when
> sending commands and retry for a bit if the LLDD returns a busy response.
> This fixes an issue seen with the ipr driver where an ipr initiated reset
> immediately following an eh_host_reset caused EH initiated commands to fail,
> resulting in devices being taken offline. This patch resolves the issue.
>
>
> Signed-off-by: Wen Xiong <[email protected]>
> Signed-off-by: Brian King <[email protected]>
> ---
> drivers/scsi/scsi_error.c | 34 +++++++++++++++++++++++++---------
> 1 file changed, 25 insertions(+), 9 deletions(-)
>
> Index: b/drivers/scsi/scsi_error.c
> ===================================================================
> --- a/drivers/scsi/scsi_error.c 2013-04-16 12:56:16.617857960 -0500
> +++ b/drivers/scsi/scsi_error.c 2013-04-16 12:57:00.279108838 -0500
> @@ -791,18 +791,21 @@ static int scsi_send_eh_cmnd(struct scsi
> struct scsi_device *sdev = scmd->device;
> struct Scsi_Host *shost = sdev->host;
> DECLARE_COMPLETION_ONSTACK(done);
> - unsigned long timeleft;
> + unsigned long timeleft = timeout;
> struct scsi_eh_save ses;
> + const int stall_for = min(HZ/10, 1);
> int rtn;
>
> scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
> +retry:
> shost->eh_action = &done;
>
> scsi_log_send(scmd);
> scmd->scsi_done = scsi_eh_done;
> - shost->hostt->queuecommand(shost, scmd);
> + rtn = shost->hostt->queuecommand(shost, scmd);
>
> - timeleft = wait_for_completion_timeout(&done, timeout);
> + if (!rtn)
> + timeleft = wait_for_completion_timeout(&done, timeout);
>
> shost->eh_action = NULL;
>
Hmm. This seems to be a generic bug fix here; it is perfectly
ok for queuecommand() to return a non-zero value without calling
->scsi_done. At which point ->complete is pointless to try as no
completion ever would be invoked.
Mind separating that out as a separate patch?
> @@ -819,10 +822,19 @@ static int scsi_send_eh_cmnd(struct scsi
> * about this command.
> */
> if (timeleft) {
> - rtn = scsi_eh_completed_normally(scmd);
> - SCSI_LOG_ERROR_RECOVERY(3,
> - printk("%s: scsi_eh_completed_normally %x\n",
> - __func__, rtn));
> + switch (rtn) {
> + case 0:
> + rtn = scsi_eh_completed_normally(scmd);
> + SCSI_LOG_ERROR_RECOVERY(3,
> + printk("%s: scsi_eh_completed_normally %x\n",
> + __func__, rtn));
> + break;
> + case FAILED:
> + break;
> + default:
> + rtn = ADD_TO_MLQUEUE;
> + break;
> + }
>
> switch (rtn) {
> case SUCCESS:
Bzzt.
'FAILED' is a valid response for scsi_eh_completed_normally, but not
for ->queuecommand.
> @@ -831,8 +843,12 @@ static int scsi_send_eh_cmnd(struct scsi
> case TARGET_ERROR:
> break;
> case ADD_TO_MLQUEUE:
> - rtn = NEEDS_RETRY;
> - break;
> + if (timeleft > stall_for) {
> + timeout = timeleft - stall_for;
> + msleep(stall_for);
> + goto retry;
> + }
> + /* fall through */
> default:
> rtn = FAILED;
> break;
>
We're already calling 'wait_for_completion_timeout'.
So normally the 'msleep' wouldn't be necessary.
It's only required for the case when ->queuecommand
returns non-zero.
So I'd rather see to have the msleep into section
where the return command from queuecommand is evaluated;
here we'll actually decrease responsiveness as
we're always waiting for X seconds, even if the command
would've been completed during that time.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html