Re: [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd

2013-05-04 Thread Bart Van Assche

On 05/03/13 20:23, James Bottomley wrote:

+   const unsigned long stall_for = min(msecs_to_jiffies(10), 1UL);


Hello James,

Can you please clarify what the intention of this statement is ? Is the 
purpose of this statement to avoid that stall_for would be zero in case 
HZ  100 ? If that is the case, maybe you meant max() instead of min() ? 
Also, are you aware that msecs_to_jiffies() already rounds up the result 
of the division ?


Bart.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd

2013-05-04 Thread James Bottomley
On Sat, 2013-05-04 at 19:02 +0200, Bart Van Assche wrote:
 On 05/03/13 20:23, James Bottomley wrote:
  +   const unsigned long stall_for = min(msecs_to_jiffies(10), 1UL);
 
 Hello James,
 
 Can you please clarify what the intention of this statement is ? Is the 
 purpose of this statement to avoid that stall_for would be zero in case 
 HZ  100 ? If that is the case, maybe you meant max() instead of min() ? 
 Also, are you aware that msecs_to_jiffies() already rounds up the result 
 of the division ?

Yes, I thought afterwards I should dump the bogus min statement as well.
Plus HZ/10 is actually 100ms, so the value is 10x wrong.  I've fixed it
up below (plus a bit of comment rework and some style fixes).

Thanks,

James

---

From 4bd9ef9789ad86656d8e52e8fff5422b741097e1 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke h...@suse.de
Date: Thu, 25 Apr 2013 08:10:00 +0200
Subject: [PATCH] [SCSI] Handle MLQUEUE busy response in scsi_send_eh_cmnd

scsi_send_eh_cmnd() is calling queuecommand() directly, so
it needs to check the return value here.
The only valid return codes for queuecommand() are 'busy'
states, so we need to wait for a bit to allow the LLDD
to recover.

Based on an earlier patch from Wen Xiong.

[jejb: fix confusion between msec and jiffies values and other issues]
[bvanassche: correct stall_for interval]
Cc: Wen Xiong wenxi...@linux.vnet.ibm.com
Cc: Brian King brk...@linux.vnet.ibm.com
Signed-off-by: Hannes Reinecke h...@suse.de
Signed-off-by: James Bottomley jbottom...@parallels.com

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..f43de1e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -25,6 +25,7 @@
 #include linux/interrupt.h
 #include linux/blkdev.h
 #include linux/delay.h
+#include linux/jiffies.h
 
 #include scsi/scsi.h
 #include scsi/scsi_cmnd.h
@@ -791,32 +792,48 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
struct scsi_device *sdev = scmd-device;
struct Scsi_Host *shost = sdev-host;
DECLARE_COMPLETION_ONSTACK(done);
-   unsigned long timeleft;
+   unsigned long timeleft = timeout;
struct scsi_eh_save ses;
+   const unsigned long stall_for = msecs_to_jiffies(100);
int rtn;
 
+retry:
scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes);
shost-eh_action = done;
 
scsi_log_send(scmd);
scmd-scsi_done = scsi_eh_done;
-   shost-hostt-queuecommand(shost, scmd);
-
-   timeleft = wait_for_completion_timeout(done, timeout);
+   rtn = shost-hostt-queuecommand(shost, scmd);
+   if (rtn) {
+   if (timeleft  stall_for) {
+   scsi_eh_restore_cmnd(scmd, ses);
+   timeleft -= stall_for;
+   msleep(jiffies_to_msecs(stall_for));
+   goto retry;
+   }
+   /* signal not to enter either branch of the if () below */
+   timeleft = 0;
+   rtn = NEEDS_RETRY;
+   } else {
+   timeleft = wait_for_completion_timeout(done, timeout);
+   }
 
shost-eh_action = NULL;
 
-   scsi_log_completion(scmd, SUCCESS);
+   scsi_log_completion(scmd, rtn);
 
SCSI_LOG_ERROR_RECOVERY(3,
printk(%s: scmd: %p, timeleft: %ld\n,
__func__, scmd, timeleft));
 
/*
-* If there is time left scsi_eh_done got called, and we will
-* examine the actual status codes to see whether the command
-* actually did complete normally, else tell the host to forget
-* about this command.
+* If there is time left scsi_eh_done got called, and we will examine
+* the actual status codes to see whether the command actually did
+* complete normally, else if we have a zero return and no time left,
+* the command must still be pending, so abort it and return FAILED.
+* If we never actually managed to issue the command, because
+* -queuecommand() kept returning non zero, use the rtn = FAILED
+* value above (so don't execute either branch of the if)
 */
if (timeleft) {
rtn = scsi_eh_completed_normally(scmd);
@@ -837,7 +854,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
rtn = FAILED;
break;
}
-   } else {
+   } else if (!rtn) {
scsi_abort_eh_cmnd(scmd);
rtn = FAILED;
}

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd

2013-05-03 Thread wenxiong


Quoting Hannes Reinecke h...@suse.de:


scsi_send_eh_cmnd() is calling queuecommand() directly, so
it needs to check the return value here.
The only valid return codes for queuecommand() are 'busy'
states, so we need to wait for a bit to allow the LLDD
to recover.

Based on an earlier patch from Wen Xiong.

Cc: Wen Xiong wenxi...@linux.vnet.ibm.com
Cc: Brian King brk...@linux.vnet.ibm.com
Signed-off-by: Hannes Reinecke h...@suse.de


Hi James,

I have verified this patch with two new ipr adapters. EEH error can be  
recoery successfully.

Do you have any question about this new patch?

Thanks,
Wendy

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d58db32..6a3c1d2 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -889,22 +889,32 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd  
*scmd, unsigned char *cmnd,

struct scsi_device *sdev = scmd-device;
struct Scsi_Host *shost = sdev-host;
DECLARE_COMPLETION_ONSTACK(done);
-   unsigned long timeleft;
+   unsigned long timeleft = timeout;
struct scsi_eh_save ses;
+   const int stall_for = min(HZ/10, 1);
int rtn;

+retry:
scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes);
shost-eh_action = done;

scsi_log_send(scmd);
scmd-scsi_done = scsi_eh_done;
-   shost-hostt-queuecommand(shost, scmd);
-
-   timeleft = wait_for_completion_timeout(done, timeout);
+   rtn = shost-hostt-queuecommand(shost, scmd);
+   if (rtn) {
+   if (timeleft) {
+   scsi_eh_restore_cmnd(scmd, ses);
+   timeleft -= stall_for;
+   msleep(stall_for);
+   goto retry;
+   }
+   rtn = NEEDS_RETRY;
+   } else
+   timeleft = wait_for_completion_timeout(done, timeout);

shost-eh_action = NULL;

-   scsi_log_completion(scmd, SUCCESS);
+   scsi_log_completion(scmd, rtn);

SCSI_LOG_ERROR_RECOVERY(3,
printk(%s: scmd: %p, timeleft: %ld\n,
@@ -935,7 +945,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd  
*scmd, unsigned char *cmnd,

rtn = FAILED;
break;
}
-   } else {
+   } else if (!rtn) {
scsi_abort_eh_cmnd(scmd);
rtn = FAILED;
}




--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd

2013-05-03 Thread James Bottomley
On Fri, 2013-05-03 at 10:24 -0400, wenxi...@linux.vnet.ibm.com wrote:
 Quoting Hannes Reinecke h...@suse.de:
 
  scsi_send_eh_cmnd() is calling queuecommand() directly, so
  it needs to check the return value here.
  The only valid return codes for queuecommand() are 'busy'
  states, so we need to wait for a bit to allow the LLDD
  to recover.
 
  Based on an earlier patch from Wen Xiong.
 
  Cc: Wen Xiong wenxi...@linux.vnet.ibm.com
  Cc: Brian King brk...@linux.vnet.ibm.com
  Signed-off-by: Hannes Reinecke h...@suse.de
 
 Hi James,
 
 I have verified this patch with two new ipr adapters. EEH error can be  
 recoery successfully.
 Do you have any question about this new patch?

Yes, it's not correct: stall_for is in jiffies not msec, so
msleep(stall_for) is taking far too long.  Plus you don't know that
stall_for is a divisor of timeleft, so timeleft -= stall_for could wrap
and cause the retry loop to go on effectively forever.

I think the below is the correct patch, so I'll commit it unless there
are objections.

James

---
From a74498dda7acf6f5fb99e43df54f2c1f5c6beec9 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke h...@suse.de
Date: Thu, 25 Apr 2013 08:10:00 +0200
Subject: [PATCH] [SCSI] Handle MLQUEUE busy response in scsi_send_eh_cmnd

scsi_send_eh_cmnd() is calling queuecommand() directly, so
it needs to check the return value here.
The only valid return codes for queuecommand() are 'busy'
states, so we need to wait for a bit to allow the LLDD
to recover.

Based on an earlier patch from Wen Xiong.

[jejb: fix confusion between msec and jiffies values and other issues]
Cc: Wen Xiong wenxi...@linux.vnet.ibm.com
Cc: Brian King brk...@linux.vnet.ibm.com
Signed-off-by: Hannes Reinecke h...@suse.de
Signed-off-by: James Bottomley jbottom...@parallels.com

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..cfd1ef2 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -25,6 +25,7 @@
 #include linux/interrupt.h
 #include linux/blkdev.h
 #include linux/delay.h
+#include linux/jiffies.h
 
 #include scsi/scsi.h
 #include scsi/scsi_cmnd.h
@@ -791,22 +792,33 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
struct scsi_device *sdev = scmd-device;
struct Scsi_Host *shost = sdev-host;
DECLARE_COMPLETION_ONSTACK(done);
-   unsigned long timeleft;
+   unsigned long timeleft = timeout;
struct scsi_eh_save ses;
+   const unsigned long stall_for = min(msecs_to_jiffies(10), 1UL);
int rtn;
 
+retry:
scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes);
shost-eh_action = done;
 
scsi_log_send(scmd);
scmd-scsi_done = scsi_eh_done;
-   shost-hostt-queuecommand(shost, scmd);
-
-   timeleft = wait_for_completion_timeout(done, timeout);
+   rtn = shost-hostt-queuecommand(shost, scmd);
+   if (rtn) {
+   if (timeleft  stall_for) {
+   scsi_eh_restore_cmnd(scmd, ses);
+   timeleft -= stall_for;
+   msleep(jiffies_to_msecs(stall_for));
+   goto retry;
+   }
+   timeleft = 0;
+   rtn = NEEDS_RETRY;
+   } else
+   timeleft = wait_for_completion_timeout(done, timeout);
 
shost-eh_action = NULL;
 
-   scsi_log_completion(scmd, SUCCESS);
+   scsi_log_completion(scmd, rtn);
 
SCSI_LOG_ERROR_RECOVERY(3,
printk(%s: scmd: %p, timeleft: %ld\n,
@@ -837,7 +849,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
rtn = FAILED;
break;
}
-   } else {
+   } else if (!rtn) {
scsi_abort_eh_cmnd(scmd);
rtn = FAILED;
}

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] scsi: Handle MLQUEUE busy response in scsi_send_eh_cmnd

2013-04-25 Thread Hannes Reinecke
scsi_send_eh_cmnd() is calling queuecommand() directly, so
it needs to check the return value here.
The only valid return codes for queuecommand() are 'busy'
states, so we need to wait for a bit to allow the LLDD
to recover.

Based on an earlier patch from Wen Xiong.

Cc: Wen Xiong wenxi...@linux.vnet.ibm.com
Cc: Brian King brk...@linux.vnet.ibm.com
Signed-off-by: Hannes Reinecke h...@suse.de

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d58db32..6a3c1d2 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -889,22 +889,32 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
struct scsi_device *sdev = scmd-device;
struct Scsi_Host *shost = sdev-host;
DECLARE_COMPLETION_ONSTACK(done);
-   unsigned long timeleft;
+   unsigned long timeleft = timeout;
struct scsi_eh_save ses;
+   const int stall_for = min(HZ/10, 1);
int rtn;
 
+retry:
scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes);
shost-eh_action = done;
 
scsi_log_send(scmd);
scmd-scsi_done = scsi_eh_done;
-   shost-hostt-queuecommand(shost, scmd);
-
-   timeleft = wait_for_completion_timeout(done, timeout);
+   rtn = shost-hostt-queuecommand(shost, scmd);
+   if (rtn) {
+   if (timeleft) {
+   scsi_eh_restore_cmnd(scmd, ses);
+   timeleft -= stall_for;
+   msleep(stall_for);
+   goto retry;
+   }
+   rtn = NEEDS_RETRY;
+   } else
+   timeleft = wait_for_completion_timeout(done, timeout);
 
shost-eh_action = NULL;
 
-   scsi_log_completion(scmd, SUCCESS);
+   scsi_log_completion(scmd, rtn);
 
SCSI_LOG_ERROR_RECOVERY(3,
printk(%s: scmd: %p, timeleft: %ld\n,
@@ -935,7 +945,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
rtn = FAILED;
break;
}
-   } else {
+   } else if (!rtn) {
scsi_abort_eh_cmnd(scmd);
rtn = FAILED;
}
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html