On 09.08.20 17:26, Alan Stern wrote:
> On Sun, Aug 09, 2020 at 11:20:22AM +0200, Martin Kepplinger wrote:
>> Hey Alan, I'm really glad for that, I suspected some of this but I have
>> little experience in scsi/block layers, so that is super helpful.
>>
>> I'd appreciate an opinion on the below workaround that *seems* to work
>> now (let's see, I had thought so before :)
>>
>> Whether or not this helps to find a real solution, let's see. But
>> integration of such a flag in the error handling paths is what's
>> interesting for now:
>>
>>
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -565,6 +565,17 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>>                              return NEEDS_RETRY;
>>                      }
>>              }
>> +            if (scmd->device->expecting_media_change) {
>> +                    if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
>> +                            /* clear expecting_media_change in
>> +                             * scsi_noretry_cmd() because we need
>> +                             * to override possible "failfast" overrides
>> +                             * that block readahead can cause.
>> +                             */
>> +                            return NEEDS_RETRY;
> 
> This is a somewhat fragile approach.  You don't know for certain that 
> scsi_noretry_cmd will be called.  Also, scsi_noretry_cmd can be called 
> from other places.
> 
> It would be better to clear the expecting_media_change flag just before 
> returning from scsi_decide_disposition.  That way its use is localized 
> to one routine, not spread out between two.
> 
> Alan Stern
> 

Hi Alan,

maybe you're right. I initially just thought that I'd allow for specific
error codes in scsi_noretry_cmd() to return non-NULL (BUS_BUSY, PARITY,
ERROR) despite having the flag set.

The below version works equally fine for me but I'm not sure if it's
actually more safe.

James, when exposing a new writable sysfs option like
"suspend_no_media_change"(?) that drivers can check before setting the
new "expecting_media_change" flag (during resume), would this addition
make sense to you?

thanks,

                      martin



--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -565,6 +565,18 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
                                return NEEDS_RETRY;
                        }
                }
+               if (scmd->device->expecting_media_change) {
+                       if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
+                               /*
+                                * clear the expecting_media_change in
+                                * scsi_decide_disposition() because we
+                                * need to catch possible "fail fast" overrides
+                                * that block readahead can cause.
+                                */
+                               return NEEDS_RETRY;
+                       }
+               }
+
                /*
                 * we might also expect a cc/ua if another LUN on the target
                 * reported a UA with an ASC/ASCQ of 3F 0E -
@@ -1944,9 +1956,19 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
         * the request was not marked fast fail.  Note that above,
         * even if the request is marked fast fail, we still requeue
         * for queue congestion conditions (QUEUE_FULL or BUSY) */
-       if ((++scmd->retries) <= scmd->allowed
-           && !scsi_noretry_cmd(scmd)) {
-               return NEEDS_RETRY;
+       if ((++scmd->retries) <= scmd->allowed) {
+               /*
+                * but scsi_noretry_cmd() cannot override the
+                * expecting_media_change flag.
+                */
+               if (!scsi_noretry_cmd(scmd) ||
+                   scmd->device->expecting_media_change) {
+                       scmd->device->expecting_media_change = 0;
+                       return NEEDS_RETRY;
+               } else {
+                       /* marked fast fail and not expected. */
+                       return SUCCESS;
+               }
        } else {
                /*
                 * no more retries - report this one back to upper level.
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..bb583e403b81 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
        if (!sdkp)      /* E.g.: runtime resume at the start of sd_probe() */
                return 0;

+       sdkp->device->expecting_media_change = 1;
+
        if (!sdkp->device->manage_start_stop)
                return 0;

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index bc5909033d13..f5fc1af68e00 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -169,6 +169,7 @@ struct scsi_device {
                                 * this device */
        unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
                                     * because we did a bus reset. */
+       unsigned expecting_media_change:1;
        unsigned use_10_for_rw:1; /* first try 10-byte read / write */
        unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
        unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
-- 

Reply via email to