On 4/4/18 16:39, Damien Le Moal wrote:
> Hannes,
> 
> On 4/4/18 16:35, Hannes Reinecke wrote:
>> On Wed, 4 Apr 2018 07:06:58 +0000
>> Damien Le Moal <damien.lem...@wdc.com> wrote:
>>
>>> Hannes,
>>>
>>> On 4/4/18 15:57, Hannes Reinecke wrote:
>>>> On Wed,  4 Apr 2018 15:51:38 +0900
>>>> Damien Le Moal <damien.lem...@wdc.com> wrote:
>>>>   
>>>>> With the introduction of commit e39a97353e53 ("scsi: core: return
>>>>> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command
>>>>> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but
>>>>> lacking additional sense information will have a return code set to
>>>>> BLK_STS_OK. This results in the request issuer to see successful
>>>>> request execution despite the failure. An example of such case is
>>>>> an unaligned write on a host managed ZAC disk connected to a SAS
>>>>> HBA with a malfunctioning SAT. The unaligned write command gets
>>>>> aborted but has no additional sense information.
>>>>>
>>>>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
>>>>> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key :
>>>>> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense:
>>>>> No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB:
>>>>> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
>>>>> print_req_error: I/O error, dev sde, sector 274726920
>>>>>
>>>>> In scsi_io_completion(), sense key handling to not change the
>>>>> request error code and success being reported to the issuer.
>>>>>
>>>>> Fix this by making sure that the error code always indicates an
>>>>> error if scsi_io_completion() decide that the action to be taken
>>>>> for a failed command is to not retry it and terminate it
>>>>> immediately (ACTION_FAIL) .
>>>>>
>>>>> Signed-off-by: Damien Le Moal <damien.lem...@wdc.com>
>>>>> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in
>>>>> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <h...@suse.com>
>>>>> Cc: <sta...@vger.kernel.org>
>>>>> ---
>>>>>  drivers/scsi/scsi_lib.c | 9 +++++++++
>>>>>  1 file changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>> index c84f931388f2..87579bfcc186 100644
>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd
>>>>> *cmd, unsigned int good_bytes) scsi_print_command(cmd);
>>>>>                   }
>>>>>           }
>>>>> +         /*
>>>>> +          * The command failed and should not be retried.
>>>>> If the host
>>>>> +          * byte is DID_OK, then
>>>>> __scsi_error_from_host_byte() returned
>>>>> +          * BLK_STS_OK and error indicates a success. Make
>>>>> sure to not
>>>>> +          * use that as the completion code and always
>>>>> return an
>>>>> +          * I/O error.
>>>>> +          */
>>>>> +         if (error == BLK_STS_OK)
>>>>> +                 error = BLK_STS_IOERR;
>>>>>           if (!scsi_end_request(req, error,
>>>>> blk_rq_err_bytes(req), 0)) return;
>>>>>           /*FALLTHRU*/  
>>>>
>>>> That looks wrong.
>>>> Shouldn't __scsi_error_from_host_byte() return the correct status
>>>> here?  
>>>
>>> My drive said:
>>>
>>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
>>> driverbyte=DRIVER_SENSE
>>> sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current]
>>> sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense
>>> information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00
>>> 00 02 0c 00 01 00 00 00 01 00 00
>>>
>>> Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns
>>> BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND
>>> case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing
>>> and error stays equal to success. scsi_end_request() gets called with
>>> that and dd sees a success...
>>>
>>> There are also plenty of other sense keys cases where error is not
>>> changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I
>>> think this is likely the most common case since an command failure
>>> with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most
>>> common one.
>>>
>>> My patch is a bit of a hammer and makes sure that an ACTION_FAIL
>>> request is completed as a failure... Am I getting all this wrong ?
>>>
>>
>> Just checked with the code, and send a new patch.
>> Which I find a bit clearer.
>>
>> Fact is that __scsi_error_from_host_byte() is not sufficient to
>> evaluate the return code, so taking its return value with any further
>> checks is indeed wrong.
>> But then it never said so on the boilerplate of
>> __scsi_error_from_host_byte() ...
>>
>> So please do check the 'v2' version of the patch.
>>
> 
> It fixes the particular problem I am seeing.
> But looking more at this code, isn't there a problem with
> sshdr.sense_key == RECOVERED_ERROR ?
> 
> if (sense_valid && (sshdr.sense_key == RECOVERED_ERROR)) {
>       ...
>       result = 0;
>         /* for passthrough error may be set */
>         error = BLK_STS_OK;
> }
> 
> Then the error will be changed wrongly to BLK_STS_IOERR.
> 
> My original change had the same problem.

Ignore. It does not go through the change in that case thanks to:
        if (!(blk_rq_bytes(req) == 0 && error) &&
            !scsi_end_request(req, error, good_bytes, 0))

                return;
Assuming the RECOVERED_ERROR has no left over bytes. Otherwise, it goes
through requeue. Both cases being before the error code overwrite, no
problem.

Thanks !

-- 
Damien Le Moal,
Western Digital

Reply via email to