On Wed, 2018-11-14 at 09:26 -0700, Keith Busch wrote:
> The scsi timeout error handling had been directly updating the request
> state to prevent a natural completion and error handling from completing
> the same request twice. Fix this layering violation by having scsi
> control the fate of its commands with scsi owned flags rather than
> use blk-mq's.
> 
> Signed-off-by: Keith Busch <[email protected]>
> ---
>  drivers/scsi/scsi_error.c | 17 +++--------------
>  drivers/scsi/scsi_lib.c   |  6 +++++-
>  include/scsi/scsi_cmnd.h  |  5 ++++-
>  3 files changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index c736d61b1648..f89e829a1c51 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -199,6 +199,9 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>               return FAILED;
>       }
>  
> +     if (test_and_set_bit(__SCMD_COMPLETE, &scmd->flags))
> +             return SUCCESS;
> +
>       spin_lock_irqsave(shost->host_lock, flags);
>       if (shost->eh_deadline != -1 && !shost->last_reset)
>               shost->last_reset = jiffies;
> @@ -296,20 +299,6 @@ enum blk_eh_timer_return scsi_times_out(struct request
> *req)
>               rtn = host->hostt->eh_timed_out(scmd);
>  
>       if (rtn == BLK_EH_DONE) {
> -             /*
> -              * For blk-mq, we must set the request state to complete
> now
> -              * before sending the request to the scsi error handler.
> This
> -              * will prevent a use-after-free in the event the LLD
> manages
> -              * to complete the request before the error handler
> finishes
> -              * processing this timed out request.
> -              *
> -              * If the request was already completed, then the LLD beat
> the
> -              * time out handler from transferring the request to the
> scsi
> -              * error handler. In that case we can return immediately
> as no
> -              * further action is required.
> -              */
> -             if (req->q->mq_ops && !blk_mq_mark_complete(req))
> -                     return rtn;
>               if (scsi_abort_command(scmd) != SUCCESS) {
>                       set_host_byte(scmd, DID_TIME_OUT);
>                       scsi_eh_scmd_add(scmd);
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index c7fccbb8f554..1e74137f1073 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2044,8 +2044,11 @@ static int scsi_mq_prep_fn(struct request *req)
>  
>  static void scsi_mq_done(struct scsi_cmnd *cmd)
>  {
> +     if (test_and_set_bit(__SCMD_COMPLETE, &cmd->flags))
> +             return;
>       trace_scsi_dispatch_cmd_done(cmd);
> -     blk_mq_complete_request(cmd->request);
> +     if (unlikely(!blk_mq_complete_request(cmd->request)))
> +             clear_bit(__SCMD_COMPLETE, &cmd->flags);
>  }
>  
>  static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
> @@ -2104,6 +2107,7 @@ static blk_status_t scsi_queue_rq(struct
> blk_mq_hw_ctx *hctx,
>                       goto out_dec_host_busy;
>               req->rq_flags |= RQF_DONTPREP;
>       } else {
> +             cmd->flags &= ~SCMD_COMPLETE;
>               blk_mq_start_request(req);
>       }


Hi Keith,

Please Cc Martin Petersen and the scsi mailing list for SCSI patches.

Regarding this patch: I think this patch introduces a subtle but severe bug
in the SCSI core, namely that if an abort is processed concurrently with
request completion with "fake timeout" enabled that the abort is ignored.

Bart.

Reply via email to