On Wed, 2018-11-14 at 09:26 -0700, Keith Busch wrote: > The scsi timeout error handling had been directly updating the request > state to prevent a natural completion and error handling from completing > the same request twice. Fix this layering violation by having scsi > control the fate of its commands with scsi owned flags rather than > use blk-mq's. > > Signed-off-by: Keith Busch <[email protected]> > --- > drivers/scsi/scsi_error.c | 17 +++-------------- > drivers/scsi/scsi_lib.c | 6 +++++- > include/scsi/scsi_cmnd.h | 5 ++++- > 3 files changed, 12 insertions(+), 16 deletions(-) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index c736d61b1648..f89e829a1c51 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -199,6 +199,9 @@ scsi_abort_command(struct scsi_cmnd *scmd) > return FAILED; > } > > + if (test_and_set_bit(__SCMD_COMPLETE, &scmd->flags)) > + return SUCCESS; > + > spin_lock_irqsave(shost->host_lock, flags); > if (shost->eh_deadline != -1 && !shost->last_reset) > shost->last_reset = jiffies; > @@ -296,20 +299,6 @@ enum blk_eh_timer_return scsi_times_out(struct request > *req) > rtn = host->hostt->eh_timed_out(scmd); > > if (rtn == BLK_EH_DONE) { > - /* > - * For blk-mq, we must set the request state to complete > now > - * before sending the request to the scsi error handler. > This > - * will prevent a use-after-free in the event the LLD > manages > - * to complete the request before the error handler > finishes > - * processing this timed out request. > - * > - * If the request was already completed, then the LLD beat > the > - * time out handler from transferring the request to the > scsi > - * error handler. In that case we can return immediately > as no > - * further action is required. > - */ > - if (req->q->mq_ops && !blk_mq_mark_complete(req)) > - return rtn; > if (scsi_abort_command(scmd) != SUCCESS) { > set_host_byte(scmd, DID_TIME_OUT); > scsi_eh_scmd_add(scmd); > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index c7fccbb8f554..1e74137f1073 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -2044,8 +2044,11 @@ static int scsi_mq_prep_fn(struct request *req) > > static void scsi_mq_done(struct scsi_cmnd *cmd) > { > + if (test_and_set_bit(__SCMD_COMPLETE, &cmd->flags)) > + return; > trace_scsi_dispatch_cmd_done(cmd); > - blk_mq_complete_request(cmd->request); > + if (unlikely(!blk_mq_complete_request(cmd->request))) > + clear_bit(__SCMD_COMPLETE, &cmd->flags); > } > > static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx) > @@ -2104,6 +2107,7 @@ static blk_status_t scsi_queue_rq(struct > blk_mq_hw_ctx *hctx, > goto out_dec_host_busy; > req->rq_flags |= RQF_DONTPREP; > } else { > + cmd->flags &= ~SCMD_COMPLETE; > blk_mq_start_request(req); > }
Hi Keith, Please Cc Martin Petersen and the scsi mailing list for SCSI patches. Regarding this patch: I think this patch introduces a subtle but severe bug in the SCSI core, namely that if an abort is processed concurrently with request completion with "fake timeout" enabled that the abort is ignored. Bart.
