Hi Ming.
Ming Lei - 15.04.18, 17:43:
> Hi Jens,
>
> These two patches fix the recently discussed race between completion
> and BLK_EH_RESET_TIMER.
>
> Israel & Martin, this one is a simpler fix for this issue and can
> cover the potential hang of MQ_RQ_COMPLETE_IN_TIMEOUT requests. Could
> you test V4 and see if your issue is fixed?
As a replacement for all three of the other patches I applied?
- '[PATCH] blk-mq_Directly schedule q->timeout_work when aborting a
request.mbox'
- '[PATCH v2] block: Change a rcu_read_{lock,unlock}_sched() pair into
rcu_read_{lock,unlock}().mbox'
- '[PATCH v4] blk-mq_Fix race conditions in request timeout
handling.mbox'
These patches have worked reliably so far, both for the hang on boot and
for the error reading SMART data.
I'd compile a kernel tomorrow or Tuesday, I think.
> V4:
> - run synchronize_rcu() once for handling all timed-out requests
> between .timeout() and the following handling
> - address tj's concern about reorder between blk_add_timer() and
> blk_mq_rq_update_aborted_gstate(req, 0)
>
> V3:
> - before completing rq for BLK_EH_HANDLED, sync with normal
> completion path
> - make sure rq's state updated as MQ_RQ_IN_FLIGHT
> before completing
>
> V2:
> - rename the new flag as MQ_RQ_COMPLETE_IN_TIMEOUT
> - fix lock uses in blk_mq_rq_timed_out
> - document re-order between blk_add_timer() and
> blk_mq_rq_update_aborted_gstate(req, 0)
>
>
> Ming Lei (2):
>   blk-mq: set RQF_MQ_TIMEOUT_EXPIRED when the rq's timeout isn't handled
>   blk-mq: fix race between complete and BLK_EH_RESET_TIMER
>
>  block/blk-mq.c         | 120 +++++++++++++++++++++++++++++++++++++++----------
>  block/blk-mq.h         |   1 +
>  block/blk-timeout.c    |   1 -
>  include/linux/blkdev.h |   6 +++
>  4 files changed, 104 insertions(+), 24 deletions(-)
--
Martin