On Thu, 7 Feb 2019 at 16:55, Zachary Hays <[email protected]> wrote:
>
> The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
> This generates a rescuer thread for that queue that will trigger when
> the CPU is under heavy load and collect the uncompleted work.
>
> In the case of mmc, this creates the possibility of a deadlock when
> there are multiple partitions on the device as other blk-mq work is
> also run on the same queue. For example:
>
> - worker 0 claims the mmc host to work on partition 1
> - worker 1 attempts to claim the host for partition 2 but has to wait
> for worker 0 to finish
> - worker 0 schedules complete_work to release the host
> - rescuer thread is triggered after time-out and collects the dangling
> work
> - rescuer thread attempts to complete the work in order starting with
> claim host
> - the task to release host is now blocked by a task to claim it and
> will never be called
>
> The above results in multiple hung tasks that lead to failures to
> mount partitions.
>
> Handling complete_work on a separate workqueue avoids this by keeping
> the work completion tasks separate from the other blk-mq work. This
> allows the host to be released without getting blocked by other tasks
> attempting to claim the host.
>
> Signed-off-by: Zachary Hays <[email protected]>
Applied for fixes and by adding a fixes+stable tag.
Thanks Zachary for fixing this non-trivial problem! And thanks also to
Adrian/Christoph for your valuable input to this (feel free to reply
with your reviewed-by tag).
Kind regards
Uffe
> ---
> drivers/mmc/core/block.c | 10 +++++++++-
> include/linux/mmc/card.h | 1 +
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index aef1185f383d..14f3fdb8c6bb 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -2112,7 +2112,7 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
> if (waiting)
> wake_up(&mq->wait);
> else
> - kblockd_schedule_work(&mq->complete_work);
> + queue_work(mq->card->complete_wq, &mq->complete_work);
>
> return;
> }
> @@ -2924,6 +2924,13 @@ static int mmc_blk_probe(struct mmc_card *card)
>
> mmc_fixup_device(card, mmc_blk_fixups);
>
> + card->complete_wq = alloc_workqueue("mmc_complete",
> + WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
> + if (unlikely(!card->complete_wq)) {
> + pr_err("Failed to create mmc completion workqueue");
> + return -ENOMEM;
> + }
> +
> md = mmc_blk_alloc(card);
> if (IS_ERR(md))
> return PTR_ERR(md);
> @@ -2987,6 +2994,7 @@ static void mmc_blk_remove(struct mmc_card *card)
> pm_runtime_put_noidle(&card->dev);
> mmc_blk_remove_req(md);
> dev_set_drvdata(&card->dev, NULL);
> + destroy_workqueue(card->complete_wq);
> }
>
> static int _mmc_blk_suspend(struct mmc_card *card)
> diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
> index de7377815b6b..8ef330027b13 100644
> --- a/include/linux/mmc/card.h
> +++ b/include/linux/mmc/card.h
> @@ -308,6 +308,7 @@ struct mmc_card {
> unsigned int nr_parts;
>
> unsigned int bouncesz; /* Bounce buffer size */
> + struct workqueue_struct *complete_wq; /* Private workqueue */
> };
>
> static inline bool mmc_large_sector(struct mmc_card *card)
> --
> 2.7.4
>