Hi Bart

On 4/3/19 6:32 AM, Bart Van Assche wrote:
> Since the callback function called by blk_mq_queue_tag_busy_iter()
> may sleep calling synchronize_rcu() from __blk_mq_update_nr_hw_queues()
> is not sufficient to wait until blk_mq_queue_tag_busy_iter() has
> finished. Instead of making __blk_mq_update_nr_hw_queues() wait until
> q->q_usage_counter == 0 is globally visible, do not iterate over tags
> if the request queue is frozen.
> 
> Cc: Christoph Hellwig <[email protected]>
> Cc: Hannes Reinecke <[email protected]>
> Cc: James Smart <[email protected]>
> Cc: Ming Lei <[email protected]>
> Cc: Jianchao Wang <[email protected]>
> Cc: Keith Busch <[email protected]>
> Cc: Dongli Zhang <[email protected]>
> Cc: <[email protected]>
> Fixes: 530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks") # v4.19
> Signed-off-by: Bart Van Assche <[email protected]>
> ---
>  block/blk-mq-tag.c | 10 +++++-----
>  block/blk-mq.c     |  5 +----
>  2 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index a4931fc7be8a..89f479634b4d 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -383,14 +383,13 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  
>       /*
>        * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
> -      * while the queue is frozen. So we can use q_usage_counter to avoid
> -      * racing with it. __blk_mq_update_nr_hw_queues() uses
> -      * synchronize_rcu() to ensure this function left the critical section
> -      * below.
> +      * while the queue is frozen. Hold q_usage_counter to serialize
> +      * __blk_mq_update_nr_hw_queues() against this function.
>        */
>       if (!percpu_ref_tryget(&q->q_usage_counter))
>               return;
> -
> +     if (atomic_read(&q->mq_freeze_depth))
> +             goto out;

percpu_ref_tryget() should be enough here.

If the refcount is non-zero, we take a reference, the process updating
nr_hw_queues will wait for us, and we are safe.

If the refcount has already dropped to zero, there are no requests in flight
and blk_mq_queue_tag_busy_iter() has nothing to do, so quitting early is fine.

Adding an atomic_read(&q->mq_freeze_depth) check here is unnecessary and may
cause a deadlock.

For example, suppose the recovery path wants to freeze and drain the queue,
but the device is dead. We then have to rely on the timeout work to finish the
in-flight requests via blk_mq_queue_tag_busy_iter(), but with this
mq_freeze_depth check the iterator would do nothing.

The synchronize_rcu() is indeed unnecessary once we rely on
percpu_ref_tryget() here.

Thanks
Jianchao


>       queue_for_each_hw_ctx(q, hctx, i) {
>               struct blk_mq_tags *tags = hctx->tags;
>  
> @@ -405,6 +404,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>                       bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
>               bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
>       }
> +out:
>       blk_queue_exit(q);
>  }
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 652d0c6d5945..f9fc1536408d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3226,10 +3226,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  
>       list_for_each_entry(q, &set->tag_list, tag_set_list)
>               blk_mq_freeze_queue(q);
> -     /*
> -      * Sync with blk_mq_queue_tag_busy_iter.
> -      */
> -     synchronize_rcu();
> +
>       /*
>        * Switch IO scheduler to 'none', cleaning up the data associated
>        * with the previous scheduler. We will switch back once we are done
> 