On Wed, Mar 18, 2026 at 09:21:23AM -0400, Aaron Tomlin wrote:
> On Wed, Mar 18, 2026 at 08:38:20AM +0900, Damien Le Moal wrote:
> > Looks OK to me, but I have some suggestions below.

Hi Damien, Laurence,

Upon reviewing the source code once more, it is apparent that tracking
"active requests" in this particular trace point is essentially redundant:
if a thread is forced to invoke io_schedule() while waiting for a tag, then
by definition the number of active requests already equals the total number
of tags, so printing it adds no information.

Worse, in practice it would almost always print active=0 in the following
scenarios:

    1.  "mq-deadline" Scheduler Starvation: The thread sleeps waiting for a
        scheduler tag. Because the request has not yet been dispatched to
        the hardware, blk_mq_inc_active_requests() was never called, so
        hctx->nr_active is 0.

    2.  NVMe Hardware Starvation, "none" scheduler: The thread sleeps
        waiting for a hardware tag. Because NVMe devices do not share
        tags, blk_mq_inc_active_requests() returns immediately (the
        accounting is only needed for shared tag sets), so
        hctx->nr_active remains 0.

    3.  RAID Hardware Starvation, "none" scheduler: The thread sleeps
        waiting for a shared hardware tag. Because the tag set is
        HCTX_SHARED, the kernel tracks active requests in
        hctx->queue->nr_active_requests_shared_tags; the local
        hctx->nr_active counter is bypassed entirely and remains 0.
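To make the three cases concrete, here is a hypothetical, simplified model
of the accounting behaviour described above. The struct layouts and the
function body are illustrative only (they mirror kernel names but do not
reproduce real kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel structures involved. */
struct queue {
	int nr_active_requests_shared_tags;
};

struct hctx {
	bool shared_tags;	/* models blk_mq_is_shared_tags() */
	int nr_active;
	struct queue *queue;
};

/*
 * Scenario 2: non-shared (e.g. NVMe) queues skip the accounting and
 * return immediately.
 * Scenario 3: shared tag sets account in the queue-wide counter.
 * In both cases the per-hctx nr_active counter is never touched.
 */
static void blk_mq_inc_active_requests(struct hctx *hctx)
{
	if (!hctx->shared_tags)
		return;
	hctx->queue->nr_active_requests_shared_tags++;
}
```

In scenario 1 the function is never reached at all, so hctx->nr_active
stays 0 in all three cases, which is exactly what the trace point would
print.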

Rather than attempting to print the active count, the trace point should
be modified to indicate exactly which pool was starved: the hardware tag
pool or the software scheduler tag pool.
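As a rough sketch of that direction (names and logic here are entirely
hypothetical, not from any submitted patch), the trace point could derive
the starved pool from whether the sleeping thread was still waiting for a
scheduler tag or had reached driver-tag allocation:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: pick a pool name for the trace output. With an
 * I/O scheduler attached, a sleep before dispatch means the scheduler
 * tag pool ran dry; otherwise the hardware (driver) pool did.
 */
static const char *starved_pool_name(bool has_elevator,
				     bool waiting_for_driver_tag)
{
	if (has_elevator && !waiting_for_driver_tag)
		return "sched";
	return "driver";
}
```

Mapped onto the scenarios above, scenario 1 would report "sched" while
scenarios 2 and 3 would both report "driver".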

I will submit a follow-up patch.


Kind regards,
-- 
Aaron Tomlin
