Re: [PATCH v6 0/2] blk-mq: introduce tag starvation observability

Jens Axboe Mon, 18 May 2026 06:34:40 -0700

On 5/17/26 3:36 PM, Aaron Tomlin wrote:
> Hi Jens, Steve, Masami,
> 
> In high-performance storage environments, particularly when utilising RAID
> controllers with shared tag sets (BLK_MQ_F_TAG_HCTX_SHARED), severe latency
> spikes can occur when fast devices are starved of available tags.
> Currently, diagnosing this specific queue contention requires deploying
> dynamic kprobes or inferring sleep states, which lacks a simple,
> out-of-the-box diagnostic path.
> 
> This short series introduces dedicated, low-overhead observability for tag
> exhaustion events in the block layer:
> 
>   - Patch 1 introduces the "block_rq_tag_wait" tracepoint in the tag
>     allocation slow-path to capture precise, event-based starvation.
> 
>   - Patch 2 complements this by exposing "wait_on_hw_tag" and
>     "wait_on_sched_tag" per-CPU counters via debugfs for quick,
>     point-in-time cumulative polling.
> 
> Together, these provide storage engineers with zero-configuration
> mechanisms to definitively identify shared-tag bottlenecks.


Why not just issue the trace points? Then there's close to zero
overhead, rather than needing added counters for this and having the
kernel keep track. If you just issue get/put tag kind of traces,
then userspace can keep track. That's what blktrace has done for decades
for things like inflight/queue depth accounting.

IOW, seems to me, this could be done with basically zero kernel
additions outside of perhaps a trace point or two.
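
The sketch below shows the userspace side of that approach. It assumes a hypothetical trace that produces one record per tag get and one per tag put for a single tag pool. With BLK_MQ_F_TAG_HCTX_SHARED that pool is shared across hardware queues. The `TagEvent` record, its fields and `account()` are illustrative names only. They are not an existing blktrace or tracepoint format. Inflight depth, peak depth and time spent fully saturated are all derived from the event stream, so the kernel keeps no counters.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TagEvent:
    """Hypothetical get/put record parsed from a trace (fields are illustrative)."""
    ts_ns: int   # event timestamp in nanoseconds
    kind: str    # "get" when a tag is allocated, "put" when it is freed


@dataclass
class TagPoolStats:
    inflight: int = 0             # tags currently held
    peak: int = 0                 # highest inflight count observed
    saturations: int = 0          # times the pool became fully allocated
    saturated_ns: int = 0         # total time with every tag in use
    _since: Optional[int] = None  # start of the current saturated interval


def account(events: Iterable[TagEvent], depth: int) -> TagPoolStats:
    """Replay get/put events for one tag pool of size `depth`, entirely in userspace.

    A saturated interval still open when the trace ends is not added to
    saturated_ns.
    """
    s = TagPoolStats()
    for ev in events:
        if ev.kind == "get":
            if s.inflight == depth:
                raise ValueError("more gets than tags: wrong depth or lost events")
            s.inflight += 1
            s.peak = max(s.peak, s.inflight)
            if s.inflight == depth:
                # Pool exhausted: any further allocation would have to wait.
                s.saturations += 1
                s._since = ev.ts_ns
        elif ev.kind == "put":
            if s.inflight == 0:
                raise ValueError("put without matching get: trace started mid-flight?")
            if s.inflight == depth and s._since is not None:
                # Freeing a tag ends the current saturated interval.
                s.saturated_ns += ev.ts_ns - s._since
                s._since = None
            s.inflight -= 1
        else:
            raise ValueError(f"unknown event kind: {ev.kind!r}")
    return s


if __name__ == "__main__":
    trace = [TagEvent(0, "get"), TagEvent(10, "get"), TagEvent(25, "put"),
             TagEvent(30, "get"), TagEvent(70, "put"), TagEvent(80, "put")]
    stats = account(trace, depth=2)
    print(stats.peak, stats.saturations, stats.saturated_ns)  # prints "2 2 55"
```

A real tool would also need to handle events that were already inflight when tracing started. It would also key the state per tag set rather than assuming one pool.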

-- 
Jens Axboe
