On 3/18/26 18:53, Aaron Tomlin wrote:
> In high-performance storage environments, particularly when utilising
> RAID controllers with shared tag sets (BLK_MQ_F_TAG_HCTX_SHARED), severe
> latency spikes can occur when fast devices (SSDs) are starved of hardware
> tags when sharing the same blk_mq_tag_set.
>
> Currently, diagnosing this specific hardware queue contention is
> difficult. When a CPU thread exhausts the tag pool, blk_mq_get_tag()
> forces the current thread to block uninterruptible via io_schedule().
> While this can be inferred via sched:sched_switch or dynamically
> traced by attaching a kprobe to blk_mq_mark_tag_wait(), there is no
> dedicated, out-of-the-box observability for this event.
>
> This patch introduces the block_rq_tag_wait static trace point in the
> tag allocation slow-path. It triggers immediately before the thread
> yields the CPU, exposing the exact hardware context (hctx) that is
> starved, the specific pool experiencing starvation (hardware or software
> scheduler), and the total pool depth.
>
> This provides storage engineers and performance monitoring agents
> with a zero-configuration, low-overhead mechanism to definitively
> identify shared-tag bottlenecks and tune I/O schedulers or cgroup
> throttling accordingly.
>
> Signed-off-by: Aaron Tomlin <[email protected]>
> ---
> Changes in v1 [1]:
>  - Improved the description of the trace point (Damien Le Moal)
>  - Removed the redundant "active requests" (Laurence Oberman)
>  - Introduced pool-specific starvation tracking
>
> [1]: https://lore.kernel.org/lkml/[email protected]/

LGTM.

Reviewed-by: Chaitanya Kulkarni <[email protected]>

-ck
