Hi Jens, Steve, Masami,

In high-performance storage environments, particularly when utilising RAID 
controllers with shared tag sets (BLK_MQ_F_TAG_HCTX_SHARED), severe latency
spikes can occur when fast devices are starved of available tags.
Currently, diagnosing this specific queue contention requires deploying
dynamic kprobes or inferring sleep states; there is no simple,
out-of-the-box diagnostic path.

This short series introduces dedicated, low-overhead observability for tag 
exhaustion events in the block layer:

  - Patch 1 introduces the "block_rq_tag_wait" tracepoint in the tag
    allocation slow-path to capture precise, event-based starvation.

  - Patch 2 complements this by exposing "wait_on_hw_tag" and 
    "wait_on_sched_tag" atomic counters via debugfs for quick, 
    point-in-time cumulative polling.
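
As a rough illustration of the polling workflow that Patch 2 is meant to
enable, here is a minimal user-space sketch. The counter names are taken
from this series. The debugfs directory that holds them and the
one-decimal-integer-per-file format are assumptions made for the example,
so the directory is passed in by the caller rather than hard-coded.

```python
from pathlib import Path

# Counter names as given in this cover letter. Where they live under
# debugfs is an assumption, so the caller supplies the directory.
COUNTERS = ("wait_on_hw_tag", "wait_on_sched_tag")


def read_counters(debugfs_dir):
    """Return {name: value} for each counter file present in debugfs_dir."""
    values = {}
    for name in COUNTERS:
        path = Path(debugfs_dir) / name
        if path.is_file():
            # Assumption: each file holds a single decimal integer.
            values[name] = int(path.read_text().split()[0])
    return values


def counter_deltas(before, after):
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Taking one snapshot with read_counters(), sleeping for the sampling
interval, taking a second snapshot and passing both to counter_deltas()
gives the number of tag waits seen in that interval. A non-zero delta on a
shared tag set would suggest starvation during the window.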

Together, these provide storage engineers with zero-configuration 
mechanisms to definitively identify shared-tag bottlenecks.

Please let me know your thoughts.


Changes since v2 [1]:
 - Added "Reviewed-by:" and "Tested-by:" tags for patch 1
 - Evaluated is_sched_tag directly within TP_fast_assign (Steven Rostedt)
 - Introduced atomic counters via debugfs 

Changes since v1 [2]:
 - Improved the description of the trace point (Damien Le Moal)
 - Removed the redundant "active requests" (Laurence Oberman)
 - Introduced pool-specific starvation tracking

[1]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://lore.kernel.org/lkml/[email protected]/

Aaron Tomlin (2):
  blk-mq: add tracepoint block_rq_tag_wait
  blk-mq: expose tag starvation counts via debugfs

 block/blk-mq-debugfs.c       | 56 ++++++++++++++++++++++++++++++++++++
 block/blk-mq-debugfs.h       |  7 +++++
 block/blk-mq-tag.c           |  8 ++++++
 include/linux/blk-mq.h       | 10 +++++++
 include/trace/events/block.h | 43 +++++++++++++++++++++++++++
 5 files changed, 124 insertions(+)

-- 
2.51.0

