Hi Jens, Steve, Masami,
In high-performance storage environments, particularly when utilising RAID
controllers with shared tag sets (BLK_MQ_F_TAG_HCTX_SHARED), severe latency
spikes can occur when fast devices are starved of available tags.
Currently, diagnosing this specific queue contention requires deploying
dynamic kprobes or inferring sleep states, which lacks a simple,
out-of-the-box diagnostic path.
This short series introduces dedicated, low-overhead observability for tag
exhaustion events in the block layer:
- Patch 1 introduces the "block_rq_tag_wait" tracepoint in the tag
allocation slow-path to capture precise, event-based starvation.
- Patch 2 complements this by exposing "wait_on_hw_tag" and
"wait_on_sched_tag" atomic counters via debugfs for quick,
point-in-time cumulative polling.
Together, these provide storage engineers with zero-configuration
mechanisms to definitively identify shared-tag bottlenecks.
Please let me know your thoughts.
Changes since v2 [1]:
- Added "Reviewed-by:" and "Tested-by:" tags for patch 1
- Evaluate is_sched_tag directly within TP_fast_assign (Steven Rostedt)
- Introduced atomic counters via debugfs
Changes since v1 [2]:
- Improved the description of the trace point (Damien Le Moal)
- Removed the redundant "active requests" (Laurence Oberman)
- Introduced pool-specific starvation tracking
[1]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://lore.kernel.org/lkml/[email protected]/
Aaron Tomlin (2):
blk-mq: add tracepoint block_rq_tag_wait
blk-mq: expose tag starvation counts via debugfs
block/blk-mq-debugfs.c | 56 ++++++++++++++++++++++++++++++++++++
block/blk-mq-debugfs.h | 7 +++++
block/blk-mq-tag.c | 8 ++++++
include/linux/blk-mq.h | 10 +++++++
include/trace/events/block.h | 43 +++++++++++++++++++++++++++
5 files changed, 124 insertions(+)
--
2.51.0