On Sun, Apr 26, 2026 at 10:01:42PM -0400, Aaron Tomlin wrote:
> +/**
> + * blk_mq_debugfs_inc_wait_tags - increment the tag starvation counters
> + * @hctx: hardware context associated with the tag allocation
> + * @is_sched: true if the starved pool is the software scheduler
> + *
> + * Evaluates the exhausted tag pool and safely increments the appropriate
> + * per-cpu debugfs starvation counter.
> + *
> + * Note: The per-cpu pointers are explicitly checked to prevent a NULL
> + * pointer dereference in the event that the system was under heavy memory
> + * pressure and the initial per-cpu allocation failed.
> + */
> +void blk_mq_debugfs_inc_wait_tags(struct blk_mq_hw_ctx *hctx,
> +				  bool is_sched)
> +{
> +	unsigned long __percpu *tags = is_sched ?
> +		READ_ONCE(hctx->wait_on_sched_tag) :
> +		READ_ONCE(hctx->wait_on_hw_tag);
> +
> +	if (likely(tags))
> +		this_cpu_inc(*tags);
> +}

Hi Jens,

I have realised that this particular code path, invoked from
blk_mq_get_tag() immediately prior to io_schedule(), is in fact a
preemptible context. Consequently, using this_cpu_inc() here will
invariably trigger a warning on a kernel built with
CONFIG_DEBUG_PREEMPT=y.

To rectify this, I intend to switch to raw_cpu_inc(). Given that this is
solely for a debugfs interface, I believe it is more prudent to prioritise
low execution overhead over absolute statistical precision, should a
preemption race occur.

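For illustration only, here is a minimal userspace sketch of what the
revised helper might look like. The macro definitions and the reduced
struct blk_mq_hw_ctx are stand-ins assumed here so that the snippet
compiles outside the kernel (with GCC or Clang); they are not the kernel's
definitions, and the actual respin may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-ins (assumptions, for illustration only). In the kernel
 * these come from the compiler and per-cpu headers, and the counters are
 * true per-cpu variables rather than plain memory.
 */
#define __percpu
#define likely(x)	__builtin_expect(!!(x), 1)
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define raw_cpu_inc(v)	((v)++)	/* plain, unprotected increment */

/* Reduced to the two fields the quoted patch touches. */
struct blk_mq_hw_ctx {
	unsigned long __percpu *wait_on_hw_tag;
	unsigned long __percpu *wait_on_sched_tag;
};

void blk_mq_debugfs_inc_wait_tags(struct blk_mq_hw_ctx *hctx, bool is_sched)
{
	unsigned long __percpu *tags = is_sched ?
		READ_ONCE(hctx->wait_on_sched_tag) :
		READ_ONCE(hctx->wait_on_hw_tag);

	/*
	 * raw_cpu_inc() performs no preemption check and gives no
	 * protection against preemption or interrupts. An occasional
	 * lost or misattributed increment is accepted for these
	 * debugfs-only statistics.
	 */
	if (likely(tags))
		raw_cpu_inc(*tags);
}
```

The NULL check is unchanged from the quoted patch; only the increment
primitive differs. Where raw_cpu_inc() falls back to a plain
read-modify-write, a task preempted or migrated mid-update may lose an
increment or credit it to another CPU's counter, which is the imprecision
accepted above.
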
Kind regards,
--
Aaron Tomlin