On Wed, Apr 22, 2026 at 02:52:12PM -0400, Aaron Tomlin wrote:
> From: Daniel Wagner <[email protected]>
> +static void blk_mq_map_fallback(struct blk_mq_queue_map *qmap)
> +{
> +	unsigned int cpu;
> +
> +	/*
> +	 * Map all CPUs to the first hctx to ensure at least one online
> +	 * CPU is serving it.
> +	 */
> +	for_each_possible_cpu(cpu)
> +		qmap->mq_map[cpu] = 0;
> +}
I suspect we should use 'qmap->mq_map[cpu] = qmap->queue_offset' to respect
the specified map's boundaries. For instance, secondary maps may not start
at zero.
> void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
> {
> - const struct cpumask *masks;
> + struct cpumask *masks __free(kfree) = NULL;
> + const struct cpumask *constraint;
> unsigned int queue, cpu, nr_masks;
> + cpumask_var_t active_hctx;
[ ... ]
> + /* Map CPUs to the hardware contexts (hctx) */
> + masks = group_mask_cpus_evenly(qmap->nr_queues, constraint, &nr_masks);
> + if (!masks)
> + goto free_fallback;
According to Documentation/dev-tools/checkpatch.rst, pointers with the
'__free' attribute should be declared at the point of use and
initialisation. More importantly, I suspect we should not mix traditional
goto error unwinding with scope-based cleanup in the same function. I
propose we drop the use of __free and free 'masks' explicitly, on the
success path as well as the error paths.
> for (queue = 0; queue < qmap->nr_queues; queue++) {
> - for_each_cpu(cpu, &masks[queue % nr_masks])
> + unsigned int idx = (qmap->queue_offset + queue) % nr_masks;
> +
> + for_each_cpu(cpu, &masks[idx]) {
> qmap->mq_map[cpu] = qmap->queue_offset + queue;
> +
> + if (cpu_online(cpu))
> + cpumask_set_cpu(queue, active_hctx);
> + }
> }
[ ... ]
> +
> + /* Map any unassigned CPU evenly to the hardware contexts (hctx) */
> + queue = cpumask_first(active_hctx);
> + for_each_cpu_andnot(cpu, cpu_possible_mask, constraint) {
> + qmap->mq_map[cpu] = qmap->queue_offset + queue;
> + queue = cpumask_next_wrap(queue, active_hctx);
> + }
> +
> + if (!blk_mq_validate(qmap, active_hctx))
> + goto free_fallback;
> +
There is a potential out-of-bounds write here. After zalloc_cpumask_var(),
active_hctx (of type cpumask_var_t) is backed by exactly enough bits to
represent the number of CPUs the system is configured to support
(nr_cpumask_bits) - say, 16. In the loop above, however, active_hctx is
used to track hardware queues (qmap->nr_queues), with 'queue' passed as
the index to cpumask_set_cpu(). A modern, high-end NVMe drive can expose
128 hardware queues. If we plug that drive into a machine with only 16
CPUs:
  1. zalloc_cpumask_var() allocates 16 bits (perhaps rounded up to a
     standard word/slab size).
  2. The loop iterates up to queue = 127.
  3. cpumask_set_cpu(127, active_hctx) blindly sets bit 127, writing well
     past the allocated object and corrupting adjacent slab memory in the
     kernel heap.
Because active_hctx tracks hardware queues and not CPUs, we must not use
the cpumask API. Instead, switch it to the kernel's standard bitmap API so
the buffer can be allocated with exactly qmap->nr_queues bits.
Additionally, if all CPUs in the generated masks happen to be offline, the
bitmap will be empty. In that case the bit-finding function returns the
size of the bitmap, so the unassigned-CPU loop would map those CPUs to an
out-of-bounds queue index. Checking whether the bitmap is empty before
that loop would prevent this.
I will address the above in the next series iteration.
Kind regards,
--
Aaron Tomlin