On Wed, Apr 22, 2026 at 02:52:12PM -0400, Aaron Tomlin wrote:
> From: Daniel Wagner <[email protected]>
> +static void blk_mq_map_fallback(struct blk_mq_queue_map *qmap)
> +{
> +     unsigned int cpu;
> +
> +     /*
> +      * Map all CPUs to the first hctx to ensure at least one online
> +      * CPU is serving it.
> +      */
> +     for_each_possible_cpu(cpu)
> +             qmap->mq_map[cpu] = 0;
> +}

I suspect we should use 'qmap->mq_map[cpu] = qmap->queue_offset' here so
the fallback respects the boundaries of the map it was given: a secondary
map (e.g. a read or poll map) may not start at hctx 0.
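
Something like this (untested; identical to the patch except for the
assigned value):

static void blk_mq_map_fallback(struct blk_mq_queue_map *qmap)
{
	unsigned int cpu;

	/*
	 * Map all CPUs to this map's first hctx to ensure at least one
	 * online CPU is serving it.
	 */
	for_each_possible_cpu(cpu)
		qmap->mq_map[cpu] = qmap->queue_offset;
}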

>  void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>  {
> -     const struct cpumask *masks;
> +     struct cpumask *masks __free(kfree) = NULL;
> +     const struct cpumask *constraint;
>       unsigned int queue, cpu, nr_masks;
> +     cpumask_var_t active_hctx;

[ ... ]

> +     /* Map CPUs to the hardware contexts (hctx) */
> +     masks = group_mask_cpus_evenly(qmap->nr_queues, constraint, &nr_masks);
> +     if (!masks)
> +             goto free_fallback;

According to Documentation/dev-tools/checkpatch.rst, pointers carrying the
'__free' attribute should be declared at the point of use and initialised
there. More importantly, I suspect we should not mix traditional goto-based
error unwinding with scope-based cleanup in the same function. I propose we
drop __free here and free 'masks' explicitly on the success path instead.
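
Roughly, what I have in mind (untested; unrelated code elided, and the
free_fallback path is assumed to kfree() 'masks' as well, which is safe
on NULL):

void blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
	struct cpumask *masks;

	[ ... ]

	masks = group_mask_cpus_evenly(qmap->nr_queues, constraint, &nr_masks);
	if (!masks)
		goto free_fallback;

	[ ... ]

	/* Success path: free 'masks' explicitly instead of via __free */
	kfree(masks);
}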

>       for (queue = 0; queue < qmap->nr_queues; queue++) {
> -             for_each_cpu(cpu, &masks[queue % nr_masks])
> +             unsigned int idx = (qmap->queue_offset + queue) % nr_masks;
> +
> +             for_each_cpu(cpu, &masks[idx]) {
>                       qmap->mq_map[cpu] = qmap->queue_offset + queue;
> +
> +                     if (cpu_online(cpu))
> +                             cpumask_set_cpu(queue, active_hctx);
> +             }
>       }

[ ... ]

> +
> +     /* Map any unassigned CPU evenly to the hardware contexts (hctx) */
> +     queue = cpumask_first(active_hctx);
> +     for_each_cpu_andnot(cpu, cpu_possible_mask, constraint) {
> +             qmap->mq_map[cpu] = qmap->queue_offset + queue;
> +             queue = cpumask_next_wrap(queue, active_hctx);
> +     }
> +
> +     if (!blk_mq_validate(qmap, active_hctx))
> +             goto free_fallback;
> +

There is a potential out-of-bounds write vulnerability here.

The variable active_hctx is of type cpumask_var_t. After the call to
zalloc_cpumask_var(), the kernel will have allocated only enough bits to
represent the number of CPUs the system is configured to support
(nr_cpumask_bits). For example, nr_cpumask_bits could be as low as 16.

In the loop above, however, active_hctx is used to track hardware queues
(qmap->nr_queues), with 'queue' passed as the CPU index into
cpumask_set_cpu().

A modern, high-end NVMe drive can expose 128 hardware queues.
If we plug that drive into a machine with only 16 CPUs:

    1.  zalloc_cpumask_var() allocates 16 bits (perhaps rounded up to a
        standard word/slab size).

    2.  The loop iterates up to queue = 127.

    3.  cpumask_set_cpu(127, active_hctx) blindly sets bit 127, writing
        well past the end of the allocation and corrupting adjacent slab
        objects in the kernel heap.

Because active_hctx tracks hardware queues rather than CPUs, we must not
use the cpumask API for it. Instead, switch it to the kernel's generic
bitmap API so a bitmap of exactly qmap->nr_queues bits can be allocated.
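
An untested sketch (names from the quoted patch, unrelated code elided;
the allocation-failure path is assumed to take the existing fallback):

	unsigned long *active_hctx;

	active_hctx = bitmap_zalloc(qmap->nr_queues, GFP_KERNEL);
	if (!active_hctx)
		goto free_fallback;

	[ ... ]

	/* Inside the per-queue loop, instead of cpumask_set_cpu() */
	if (cpu_online(cpu))
		__set_bit(queue, active_hctx);

	[ ... ]

	bitmap_free(active_hctx);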

Additionally, if every CPU in the generated masks happens to be offline,
the bitmap will be empty. In that scenario the bit-finding function will
return the size of the bitmap, and the unassigned-CPU loop will then map
those CPUs to an out-of-range hctx. Checking whether the bitmap is empty
before that loop, and taking the fallback path, would prevent this.
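
For example (untested, assuming the bitmap conversion sketched above):

	/* Bail out if no hctx ended up with an online CPU */
	if (bitmap_empty(active_hctx, qmap->nr_queues))
		goto free_fallback;

	/* Map any unassigned CPU evenly to the active hardware contexts */
	queue = find_first_bit(active_hctx, qmap->nr_queues);
	for_each_cpu_andnot(cpu, cpu_possible_mask, constraint) {
		qmap->mq_map[cpu] = qmap->queue_offset + queue;
		queue = find_next_bit(active_hctx, qmap->nr_queues, queue + 1);
		if (queue >= qmap->nr_queues)
			queue = find_first_bit(active_hctx, qmap->nr_queues);
	}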

I will address the above in the next series iteration.


Kind regards,
-- 
Aaron Tomlin
