
> -----Original Message-----
> From: Muhammad Bilal <[email protected]>
> Sent: Saturday, May 23, 2026 10:27 AM
> To: Kuehling, Felix <[email protected]>
> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>; [email protected]; [email protected]; amd-
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; Muhammad Bilal
> <[email protected]>
> Subject: [PATCH] drm/amdkfd: fix integer overflow in get_queue_ids()
>
> get_queue_ids() computes the allocation size as:
>
>     size_t array_size = num_queues * sizeof(uint32_t);
>
> num_queues is a user-controlled u32 copied directly from the ioctl argument
> (args.suspend_queues.num_queues or args.resume_queues.num_queues)
> via kfd_ioctl_set_debug_trap() with no prior validation or clamping.
>
> On 32-bit kernels, size_t is 32 bits wide.  A caller supplying num_queues =
> 0x40000001 causes the multiplication to silently wrap:
>
>     0x40000001 * 4 = 0x100000004  ->  truncated to 0x4
>
> memdup_user() then allocates only 4 bytes.  q_array_invalidate() is called
> immediately after with the original num_queues value and iterates
> 0x40000001 times writing KFD_DBG_QUEUE_INVALID_MASK into the 4-byte
> buffer, producing an unbounded heap buffer overflow.
> q_array_get_index() in both callers walks the same buffer using the same
> unchecked count.
>
> Both call sites are affected:
> - suspend_queues() calls get_queue_ids() unconditionally
> - resume_queues() calls it only when usr_queue_id_array is non-NULL
>
> Both callers already propagate IS_ERR() returns to userspace, so returning
> ERR_PTR(-EINVAL) on overflow requires no new error handling.
>
> The copy_to_user() calls at the tail of both functions also compute
> num_queues * sizeof(uint32_t), but are only reachable after a successful
> get_queue_ids() return, so they are safe once the allocation is correctly
> bounded.
>
> Fix by replacing the unchecked multiplication with check_mul_overflow().
> Cast num_queues to size_t so all three arguments match the destination type,
> avoiding implicit type mismatch on compilers that implement the macro with
> typeof() rather than __builtin_mul_overflow() directly.
> Add an explicit #include <linux/overflow.h> rather than relying on the
> transitive pull through linux/slab.h.
>
> Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation")
> Cc: [email protected]
> Signed-off-by: Muhammad Bilal <[email protected]>

Thanks for the patch.  I think it should already be fixed with this patch:
https://lists.freedesktop.org/archives/amd-gfx/2026-May/144364.html

Alex

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e0a31e11f0ff..c08ad718dbd7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -25,6 +25,7 @@
>  #include <linux/ratelimit.h>
>  #include <linux/printk.h>
>  #include <linux/slab.h>
> +#include <linux/overflow.h>
>  #include <linux/list.h>
>  #include <linux/types.h>
>  #include <linux/bitops.h>
> @@ -3308,11 +3309,14 @@ static void copy_context_work_handler(struct work_struct *work)
>
>  static uint32_t *get_queue_ids(uint32_t num_queues, uint32_t *usr_queue_id_array)
>  {
> -     size_t array_size = num_queues * sizeof(uint32_t);
> +     size_t array_size;
>
>       if (!usr_queue_id_array)
>               return NULL;
>
> +     if (check_mul_overflow((size_t)num_queues, sizeof(uint32_t), &array_size))
> +             return ERR_PTR(-EINVAL);
> +
>       return memdup_user(usr_queue_id_array, array_size);
>  }
>
> --
> 2.53.0
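
The wrap described in the commit message can be reproduced in userspace.
The following is only a minimal sketch, not kernel code: size32_t is a
hypothetical typedef standing in for size_t on a 32-bit kernel, both helper
names are made up for illustration, and the GCC/Clang builtin
__builtin_mul_overflow() is used here as a userspace analogue of
check_mul_overflow().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for size_t on a 32-bit kernel (illustration only). */
typedef uint32_t size32_t;

/*
 * Unchecked form, mirroring "num_queues * sizeof(uint32_t)" when size_t is
 * 32 bits wide: the product is silently reduced modulo 2^32.
 */
static size32_t array_size_unchecked(uint32_t num_queues)
{
	return num_queues * (size32_t)sizeof(uint32_t);
}

/*
 * Checked form: returns true if the multiplication overflowed, in which
 * case the caller should reject the request (the patch returns -EINVAL).
 * On success, *out holds the exact byte count.
 */
static bool array_size_checked(uint32_t num_queues, size32_t *out)
{
	return __builtin_mul_overflow((size32_t)num_queues,
				      (size32_t)sizeof(uint32_t), out);
}
```

With num_queues = 0x40000001 the unchecked helper yields 4, matching the
truncation shown in the commit message, while the checked helper reports
the overflow. A small count such as 16 passes the check and yields 64 bytes.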
