> -----Original Message-----
> From: Muhammad Bilal <[email protected]>
> Sent: Saturday, May 23, 2026 10:27 AM
> To: Kuehling, Felix <[email protected]>
> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>; [email protected]; [email protected];
> amd-[email protected]; [email protected];
> linux-[email protected]; [email protected]; Muhammad Bilal
> <[email protected]>
> Subject: [PATCH] drm/amdkfd: fix integer overflow in get_queue_ids()
>
> get_queue_ids() computes the allocation size as:
>
>     size_t array_size = num_queues * sizeof(uint32_t);
>
> num_queues is a user-controlled u32 copied directly from the ioctl argument
> (args.suspend_queues.num_queues or args.resume_queues.num_queues)
> via kfd_ioctl_set_debug_trap() with no prior validation or clamping.
>
> On 32-bit kernels, size_t is 32 bits wide. A caller supplying
> num_queues = 0x40000001 causes the multiplication to silently wrap:
>
>     0x40000001 * 4 = 0x100000004 -> truncated to 0x4
>
> memdup_user() then allocates only 4 bytes. q_array_invalidate() is called
> immediately after with the original num_queues value and iterates
> 0x40000001 times writing KFD_DBG_QUEUE_INVALID_MASK into the 4-byte
> buffer, producing an unbounded heap buffer overflow.
> q_array_get_index() in both callers walks the same buffer using the same
> unchecked count.
>
> Both call sites are affected:
>   - suspend_queues() calls get_queue_ids() unconditionally
>   - resume_queues() calls it only when usr_queue_id_array is non-NULL
>
> Both callers already propagate IS_ERR() returns to userspace, so returning
> ERR_PTR(-EINVAL) on overflow requires no new error handling.
>
> The copy_to_user() calls at the tail of both functions also compute
> num_queues * sizeof(uint32_t), but are only reachable after a successful
> get_queue_ids() return, so they are safe once the allocation is correctly
> bounded.
>
> Fix by replacing the unchecked multiplication with check_mul_overflow().
> Cast num_queues to size_t so all three arguments match the destination type,
> avoiding implicit type mismatch on compilers that implement the macro with
> typeof() rather than __builtin_mul_overflow() directly.
> Add an explicit #include <linux/overflow.h> rather than relying on the
> transitive pull through linux/slab.h.
>
> Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation")
> Cc: [email protected]
> Signed-off-by: Muhammad Bilal <[email protected]>

Thanks for the patch. I think it should already be fixed with this patch:
https://lists.freedesktop.org/archives/amd-gfx/2026-May/144364.html

Alex

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e0a31e11f0ff..c08ad718dbd7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -25,6 +25,7 @@
>  #include <linux/ratelimit.h>
>  #include <linux/printk.h>
>  #include <linux/slab.h>
> +#include <linux/overflow.h>
>  #include <linux/list.h>
>  #include <linux/types.h>
>  #include <linux/bitops.h>
> @@ -3308,11 +3309,14 @@ static void copy_context_work_handler(struct work_struct *work)
>
>  static uint32_t *get_queue_ids(uint32_t num_queues, uint32_t *usr_queue_id_array)
>  {
> -	size_t array_size = num_queues * sizeof(uint32_t);
> +	size_t array_size;
>
>  	if (!usr_queue_id_array)
>  		return NULL;
>
> +	if (check_mul_overflow((size_t)num_queues, sizeof(uint32_t), &array_size))
> +		return ERR_PTR(-EINVAL);
> +
>  	return memdup_user(usr_queue_id_array, array_size);
>  }
>
> --
> 2.53.0
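
To make the wraparound described in the quoted commit message concrete, here is a minimal userspace sketch. It is not kernel code: it assumes a 32-bit size type, emulated with uint32_t, and the helpers naive_array_size32() and checked_array_size32() are hypothetical names for illustration only. The checked helper calls the GCC/Clang builtin __builtin_mul_overflow() directly, which is the primitive that the kernel's check_mul_overflow() is built on in current kernels; the patch itself uses check_mul_overflow() from <linux/overflow.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: uint32_t stands in for size_t on a 32-bit kernel. */
static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes wide");

/*
 * Hypothetical helper reproducing the original, unchecked computation
 * with a 32-bit size type. Unsigned arithmetic wraps modulo 2^32, so a
 * large count silently yields a small allocation size.
 */
static uint32_t naive_array_size32(uint32_t num_queues)
{
	return num_queues * (uint32_t)sizeof(uint32_t);
}

/*
 * Hypothetical helper mirroring the patched logic: stores the product in
 * *array_size and returns true only if it fits in 32 bits. On overflow it
 * returns false, where the patch returns ERR_PTR(-EINVAL) instead.
 */
static bool checked_array_size32(uint32_t num_queues, uint32_t *array_size)
{
	return !__builtin_mul_overflow(num_queues, (uint32_t)sizeof(uint32_t),
				       array_size);
}
```

For example, naive_array_size32(0x40000001) yields 4, matching the truncation in the commit message, while checked_array_size32() rejects the same count and accepts 0x3fffffff (size 0xfffffffc). With a 64-bit size_t, even the largest u32 count times 4 fits, which is why the report limits the overflow to 32-bit kernels.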
