[Public]

Thank you for the report and fix.
Some nitpick suggestions but with those addressed, this looks good to me and is
Reviewed-by: Jonathan Kim <[email protected]>

> -----Original Message-----
> From: SHANMUGAM, SRINIVASAN <[email protected]>
> Sent: Friday, February 6, 2026 9:15 AM
> To: Koenig, Christian <[email protected]>; Deucher, Alexander
> <[email protected]>
> Cc: [email protected]; SHANMUGAM, SRINIVASAN
> <[email protected]>; Dan Carpenter
> <[email protected]>; Kim, Jonathan <[email protected]>;
> Kuehling, Felix <[email protected]>
> Subject: [PATCH] drm/amdkfd: Fix watch_id bounds checking in debug address
> watch
>
> The address watch clear code receives watch_id as an unsigned value
> (u32), but some helper functions were using a signed int and checked
> bits by shifting with watch_id.
>
> If a very large watch_id is passed from userspace, it can be converted
> to a negative value.  This can cause invalid shifts and may access
> memory outside the watch_points array.
>
> Fix this by checking that watch_id is within MAX_WATCH_ADDRESSES before
> using it.  Also use BIT(watch_id) to test and clear bits safely.
>
> This keeps the behavior unchanged for valid watch IDs and avoids
> undefined behavior for invalid ones.
>
> Fixes the below:
> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:448
> kfd_dbg_trap_clear_dev_address_watch() error: buffer overflow
> 'pdd->watch_points' 4 <= u32max user_rl='0-3,2147483648-u32max' uncapped
>
> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c
>     433 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device
> *pdd,
>     434                                         uint32_t watch_id)
>     435 {
>     436         int r;
>     437
>     438         if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id))
>
> kfd_dbg_owns_dev_watch_id() doesn't check for negative values so if
> watch_id is larger than INT_MAX it leads to a buffer overflow.
> (Negative shifts are undefined).
>
>     439                 return -EINVAL;
>     440
>     441         if (!pdd->dev->kfd->shared_resources.enable_mes) {
>     442                 r = debug_lock_and_unmap(pdd->dev->dqm);
>     443                 if (r)
>     444                         return r;
>     445         }
>     446
>     447         amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
> --> 448         pdd->watch_points[watch_id] = pdd->dev->kfd2kgd-
> >clear_address_watch(
>     449                                                         
> pdd->dev->adev,
>     450                                                         watch_id);
>
> Fixes: e0f85f4690d0 ("drm/amdkfd: add debug set and clear address watch points
> operation")
> Reported-by: Dan Carpenter <[email protected]>
> Cc: Jonathan Kim <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Alex Deucher <[email protected]>
> Cc: Christian König <[email protected]>
> Signed-off-by: Srinivasan Shanmugam <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> index 1dae317858e9..91ede56890d2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> @@ -404,27 +404,24 @@ static int kfd_dbg_get_dev_watch_id(struct
> kfd_process_device *pdd, int *watch_i
>       return -ENOMEM;
>  }
>
> -static void kfd_dbg_clear_dev_watch_id(struct kfd_process_device *pdd, int
> watch_id)
> +static void kfd_dbg_clear_dev_watch_id(struct kfd_process_device *pdd, u32
> watch_id)
>  {
>       spin_lock(&pdd->dev->watch_points_lock);
> -
> -     /* process owns device watch point so safe to clear */
> -     if ((pdd->alloc_watch_ids >> watch_id) & 0x1) {
> -             pdd->alloc_watch_ids &= ~(0x1 << watch_id);
> -             pdd->dev->alloc_watch_ids &= ~(0x1 << watch_id);
> +     if (watch_id < MAX_WATCH_ADDRESSES &&

Maybe EINVAL early test in kfd_dbg_trap_clear_set_address_watch against 
MAX_WATCH_ADDRESS similar to your proposed test in 
kfd_dbg_trap_clear_dev_address_watch?
That way you don't have to do the test in a spinlock for any watch point static 
calls and the two watch point entry calls will be symmetrical in terms of their 
condition checks.

> +         (pdd->alloc_watch_ids & BIT(watch_id))) {
> +             pdd->alloc_watch_ids &= ~BIT(watch_id);
> +             pdd->dev->alloc_watch_ids &= ~BIT(watch_id);
>       }
> -
>       spin_unlock(&pdd->dev->watch_points_lock);
>  }
>
> -static bool kfd_dbg_owns_dev_watch_id(struct kfd_process_device *pdd, int
> watch_id)
> +static bool kfd_dbg_owns_dev_watch_id(struct kfd_process_device *pdd, u32
> watch_id)
>  {
>       bool owns_watch_id = false;
>
>       spin_lock(&pdd->dev->watch_points_lock);
> -     owns_watch_id = watch_id < MAX_WATCH_ADDRESSES &&
> -                     ((pdd->alloc_watch_ids >> watch_id) & 0x1);
> -
> +     if (watch_id < MAX_WATCH_ADDRESSES)
I think you have already tested this with your proposed test below since 
kfd_dbg_trap_clear_dev_address_watch is the only caller for this static 
function.
So you can probably remove the watch_id < MAX_WATCH_ADDRESS condition check 
here.

Jon

> +             owns_watch_id = pdd->alloc_watch_ids & BIT(watch_id);
>       spin_unlock(&pdd->dev->watch_points_lock);
>
>       return owns_watch_id;
> @@ -435,6 +432,8 @@ int kfd_dbg_trap_clear_dev_address_watch(struct
> kfd_process_device *pdd,
>  {
>       int r;
>
> +     if (watch_id >= MAX_WATCH_ADDRESSES)
> +             return -EINVAL;
>       if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id))
>               return -EINVAL;
>
> --
> 2.34.1

Reply via email to