[Public] Thank you for the report and fix. Some nitpick suggestions but with those addressed, this looks good to me and is Reviewed-by: Jonathan Kim <[email protected]>
> -----Original Message----- > From: SHANMUGAM, SRINIVASAN <[email protected]> > Sent: Friday, February 6, 2026 9:15 AM > To: Koenig, Christian <[email protected]>; Deucher, Alexander > <[email protected]> > Cc: [email protected]; SHANMUGAM, SRINIVASAN > <[email protected]>; Dan Carpenter > <[email protected]>; Kim, Jonathan <[email protected]>; > Kuehling, Felix <[email protected]> > Subject: [PATCH] drm/amdkfd: Fix watch_id bounds checking in debug address > watch > > The address watch clear code receives watch_id as an unsigned value > (u32), but some helper functions were using a signed int and checked > bits by shifting with watch_id. > > If a very large watch_id is passed from userspace, it can be converted > to a negative value. This can cause invalid shifts and may access > memory outside the watch_points array. > > Fix this by checking that watch_id is within MAX_WATCH_ADDRESSES before > using it. Also use BIT(watch_id) to test and clear bits safely. > > This keeps the behavior unchanged for valid watch IDs and avoids > undefined behavior for invalid ones. > > Fixes the below: > drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:448 > kfd_dbg_trap_clear_dev_address_watch() error: buffer overflow > 'pdd->watch_points' 4 <= u32max user_rl='0-3,2147483648-u32max' uncapped > > drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c > 433 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device > *pdd, > 434 uint32_t watch_id) > 435 { > 436 int r; > 437 > 438 if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id)) > > kfd_dbg_owns_dev_watch_id() doesn't check for negative values so if > watch_id is larger than INT_MAX it leads to a buffer overflow. > (Negative shifts are undefined). > > 439 return -EINVAL; > 440 > 441 if (!pdd->dev->kfd->shared_resources.enable_mes) { > 442 r = debug_lock_and_unmap(pdd->dev->dqm); > 443 if (r) > 444 return r; > 445 } > 446 > 447 amdgpu_gfx_off_ctrl(pdd->dev->adev, false); > --> 448 pdd->watch_points[watch_id] = pdd->dev->kfd2kgd- > >clear_address_watch( > 449 > pdd->dev->adev, > 450 watch_id); > > Fixes: e0f85f4690d0 ("drm/amdkfd: add debug set and clear address watch points > operation") > Reported-by: Dan Carpenter <[email protected]> > Cc: Jonathan Kim <[email protected]> > Cc: Felix Kuehling <[email protected]> > Cc: Alex Deucher <[email protected]> > Cc: Christian König <[email protected]> > Signed-off-by: Srinivasan Shanmugam <[email protected]> > --- > drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 21 ++++++++++----------- > 1 file changed, 10 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c > b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c > index 1dae317858e9..91ede56890d2 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c > @@ -404,27 +404,24 @@ static int kfd_dbg_get_dev_watch_id(struct > kfd_process_device *pdd, int *watch_i > return -ENOMEM; > } > > -static void kfd_dbg_clear_dev_watch_id(struct kfd_process_device *pdd, int > watch_id) > +static void kfd_dbg_clear_dev_watch_id(struct kfd_process_device *pdd, u32 > watch_id) > { > spin_lock(&pdd->dev->watch_points_lock); > - > - /* process owns device watch point so safe to clear */ > - if ((pdd->alloc_watch_ids >> watch_id) & 0x1) { > - pdd->alloc_watch_ids &= ~(0x1 << watch_id); > - pdd->dev->alloc_watch_ids &= ~(0x1 << watch_id); > + if (watch_id < MAX_WATCH_ADDRESSES && Maybe EINVAL early test in kfd_dbg_trap_clear_set_address_watch against MAX_WATCH_ADDRESS similar to your proposed test in kfd_dbg_trap_clear_dev_address_watch? That way you don't have to do the test in a spinlock for any watch point static calls and the two watch point entry calls will be symmetrical in terms of their condition checks. > + (pdd->alloc_watch_ids & BIT(watch_id))) { > + pdd->alloc_watch_ids &= ~BIT(watch_id); > + pdd->dev->alloc_watch_ids &= ~BIT(watch_id); > } > - > spin_unlock(&pdd->dev->watch_points_lock); > } > > -static bool kfd_dbg_owns_dev_watch_id(struct kfd_process_device *pdd, int > watch_id) > +static bool kfd_dbg_owns_dev_watch_id(struct kfd_process_device *pdd, u32 > watch_id) > { > bool owns_watch_id = false; > > spin_lock(&pdd->dev->watch_points_lock); > - owns_watch_id = watch_id < MAX_WATCH_ADDRESSES && > - ((pdd->alloc_watch_ids >> watch_id) & 0x1); > - > + if (watch_id < MAX_WATCH_ADDRESSES) I think you have already tested this with your proposed test below since kfd_dbg_trap_clear_dev_address_watch is the only caller for this static function. So you can probably remove the watch_id < MAX_WATCH_ADDRESS condition check here. Jon > + owns_watch_id = pdd->alloc_watch_ids & BIT(watch_id); > spin_unlock(&pdd->dev->watch_points_lock); > > return owns_watch_id; > @@ -435,6 +432,8 @@ int kfd_dbg_trap_clear_dev_address_watch(struct > kfd_process_device *pdd, > { > int r; > > + if (watch_id >= MAX_WATCH_ADDRESSES) > + return -EINVAL; > if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id)) > return -EINVAL; > > -- > 2.34.1
