AMD General

Greetings @Alex Deucher
Thanks for the ACK. Waiting for the Reviewed-by:

> -----Original Message-----
> From: Alex Deucher <[email protected]>
> Sent: Thursday, May 28, 2026 3:06 PM
> To: Martin, Andrew <[email protected]>
> Cc: [email protected]
> Subject: Re: [PATCH v1] drm/amdkfd: Fix buffer overflow in SDMA queue checkpoint/restore on GFX11
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> On Thu, May 28, 2026 at 1:34 PM Andrew Martin <[email protected]> wrote:
> >
> > The v11 MQD manager incorrectly assigned the CP-compute variants of
> > checkpoint_mqd/restore_mqd for KFD_MQD_TYPE_SDMA queues. These
> > functions use sizeof(struct v11_compute_mqd) (2048 bytes) instead of
> > sizeof(struct v11_sdma_mqd) (512 bytes), causing a 1536-byte overflow.
> >
> > During CRIU checkpoint of an SDMA queue on Navi3x:
> > - checkpoint_mqd() reads 2048 bytes from a 512-byte SDMA MQD buffer,
> >   leaking 1536 bytes of adjacent GTT memory to userspace
> >
> > During CRIU restore:
> > - restore_mqd() writes 2048 bytes into a 512-byte SDMA MQD buffer,
> >   corrupting 1536 bytes of adjacent GTT memory (often the ring buffer
> >   or neighboring MQDs)
> >
> > This is a copy-paste regression unique to v11. All other ASIC backends
> > (cik, vi, v9, v10, v12) correctly use the SDMA-specific variants.
> >
> > Add checkpoint_mqd_sdma() and restore_mqd_sdma() functions that
> > properly handle the smaller v11_sdma_mqd structure, matching the
> > pattern used in other MQD managers.
> >
> > Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3")
> > Assisted-by: Claude:Sonnet 4-5
> > Signed-off-by: Andrew Martin <[email protected]>
>
> Acked-by: Alex Deucher <[email protected]>
>
> > ---
> >  .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c  | 40 ++++++++++++++++++-
> >  1 file changed, 38 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> > index 4d8cf6008a77..ce0f5e8e5c29 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> > @@ -355,6 +355,42 @@ static void restore_mqd(struct mqd_manager *mm, void **mqd,
> >  	qp->is_active = 0;
> >  }
> >
> > +static void checkpoint_mqd_sdma(struct mqd_manager *mm,
> > +				void *mqd,
> > +				void *mqd_dst,
> > +				void *ctl_stack_dst)
> > +{
> > +	struct v11_sdma_mqd *m;
> > +
> > +	m = get_sdma_mqd(mqd);
> > +
> > +	memcpy(mqd_dst, m, sizeof(struct v11_sdma_mqd));
> > +}
> > +
> > +static void restore_mqd_sdma(struct mqd_manager *mm, void **mqd,
> > +			     struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
> > +			     struct queue_properties *qp,
> > +			     const void *mqd_src,
> > +			     const void *ctl_stack_src,
> > +			     const u32 ctl_stack_size)
> > +{
> > +	uint64_t addr;
> > +	struct v11_sdma_mqd *m;
> > +
> > +	m = (struct v11_sdma_mqd *) mqd_mem_obj->cpu_ptr;
> > +	addr = mqd_mem_obj->gpu_addr;
> > +
> > +	memcpy(m, mqd_src, sizeof(*m));
> > +
> > +	m->sdmax_rlcx_doorbell_offset =
> > +		qp->doorbell_off <<
> > +		SDMA0_QUEUE0_DOORBELL_OFFSET__OFFSET__SHIFT;
> > +
> > +	*mqd = m;
> > +	if (gart_addr)
> > +		*gart_addr = addr;
> > +
> > +	qp->is_active = 0;
> > +}
> >
> >  static void init_mqd_hiq(struct mqd_manager *mm, void **mqd,
> >  			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
> > @@ -539,8 +575,8 @@ struct mqd_manager *mqd_manager_init_v11(enum KFD_MQD_TYPE type,
> >  		mqd->update_mqd = update_mqd_sdma;
> >  		mqd->destroy_mqd = kfd_destroy_mqd_sdma;
> >  		mqd->is_occupied = kfd_is_occupied_sdma;
> > -		mqd->checkpoint_mqd = checkpoint_mqd;
> > -		mqd->restore_mqd = restore_mqd;
> > +		mqd->checkpoint_mqd = checkpoint_mqd_sdma;
> > +		mqd->restore_mqd = restore_mqd_sdma;
> >  		mqd->mqd_size = sizeof(struct v11_sdma_mqd);
> >  		mqd->mqd_stride = kfd_mqd_stride;
> >  #if defined(CONFIG_DEBUG_FS)
> > --
> > 2.43.0
> >
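
For anyone following the thread, the size arithmetic in the commit message can be checked with a small standalone C sketch. This is only an illustration, not driver code: the `demo_*` structs and helpers are hypothetical stand-ins, and the only values taken from the message above are the 2048-byte and 512-byte MQD sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two MQD layouts. Only the total sizes
 * (2048 and 512 bytes) come from the commit message; the real structs
 * have named register fields. */
struct demo_compute_mqd { uint32_t dw[512]; }; /* 2048 bytes */
struct demo_sdma_mqd    { uint32_t dw[128]; }; /*  512 bytes */

static_assert(sizeof(struct demo_compute_mqd) == 2048, "compute MQD size");
static_assert(sizeof(struct demo_sdma_mqd) == 512, "SDMA MQD size");

/* Bytes a copy of copy_len would touch beyond a buffer of buf_len. */
static size_t demo_overrun(size_t copy_len, size_t buf_len)
{
	return copy_len > buf_len ? copy_len - buf_len : 0;
}

/* Checkpoint sized by the SDMA type itself, mirroring the idea of the
 * fix: the copy length is derived from the struct actually stored. */
static void demo_checkpoint_sdma(const struct demo_sdma_mqd *src, void *dst)
{
	memcpy(dst, src, sizeof(*src));
}
```

Calling `demo_overrun(sizeof(struct demo_compute_mqd), sizeof(struct demo_sdma_mqd))` gives the 1536 bytes quoted in the commit message, while sizing the copy by the SDMA struct gives an overrun of zero.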
