[AMD Official Use Only - AMD Internal Distribution Only] Yes, you are correct. I wanted to re-trigger the scanning of available engine_id's on restore here: https://codebrowser.dev/linux/linux/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c.html#1652
But that is incorrect. On restore, we should only allow the new SDMA queue with the same sdma_id as before. I will drop this patch. Regards, ~David > -----Original Message----- > From: Kuehling, Felix <felix.kuehl...@amd.com> > Sent: Monday, July 28, 2025 11:58 AM > To: Yat Sin, David <david.yat...@amd.com>; amd-gfx@lists.freedesktop.org > Cc: Bhardwaj, Rajneesh <rajneesh.bhard...@amd.com> > Subject: Re: [PATCH v2 1/2] drm/amdkfd: Restore SDMA queues with engine_id > > On 2025-07-22 13:47, David Yat Sin wrote: > > Add support for checkpoint/restore for SDMA queues of type > > KFD_QUEUE_TYPE_SDMA_BY_ENG_ID. > > > > Signed-off-by: David Yat Sin <david.yat...@amd.com> > > --- > > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 + > > drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 9 +++++++++ > > 2 files changed, 10 insertions(+) > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > index 67694bcd9464..837da09b5bec 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > @@ -1261,6 +1261,7 @@ struct kfd_criu_queue_priv_data { > > uint32_t doorbell_id; > > uint32_t gws; > > uint32_t sdma_id; > > + uint32_t sdma_engine_id; > > uint32_t eop_ring_buffer_size; > > uint32_t ctx_save_restore_area_size; > > uint32_t ctl_stack_size; > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c > > b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c > > index c643e0ccec52..fe4c48930aad 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c > > @@ -846,6 +846,14 @@ static int criu_checkpoint_queue(struct > > kfd_process_device *pdd, > > > > q_data->sdma_id = q->sdma_id; > > > > + if ((q->properties.type == KFD_QUEUE_TYPE_SDMA || > > + q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) > > + && q->properties.sdma_engine_id) { > > + q_data->type = KFD_QUEUE_TYPE_SDMA_BY_ENG_ID; > > + } > > + > > + q_data->sdma_engine_id = q->properties.sdma_engine_id; > > Is this really needed? Isn't the sdma engine ID implied by the sdma_id? > That should be sufficient to ensure that SDMA queues are restored on the same > engine. > > I think we should never see KFD_QUEUE_TYPE_SDMA_BY_ENG_ID when we > take a CRIU checkpoint because that gets replaced by either > KFD_QUEUE_TYPE_SDMA or KFD_QUEUE_TYPE_SDMA_XGMI in > allocate_sdma_queue. > > Regards, > Felix > > > + > > q_data->eop_ring_buffer_address = > > q->properties.eop_ring_buffer_address; > > > > @@ -972,6 +980,7 @@ static void set_queue_properties_from_criu(struct > queue_properties *qp, > > qp->queue_size = q_data->q_size; > > qp->read_ptr = (uint32_t *) q_data->read_ptr_addr; > > qp->write_ptr = (uint32_t *) q_data->write_ptr_addr; > > + qp->sdma_engine_id = q_data->sdma_engine_id; > > qp->eop_ring_buffer_address = q_data->eop_ring_buffer_address; > > qp->eop_ring_buffer_size = q_data->eop_ring_buffer_size; > > qp->ctx_save_restore_area_address = > > q_data->ctx_save_restore_area_address;