On 2026-04-07 03:45, Zhang, Tiantian (Celine) wrote:

[AMD Official Use Only - AMD Internal Distribution Only]


Hi @Yang, Philip <mailto:[email protected]>,

Could you please help to review this patch, thanks a lot~

Best Regards,

Celine Zhang

-----Original Message-----
From: YuanShang Mao (River) <[email protected]>
Sent: Wednesday, April 1, 2026 5:56 PM
To: Yang, Philip <[email protected]>
Cc: Koenig, Christian <[email protected]>; [email protected]; Zhang, Tiantian (Celine) <[email protected]> Subject: RE: [PATCH] drm/amdkfd: check if vm ready in svm map and unmap to gpu

[AMD Official Use Only - AMD Internal Distribution Only]

Hi @Yang, Philip

Could help review this patch?

Thanks

River

-----Original Message-----

From: Koenig, Christian <[email protected] <mailto:[email protected]>>

Sent: Tuesday, March 31, 2026 7:32 PM

To: YuanShang Mao (River) <[email protected] <mailto:[email protected]>>; [email protected] <mailto:[email protected]>; Yang, Philip <[email protected] <mailto:[email protected]>>

Subject: Re: [PATCH] drm/amdkfd: check if vm ready in svm map and unmap to gpu

On 3/26/26 11:36, YuanShang wrote:

> Don't map or unmap svm range to gpu if vm is not ready for updates.

>

> Why: DRM entity may already be killed when the svm worker try to

> update gpu vm.

>

> Signed-off-by: YuanShang <[email protected] <mailto:[email protected]>>

Looks correct to me, but I think somebody else already added those checks.

@Philip is that correct? If not please help reviewing the patch.

Thanks,

Christian.

> ---

> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +++++++++++

>  1 file changed, 11 insertions(+)

>

> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

> index 8167fe642341..7f905a7805fa 100644

> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

> @@ -1366,6 +1366,12 @@ svm_range_unmap_from_gpu(struct amdgpu_device

> *adev, struct amdgpu_vm *vm,

>

>       pr_debug("CPU[0x%llx 0x%llx] -> GPU[0x%llx 0x%llx]\n", start, last,

>               gpu_start, gpu_end);

> +

> +     if (!amdgpu_vm_ready(vm)) {

> +             pr_debug("VM not ready, canceling unmap\n");

> +             return -EINVAL;

> +     }

> +

The change looks fine, but it is unnecessary after checking the details of amdgpu_vm_ready.

It is impossible the "DRM entity may already be killed when the svm worker try to update gpu vm", guessing the svm worker is p->svms.restore_work, svm_range_list_fini cancel the work or wait for it to finish. kfd_process_wq_release does svm_range_list_fini first, then fput(pdd->drm_file) to reduce
the vm refcount, then calls amdgpu_vm_fini, to destroy drm sched entity.

If you see the real issue, please post the dmesg log to help understand.

Regards,
Philip


>       return amdgpu_vm_update_range(adev, vm, false, true, true, false, NULL, gpu_start,

> gpu_end, init_pte_value, 0, 0, NULL, NULL,

> fence); @@ -1443,6 +1449,11 @@

> svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,

>       pr_debug("svms 0x%p [0x%lx 0x%lx] readonly %d\n", prange->svms,

>                last_start, last_start + npages - 1, readonly);

>

> +     if (!amdgpu_vm_ready(vm)) {

> +             pr_debug("VM not ready, canceling map\n");

> +             return -EINVAL;

> +     }

> +

>       for (i = offset; i < offset + npages; i++) {

>               uint64_t gpu_start;

>               uint64_t gpu_end;

Reply via email to