[Public]

You can see this error message on CTRL-C or gpu reset conditions.
Those don’t seem like application race conditions to me i.e. the app can’t do 
anything about terminations it can’t see coming.
It appears any bad actor could spam dmesgs with these error messages if they 
wanted to with the way the driver currently is.

Jon


From: amd-gfx <[email protected]> On Behalf Of Chen, 
Xiaogang
Sent: Monday, October 20, 2025 12:00 PM
To: Yang, Philip <[email protected]>; [email protected]
Cc: Kuehling, Felix <[email protected]>
Subject: Re: [PATCH v3 1/2] drm/amdkfd: Fix false positive queue buffer free 
warning

Caution: This message originated from an External Source. Use proper caution 
when opening attachments, clicking links, or responding.



On 10/20/2025 9:30 AM, Philip Yang wrote:

Only show warning message if process mm is still alive when queue

buffer is freed.



If kfd_lookup_process_by_mm return NULL, means the process is already

exited and mm is gone, it is fine to free queue buffer.



Fixes: b049504e211e ("drm/amdkfd: Validate user queue svm memory residency")

Signed-off-by: Philip Yang <[email protected]><mailto:[email protected]>

---

 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +++--

 1 file changed, 3 insertions(+), 2 deletions(-)



diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

index 4d4a47313f5b..d1b2f8525f80 100644

--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

@@ -2487,7 +2487,9 @@ svm_range_unmap_from_cpu(struct mm_struct *mm, struct 
svm_range *prange,

         bool unmap_parent;

         uint32_t i;



-        if (atomic_read(&prange->queue_refcount)) {

+        p = kfd_lookup_process_by_mm(mm);

p->mm is null, not p  because driver set p->mm to null. But still 
prange->queue_refcount is ref of queues from this prange. If app unmap this 
prange app should have destroyed the queues associated with this prange 
already. If not, it is not driver issue. I think driver should send this 
warning anyway to indicate there are queues not destroyed by app before app 
unmap the prange. It is an app race condition, not driver.

Regards

Xiaogang



+

+        if (p && atomic_read(&prange->queue_refcount)) {

                int r;



                pr_warn("Freeing queue vital buffer 0x%lx, queue evicted\n",

@@ -2497,7 +2499,6 @@ svm_range_unmap_from_cpu(struct mm_struct *mm, struct 
svm_range *prange,

                        pr_debug("failed %d to quiesce KFD queues\n", r);

         }



-        p = kfd_lookup_process_by_mm(mm);

         if (!p)

                return;

         svms = &p->svms;

Reply via email to