On 10/20/2025 12:45 PM, Kim, Jonathan wrote:

[Public]


Why print something under conditions where it’s not needed then?

Could this not be a pr_debug instead?

The number of queues could be quite large.

I don’t see the merit in potentially spamming dmesgs if driver is going to clean up anyways.

I think the issue here is not about print this warning or not. The issue is queues associate with prange should have been destroyed before prange got unmapped.

Regards

Xiaogang

Jon

*From:*Chen, Xiaogang <[email protected]>
*Sent:* Monday, October 20, 2025 1:34 PM
*To:* Kim, Jonathan <[email protected]>; Yang, Philip <[email protected]>; [email protected]
*Cc:* Kuehling, Felix <[email protected]>
*Subject:* Re: [PATCH v3 1/2] drm/amdkfd: Fix false positive queue buffer free warning

On 10/20/2025 11:24 AM, Kim, Jonathan wrote:

    [Public]

    You can see this error message on CTRL-C or gpu reset conditions.

    Those don’t seem like application race conditions to me i.e. the
    app can’t do anything about terminations it can’t see coming.

    It appears any bad actor could spam dmesgs with these error
    messages if they wanted to with the way the driver currently is.

If this warning message comes out due to some unexpected things happened(like ctrl-c) I think user would not be surprised to see some warnings.

Regards

Xiaogang

    Jon

    *From:*amd-gfx <[email protected]>
    <mailto:[email protected]> *On Behalf Of
    *Chen, Xiaogang
    *Sent:* Monday, October 20, 2025 12:00 PM
    *To:* Yang, Philip <[email protected]>
    <mailto:[email protected]>; [email protected]
    *Cc:* Kuehling, Felix <[email protected]>
    <mailto:[email protected]>
    *Subject:* Re: [PATCH v3 1/2] drm/amdkfd: Fix false positive queue
    buffer free warning


        

    *Caution:*This message originated from an External Source. Use
    proper caution when opening attachments, clicking links, or
    responding.

    On 10/20/2025 9:30 AM, Philip Yang wrote:

        Only show warning message if process mm is still alive when queue

        buffer is freed.

        If kfd_lookup_process_by_mm return NULL, means the process is already

        exited and mm is gone, it is fine to free queue buffer.

        Fixes: b049504e211e ("drm/amdkfd: Validate user queue svm memory 
residency")

        Signed-off-by: Philip Yang<[email protected]> 
<mailto:[email protected]>

        ---

          drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +++--

          1 file changed, 3 insertions(+), 2 deletions(-)

        diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

        index 4d4a47313f5b..d1b2f8525f80 100644

        --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

        +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

        @@ -2487,7 +2487,9 @@ svm_range_unmap_from_cpu(struct mm_struct *mm, 
struct svm_range *prange,

                  bool unmap_parent;

                  uint32_t i;

        -        if (atomic_read(&prange->queue_refcount)) {

        +        p = kfd_lookup_process_by_mm(mm);

    p->mm is null, not p  because driver set p->mm to null. But
    still prange->queue_refcount is ref of queues from this prange. If
    app unmap this prange app should have destroyed the queues
    associated with this prange already. If not, it is not driver
    issue. I think driver should send this warning anyway to indicate
    there are queues not destroyed by app before app unmap the prange.
    It is an app race condition, not driver.

    Regards

    Xiaogang

        +

        +        if (p && atomic_read(&prange->queue_refcount)) {

                         int r;

                         pr_warn("Freeing queue vital buffer 0x%lx, queue 
evicted\n",

        @@ -2497,7 +2499,6 @@ svm_range_unmap_from_cpu(struct mm_struct *mm, 
struct svm_range *prange,

                                 pr_debug("failed %d to quiesce KFD queues\n", 
r);

                  }

        -        p = kfd_lookup_process_by_mm(mm);

                  if (!p)

                         return;

                  svms = &p->svms;

Reply via email to