On 2025-10-15 16:11, Philip Yang wrote:
In the mmu notifier release callback, stop the user queues to be safe,
because the SVM memory is about to be unmapped from the CPU.
Suggested-by: Felix Kuehling <[email protected]>
Signed-off-by: Philip Yang <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 0341f570f3d1..e2a0ae0394b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1221,11 +1221,16 @@ static void kfd_process_free_notifier(struct mmu_notifier *mn)
 static void kfd_process_notifier_release_internal(struct kfd_process *p)
 {
-	int i;
+	int i, r;
 	cancel_delayed_work_sync(&p->eviction_work);
 	cancel_delayed_work_sync(&p->restore_work);
+	WARN(debug_evictions, "Evicting pid %d", p->lead_thread->pid);
+	r = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_SVM);
Is there a reason why we can't just call
kfd_process_dequeue_from_all_devices() here and remove that call from
kfd_process_wq_release()? We don't need to call this an eviction. The
queues get removed on process termination anyway; we're just doing it a
bit earlier now.
Regards,
Felix
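
To make the ordering suggested above concrete, here is a minimal user-space
model, not kernel code. Every model_* name is a hypothetical stand-in (the
real kfd_process_dequeue_from_all_devices() and kfd_process_wq_release() do
considerably more), and the sketch assumes the dequeue is safe to run from
the notifier release path, which is the open question in this thread.

```c
/* User-space model only: types and functions are stand-ins, not the kernel's. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_QUEUES 4

struct model_process {
	bool queue_active[MODEL_MAX_QUEUES]; /* true while a user queue exists */
	bool svm_mapped;                     /* true until the CPU mapping goes away */
};

/* Count the user queues that are still active. */
size_t model_active_queues(const struct model_process *p)
{
	size_t n = 0;

	assert(p != NULL);
	for (size_t i = 0; i < MODEL_MAX_QUEUES; i++)
		if (p->queue_active[i])
			n++;
	return n;
}

/*
 * Stand-in for kfd_process_dequeue_from_all_devices(): removes every queue.
 * Calling it again on a process with no queues is a no-op in this model.
 */
void model_dequeue_from_all_devices(struct model_process *p)
{
	assert(p != NULL);
	for (size_t i = 0; i < MODEL_MAX_QUEUES; i++)
		p->queue_active[i] = false;
}

/*
 * Stand-in for the mmu notifier release path: the queues are removed
 * before the SVM memory is unmapped, rather than evicted.
 */
void model_notifier_release(struct model_process *p)
{
	model_dequeue_from_all_devices(p);
	p->svm_mapped = false;
}

/*
 * Stand-in for the later workqueue release: it no longer dequeues anything
 * and only reports how many queues were left behind (expected to be none).
 */
size_t model_wq_release(const struct model_process *p)
{
	return model_active_queues(p);
}
```

In this model the final release finds no queues left to remove, which is the
"doing it a bit earlier" effect described above. Whether the real driver can
drop the call from kfd_process_wq_release() depends on paths that reach it
without going through the notifier, which the model does not cover.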
+	if (r)
+		pr_debug("failed %d to quiesce KFD queues\n", r);
+
 	for (i = 0; i < p->n_pdds; i++) {
 		struct kfd_process_device *pdd = p->pdds[i];