On 3/20/2024 5:52 PM, Mukul Joshi wrote:


Destroy the high priority workqueue that handles interrupts
during KFD node cleanup.

Signed-off-by: Mukul Joshi <[email protected]>
---
  drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
index dd3c43c1ad70..9b6b6e882593 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
@@ -104,6 +104,8 @@ void kfd_interrupt_exit(struct kfd_node *node)
          */
         flush_workqueue(node->ih_wq);

+       destroy_workqueue(node->ih_wq);
+

I think we should cancel the work items still queued on node->ih_wq here instead of flushing the workqueue. By this point the KFD functions have already been torn down, so nothing can process the remaining work items, and the flush would never finish. I think this is the reason there are orphaned kernel tasks.

After cancelling the remaining work items, we can call destroy_workqueue().
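A small userspace model in C makes the proposed ordering concrete. It is only a sketch. `struct toy_wq` and the `toy_*` functions are hypothetical stand-ins for queue_work(), flush_workqueue(), cancel_work_sync() and destroy_workqueue(). A flush that would hang is modelled as a false return rather than an actual hang. The model assumes, as argued above, that no handler can run once the KFD functions have been torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define TOY_WQ_MAX 16

struct toy_wq {
	int pending[TOY_WQ_MAX]; /* ids of queued, not yet executed work items */
	size_t npending;
	bool handler_alive;      /* false once the handlers are torn down */
	bool destroyed;
};

static void toy_wq_init(struct toy_wq *wq)
{
	wq->npending = 0;
	wq->handler_alive = true;
	wq->destroyed = false;
}

/* Stand-in for queue_work(): refuses work on a destroyed or full queue. */
static bool toy_queue_work(struct toy_wq *wq, int id)
{
	if (wq->destroyed || wq->npending == TOY_WQ_MAX)
		return false;
	wq->pending[wq->npending++] = id;
	return true;
}

/*
 * Stand-in for flush_workqueue(): it can only complete if a handler is
 * still around to run the queued items. Returns false where a real
 * flush would block forever.
 */
static bool toy_wq_try_flush(struct toy_wq *wq)
{
	if (wq->npending > 0 && !wq->handler_alive)
		return false;
	wq->npending = 0; /* every queued item was "run" by the handler */
	return true;
}

/* Stand-in for cancelling work: drop queued items without running them. */
static size_t toy_wq_cancel_pending(struct toy_wq *wq)
{
	size_t dropped = wq->npending;

	wq->npending = 0;
	return dropped;
}

/* Stand-in for destroy_workqueue(): only valid once nothing is queued. */
static void toy_wq_destroy(struct toy_wq *wq)
{
	assert(wq->npending == 0);
	wq->destroyed = true;
}
```

In the model, flushing after the handlers are gone reports that it would be stuck. Cancelling first empties the queue, so the destroy step is safe. In the driver, the equivalent would presumably be cancel_work_sync() on the interrupt work item, followed by destroy_workqueue(node->ih_wq). The name node->interrupt_work for that work item is an assumption here. Also, destroy_workqueue() drains the queue itself before freeing it. For that reason, the pending items have to be cancelled before it is called, not only before an explicit flush.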

Regards

Xiaogang

         kfifo_free(&node->ih_fifo);
  }

--
2.35.1
