Re: [PATCH] drm/amdkfd: fix interrupt spin lock

Christian König Sun, 04 Nov 2018 11:20:33 -0800

Am 02.11.18 um 19:59 schrieb Kuehling, Felix:

On 2018-11-02 9:48 a.m., Christian König wrote:

Vega10 has multiple interrupt rings,

I don't think I've seen your code that implements multiple interrupt
rings. So it's a bit hard to comment.


It's already a while ago, but you actually reviewed it :)

As I understand it, the only way
this could happen is, if the two interrupt rings are handled by
different CPU cores. Alternatively, you may end up calling
kgd2kfd_interrupt outside of interrupt context (e.g. in a kernel
thread), which would invalidate an implicit assumption in
kgd2kfd_interrupt. Can you clarify what's happening here?

Currently the later scenario, I've offloaded ring processing on IH ring1 & 2 to a work item.

But the IH is in theory capable of raising IRQ on different CPU cores aswell, e.g. first halve of ring is processed and acknowledged on CPU1,new interrupt for next entries is raised on CPU2 before CPU1 finishesexecuting the handler.

We currently explicitly prevent that in the amdgpu IH code, but that issomething I would really really like to get rid of.

Are there any other cases where this is assumed in the kgd2kfd_interruptcode?


Christian.


Anyway, this change shouldn't cause any other problems so feel free to
add my Acked-by.

Regards,
   Felix

  so this can be called from multiple
calles at the same time resulting in:

[   71.779334] ================================
[   71.779406] WARNING: inconsistent lock state
[   71.779478] 4.19.0-rc1+ #44 Tainted: G        W
[   71.779565] --------------------------------
[   71.779637] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[   71.779740] kworker/6:1/120 [HC0[0]:SC0[0]:HE1:SE1] takes:
[   71.779832] 00000000ad761971 (&(&kfd->interrupt_lock)->rlock){?...},
at: kgd2kfd_interrupt+0x75/0x100 [amdgpu]
[   71.780058] {IN-HARDIRQ-W} state was registered at:
[   71.780115]   _raw_spin_lock+0x2c/0x40
[   71.780180]   kgd2kfd_interrupt+0x75/0x100 [amdgpu]
[   71.780248]   amdgpu_irq_callback+0x6c/0x150 [amdgpu]
[   71.780315]   amdgpu_ih_process+0x88/0x100 [amdgpu]
[   71.780380]   amdgpu_irq_handler+0x20/0x40 [amdgpu]
[   71.780409]   __handle_irq_event_percpu+0x49/0x2a0
[   71.780436]   handle_irq_event_percpu+0x30/0x70
[   71.780461]   handle_irq_event+0x37/0x60
[   71.780484]   handle_edge_irq+0x83/0x1b0
[   71.780506]   handle_irq+0x1f/0x30
[   71.780526]   do_IRQ+0x53/0x110
[   71.780544]   ret_from_intr+0x0/0x22
[   71.780566]   cpuidle_enter_state+0xaa/0x330
[   71.780591]   do_idle+0x203/0x280
[   71.780610]   cpu_startup_entry+0x6f/0x80
[   71.780634]   start_secondary+0x1b0/0x200
[   71.780657]   secondary_startup_64+0xa4/0xb0

Fix this by always using irq save spin locks.

Signed-off-by: Christian König <[email protected]>
---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index d7e0f41f8edc..c004647c8cb4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -682,6 +682,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void 
*ih_ring_entry)
  {
        uint32_t patched_ihre[KFD_MAX_RING_ENTRY_SIZE];
        bool is_patched = false;
+       unsigned long flags;

if (!kfd->init_complete)

                return;
@@ -691,7 +692,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void 
*ih_ring_entry)
                return;
        }

- spin_lock(&kfd->interrupt_lock);

+       spin_lock_irqsave(&kfd->interrupt_lock, flags);

if (kfd->interrupts_active

            && interrupt_is_wanted(kfd, ih_ring_entry,
@@ -700,7 +701,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void 
*ih_ring_entry)
                                     is_patched ? patched_ihre : ih_ring_entry))
                queue_work(kfd->ih_wq, &kfd->interrupt_work);

- spin_unlock(&kfd->interrupt_lock);

+       spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
  }

int kgd2kfd_quiesce_mm(struct mm_struct *mm)


_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdkfd: fix interrupt spin lock

Reply via email to