On 02.03.2018 at 21:47, Felix Kuehling wrote:
On 2018-03-02 04:29 AM, Liu, Monk wrote:
in_atomic() is not encouraged as a way to judge whether sleeping is possible; see its macro definition:
#define in_atomic() (preempt_count() != 0)
OK. But my point is still that you're not testing the right thing when
you check in_interrupt(). The comment before the in_atomic macro
definition states the limitations and says "do not use in driver code".
Unfortunately it doesn't suggest any alternative. I think in_interrupt
is actually worse, because it misses even more cases than in_atomic.
Thinking about this, Felix seems to be absolutely right.
So we need to revert this patch, since you can't reliably detect in a
driver whether sleeping is allowed or not.
Regards,
Christian.
Regards,
Felix
/Monk
-----Original Message-----
From: Kuehling, Felix
Sent: March 1, 2018 23:50
To: amd-gfx@lists.freedesktop.org; Liu, Monk <monk....@amd.com>
Subject: Re: [PATCH] drm/amdgpu: try again kiq access if not in IRQ(v2)
On 2018-02-28 02:27 AM, Monk Liu wrote:
Sometimes the GPU is switched to another VF and won't switch back soon,
so a KIQ register access will not signal within a short period. Instead
of busy-waiting for a long time (MAX_KIQ_REG_WAIT) and returning a
timeout, we can sleep 5ms and try again later (in non-IRQ context).
And since the wait in kiq_rreg/kiq_wreg is a busy wait, MAX_KIQ_REG_WAIT
shouldn't be set to a long time; 10ms is more appropriate.
If the GPU is already in the reset state, don't retry the KIQ register
access, otherwise it would hang forever because the KIQ is usually
already dead by then.
v2:
replace schedule() with msleep() for the wait
Change-Id: I8fc807ce85a8d30d2b50153f3f3a6eda344ef994
Signed-off-by: Monk Liu <monk....@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index b832651..1672f5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -22,7 +22,7 @@
*/
#include "amdgpu.h"
-#define MAX_KIQ_REG_WAIT 100000000 /* in usecs */
+#define MAX_KIQ_REG_WAIT 10000 /* in usecs, 10ms */
uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev) {
@@ -152,9 +152,14 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
amdgpu_ring_commit(ring);
spin_unlock_irqrestore(&kiq->ring_lock, flags);
+retry_read:
r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
if (r < 1) {
DRM_ERROR("wait for kiq fence error: %ld\n", r);
+ if (!in_interrupt() && !adev->in_gpu_reset) {
You should check in_atomic here. Because it's invalid to sleep in atomic
context (e.g. while holding a spin lock) even when not in an interrupt.
This seems to happen a lot for indirect register access, e.g.
soc15_pcie_rreg.
Regards,
Felix
+ msleep(5);
+ goto retry_read;
+ }
return ~0;
}
val = adev->wb.wb[adev->virt.reg_val_offs];
@@ -179,9 +184,15 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
amdgpu_ring_commit(ring);
spin_unlock_irqrestore(&kiq->ring_lock, flags);
+retry_write:
r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
- if (r < 1)
+ if (r < 1) {
DRM_ERROR("wait for kiq fence error: %ld\n", r);
+ if (!in_interrupt() && !adev->in_gpu_reset) {
+ msleep(5);
+ goto retry_write;
+ }
+ }
}
/**
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx