On 2023-09-15 16:49, Felix Kuehling wrote:
On 2023-09-15 6:19, Christian König wrote:
On 2023-09-15 10:53, Lang Yu wrote:
On 09/14/2023, Felix Kuehling wrote:
On 2023-09-14 10:02, Christian König wrote:
Do we still need to use a legacy flush to emulate a heavyweight flush
if we don't use SVM? And can I push this now?
Felix needs to decide that. From what I understand, the KFD needs
heavyweight flushes for secure SVM operation.
Yes. We need to be able to guarantee to the kernel that the GPU will
not access unmapped memory. There are two strategies in the driver to
do this:
1. Preempt GPU queues (which implies a heavy-weight TLB flush in the
scheduler firmware)
2. Invalidate page table entries and flush TLBs
#1 happens during MMU notifiers with XNACK off. #2 happens in MMU
notifiers with XNACK on (not supported on GFX10.x) and when unified
memory is munmapped. It's that last part I'm worried about. When
memory is munmapped and given back to the OS, we need to be able to
guarantee that the GPU won't access it any more. The same is true when
GTT BOs and userptr BOs are freed. After unmapping them from the GPU
page tables, we need a heavy-weight flush. I believe the same should
apply to the graphics driver, but maybe that's implied through the CS
and fence mechanisms that keep memory allocated while the GPU is
accessing it.
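
To make that ordering concrete, here is a minimal sketch using the
existing amdkfd entry points (the wrapper function itself is
hypothetical; locking and error handling are omitted):

static void unmap_flush_then_free(struct kfd_process_device *pdd,
				  struct kgd_mem *mem)
{
	/* 1. Remove the PTEs so no new translations can be formed. */
	amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(pdd->dev->adev, mem,
						  pdd->drm_priv);

	/* 2. Heavy-weight flush: after this returns, no access using
	 *    the old translation can still be in flight in the GPU.
	 */
	kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT);

	/* 3. Only now is it safe to give the memory back to the OS. */
	amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->adev, mem,
					       pdd->drm_priv, NULL);
}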
A legacy flush has a slim chance of not being sufficient because
memory accesses using old addresses can still be in flight in the GPU.
If heavyweight flushes are buggy, papering over that by using legacy
flushes is only a mediocre workaround.
I agree. I'd like to avoid half-baked workarounds that will cause more
headaches later on. I started an internal email thread with Tony to
understand the requirements for heavy-weight flushes on the affected
GPUs and find a better workaround.
Thanks, then this patch should be put on hold until that stuff is
cleared up.
Regards,
Christian.
Regards,
Felix
Regards,
Christian.
Regards,
Lang
On 2023-09-14 15:59, Felix Kuehling wrote:
On 2023-09-14 9:39, Christian König wrote:
Is a single legacy flush sufficient to emulate a heavyweight
flush as well?
On previous generations we needed to issue at least two legacy
flushes for this.
I assume you are referring to the Vega20 XGMI workaround. That is a
very different issue. Because PTEs would be cached in L2, we always
had to use a heavy-weight flush that flushed the L2 cache as well,
and follow it with another legacy flush to deal with race conditions
where stale PTEs could be re-fetched from L2 before the L2 flush was
complete.
No, we also have another (badly documented) workaround which issues a
legacy flush before each heavy-weight flush on some HW generations.
See my TLB flush cleanup patches.
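
From memory, the shape of that logic in gmc_v9_0_flush_gpu_tlb after
the cleanup is roughly the following; the exact chip conditions are
paraphrased, and needs_legacy_flush_first() is a stand-in for the
affected-generation check:

	u32 inv_req, inv_req2;

	if (adev->gmc.xgmi.num_physical_nodes &&
	    adev->ip_versions[GC_HWIP][0] == IP_VERSION(9, 4, 0)) {
		/* Vega20 + XGMI: PTEs are cached in L2, so use a
		 * heavy-weight flush (type 2) and chase it with a
		 * legacy flush to catch stale PTEs re-fetched from
		 * L2 while the L2 flush was still in progress.
		 */
		inv_req = gmc_v9_0_get_invalidate_req(vmid, 2);
		inv_req2 = gmc_v9_0_get_invalidate_req(vmid, flush_type);
	} else if (flush_type == 2 && needs_legacy_flush_first(adev)) {
		/* The badly documented workaround: a legacy flush
		 * must precede the requested heavy-weight one.
		 */
		inv_req = gmc_v9_0_get_invalidate_req(vmid, 0);
		inv_req2 = gmc_v9_0_get_invalidate_req(vmid, flush_type);
	} else {
		inv_req = gmc_v9_0_get_invalidate_req(vmid, flush_type);
		inv_req2 = 0;
	}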
A heavy-weight flush guarantees that there are no more possible
memory accesses using the old PTEs. With physically addressed caches
on GFXv9, that includes a cache flush, because the address
translation happens before data is put into the cache. I think the
address translation and cache architecture work differently on
GFXv10, so maybe the cache flush is not required there.
But even then, a legacy flush probably allows in-flight memory
accesses with old physical addresses to complete after the TLB
flush. So there is a small risk of corrupting memory that the GPU
was assumed to no longer be accessing. Or, when using IOMMU device
isolation, it would result in IOMMU faults if the DMA mappings are
invalidated slightly too early.
Mhm, that's quite bad. Any idea how to avoid that?
A few ideas:
* Add an arbitrary delay and hope that it is longer than the FIFOs in
the HW
* Execute an atomic operation to memory on some GPU engine that could
act as a fence; maybe just a RELEASE_MEM on the CP to some writeback
location would do the job (see the sketch below)
* If needed, RELEASE_MEM could also perform a cache flush
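
A rough sketch of that second idea, built from the existing ring and
fence helpers (flush_then_drain itself is hypothetical, locking is
omitted): the CP only writes the fence back after prior work has
drained, so waiting on it bounds the in-flight accesses.

static int flush_then_drain(struct amdgpu_device *adev,
			    struct amdgpu_ring *ring, uint32_t vmid,
			    uint32_t vmhub)
{
	struct dma_fence *fence;
	int r;

	/* Legacy invalidation (flush_type 0), as on cyan_skilfish. */
	amdgpu_gmc_flush_gpu_tlb(adev, vmid, vmhub, 0);

	r = amdgpu_ring_alloc(ring, 32);
	if (r)
		return r;

	/* Emits a RELEASE_MEM-style fence write on the CP. */
	r = amdgpu_fence_emit(ring, &fence, NULL, 0);
	if (r) {
		amdgpu_ring_undo(ring);
		return r;
	}
	amdgpu_ring_commit(ring);

	/* Accesses issued with the old translation have completed
	 * once the CP has written the fence back.
	 */
	dma_fence_wait(fence, false);
	dma_fence_put(fence);
	return 0;
}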
Regards,
Felix
Regards,
Christian.
Regards,
Felix
And please don't push before getting an rb from Felix as well.
Regards,
Christian.
On 2023-09-14 11:23, Lang Yu wrote:
cyan_skilfish has problems with other flush types.
v2: fix incorrect ternary conditional operator usage. (Yifan)
Signed-off-by: Lang Yu <lang...@amd.com>
Cc: <sta...@vger.kernel.org> # v5.15+
---
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index d3da13f4c80e..c6d11047169a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -236,7 +236,8 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid,
 {
 	bool use_semaphore = gmc_v10_0_use_invalidate_semaphore(adev, vmhub);
 	struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
-	u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, flush_type);
+	u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid,
+			(adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0);
 	u32 tmp;
 	/* Use register 17 for GART */
 	const unsigned int eng = 17;
@@ -331,6 +332,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
 	int r;
 
+	flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0;
+
 	/* flush hdp cache */
 	adev->hdp.funcs->flush_hdp(adev, NULL);
 
@@ -426,6 +429,8 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
 	struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
 
+	flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0;
+
 	if (amdgpu_emu_mode == 0 && ring->sched.ready) {
 		spin_lock(&adev->gfx.kiq[0].ring_lock);
 		/* 2 dwords flush + 8 dwords fence */