Re: [PATCH v2] drm/amdgpu: always use legacy tlb flush on cyan_skilfish

Felix Kuehling Thu, 14 Sep 2023 07:18:14 -0700

On 2023-09-14 10:02, Christian König wrote:

On 14.09.23 at 15:59, Felix Kuehling wrote:
On 2023-09-14 9:39, Christian König wrote:
Is a single legacy flush sufficient to emulate a heavyweight flush as well?
On previous generations we needed to issue at least two legacy flushes for this.
I assume you are referring to the Vega20 XGMI workaround. That is a very different issue. Because PTEs would be cached in L2, we had to always use a heavy-weight flush that would also flush the L2 cache, and follow that with another legacy flush to deal with race conditions where stale PTEs could be re-fetched from L2 before the L2 flush was complete.
No, we also have another (badly documented) workaround which issues a legacy flush before each heavy-weight flush on some hw generations. See my TLB flush cleanup patches.
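The two workarounds mentioned here differ mainly in the order in which flushes are issued. A minimal sketch that only models those sequences as described in this thread, not the amdgpu implementation: it assumes 0 means legacy (as in the patch below), while the other enum values and the `flush_quirks` flags are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: 0 = legacy as in the patch; 1 and 2 are assumptions. */
enum flush_type { FLUSH_LEGACY = 0, FLUSH_LIGHTWEIGHT = 1, FLUSH_HEAVYWEIGHT = 2 };

/* Hypothetical per-ASIC workaround flags (illustrative names only). */
struct flush_quirks {
	bool legacy_before_heavy; /* legacy flush before each heavyweight one */
	bool heavy_then_legacy;   /* Vega20 XGMI style: heavyweight, then legacy */
};

/*
 * Expand one requested flush into the sequence of flushes that would be
 * issued under the given quirks. Writes at most 3 entries; returns count.
 */
static int build_flush_sequence(enum flush_type requested,
				const struct flush_quirks *q,
				enum flush_type seq[3])
{
	int n = 0;

	/* XGMI-style workaround: always heavyweight so L2 is flushed too. */
	if (q->heavy_then_legacy)
		requested = FLUSH_HEAVYWEIGHT;

	/* Other workaround: precede every heavyweight flush with a legacy one. */
	if (requested == FLUSH_HEAVYWEIGHT && q->legacy_before_heavy)
		seq[n++] = FLUSH_LEGACY;

	seq[n++] = requested;

	/* Trailing legacy flush catches stale PTEs re-fetched from L2. */
	if (q->heavy_then_legacy)
		seq[n++] = FLUSH_LEGACY;

	assert(n <= 3);
	return n;
}
```

With both flags set, a request expands to legacy, heavyweight, legacy; with neither set, the requested flush is issued unchanged.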
A heavy-weight flush guarantees that there are no more possible memory accesses using the old PTEs. With physically addressed caches on GFXv9, that includes a cache flush, because the address translation happened before putting data into the cache. I think the address translation and cache architecture work differently on GFXv10, so maybe the cache flush is not required here.
But even then, a legacy flush probably allows in-flight memory accesses with old physical addresses to complete after the TLB flush. So there is a small risk of corrupting memory that was assumed to no longer be accessed by the GPU. Or, when using IOMMU device isolation, it would result in IOMMU faults if the DMA mappings are invalidated slightly too early.
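The risk described here can be illustrated with a toy model. This is a sketch under loudly stated assumptions, not a model of real GFXv10 hardware. It assumes a single cached translation and a FIFO of writes that were already translated to a physical page. A "legacy" flush is modeled as dropping the translation only. A "heavyweight" flush is modeled as dropping it and also waiting for outstanding accesses. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INFLIGHT 8

/* Toy model (assumption): one cached PTE plus a FIFO of writes that were
 * already translated to a physical page before any flush happened. */
struct toy_gpu {
	bool tlb_valid;
	int tlb_pa;                 /* physical page of the cached PTE */
	int inflight[MAX_INFLIGHT]; /* physical pages of pending writes */
	int n_inflight;
	bool page_freed[4];         /* pages the driver has released */
	int stale_writes;           /* writes that landed on freed pages */
};

/* Translate through the TLB and queue the write; it completes later. */
static void toy_issue_write(struct toy_gpu *g)
{
	assert(g->tlb_valid && g->n_inflight < MAX_INFLIGHT);
	g->inflight[g->n_inflight++] = g->tlb_pa;
}

/* Complete every queued write, counting those that hit freed pages. */
static void toy_drain(struct toy_gpu *g)
{
	for (int i = 0; i < g->n_inflight; i++)
		if (g->page_freed[g->inflight[i]])
			g->stale_writes++;
	g->n_inflight = 0;
}

/* "Legacy" flush: drop the translation only; queued writes keep going. */
static void toy_legacy_flush(struct toy_gpu *g)
{
	g->tlb_valid = false;
}

/* "Heavyweight" flush: drop the translation and wait for queued writes. */
static void toy_heavyweight_flush(struct toy_gpu *g)
{
	g->tlb_valid = false;
	toy_drain(g);
}

/* Unmap: flush, release the old page, then let the HW catch up.
 * Returns the number of writes that reached the page after it was freed. */
static int toy_unmap(bool heavyweight)
{
	struct toy_gpu g = { .tlb_valid = true, .tlb_pa = 1 };

	toy_issue_write(&g);
	toy_issue_write(&g);
	if (heavyweight)
		toy_heavyweight_flush(&g);
	else
		toy_legacy_flush(&g);
	g.page_freed[g.tlb_pa] = true; /* driver reuses the old page */
	toy_drain(&g);                 /* remaining writes complete late */
	return g.stale_writes;
}
```

In this model, `toy_unmap(false)` reports both queued writes as landing on the freed page, while `toy_unmap(true)` reports none. That is the gap between the two flush types in miniature.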
Mhm, that's quite bad. Any idea how to avoid that?


A few ideas:

 * Add an arbitrary delay and hope that it is longer than the FIFOs in
   the HW
 * Execute an atomic operation to memory on some GPU engine that could
   act as a fence, maybe just a RELEASE_MEM on the CP to some writeback
   location would do the job
 * If needed, RELEASE_MEM could also perform a cache flush
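The fence idea works only if the fence retires after every memory access queued ahead of it. A toy sketch under exactly that assumption, which is the part that would need confirming on real hardware. The packet kinds, ring, and writeback slot are hypothetical, and the sketch does not model PM4 or the actual RELEASE_MEM encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16

/* Toy packet kinds (hypothetical; not PM4 encodings). */
enum pkt_kind { PKT_WRITE, PKT_FENCE };

struct pkt {
	enum pkt_kind kind;
	int page;     /* PKT_WRITE: physical page already translated */
	uint32_t seq; /* PKT_FENCE: value written to the writeback slot */
};

/* Toy engine that retires packets strictly in order (the key assumption). */
struct toy_engine {
	struct pkt ring[RING_SIZE];
	int head, tail;
	uint32_t writeback; /* fence writeback location polled by the CPU */
	bool page_freed[4];
	int stale_writes;
};

static void toy_emit(struct toy_engine *e, struct pkt p)
{
	assert(e->tail < RING_SIZE);
	e->ring[e->tail++] = p;
}

/* Retire one packet; returns false when the ring is empty. */
static bool toy_step(struct toy_engine *e)
{
	if (e->head == e->tail)
		return false;
	struct pkt p = e->ring[e->head++];
	if (p.kind == PKT_WRITE && e->page_freed[p.page])
		e->stale_writes++;
	else if (p.kind == PKT_FENCE)
		e->writeback = p.seq;
	return true;
}

/* After a legacy flush, optionally emit a fence and wait for it before
 * freeing the old page. Returns writes that hit the page after the free. */
static int toy_unmap_with_fence(bool use_fence)
{
	struct toy_engine e = { 0 };
	const int old_page = 1;
	const uint32_t seq = 1;

	toy_emit(&e, (struct pkt){ .kind = PKT_WRITE, .page = old_page });
	toy_emit(&e, (struct pkt){ .kind = PKT_WRITE, .page = old_page });
	/* Legacy TLB flush happens here: new accesses miss, queued ones don't. */
	if (use_fence) {
		toy_emit(&e, (struct pkt){ .kind = PKT_FENCE, .seq = seq });
		while (e.writeback < seq) /* poll the writeback slot */
			toy_step(&e);
	}
	e.page_freed[old_page] = true;
	while (toy_step(&e)) /* remaining packets retire late */
		;
	return e.stale_writes;
}
```

Under the in-order assumption, waiting for the fence value drains both queued writes before the page is released. Without the fence, both writes land after the free. The arbitrary-delay idea amounts to replacing the polling loop with a fixed wait and hoping it covers the queue depth.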

Regards,
  Felix

Regards,
Christian.
Regards,
  Felix
And please don't push before getting an rb from Felix as well.

Regards,
Christian.


On 14.09.23 at 11:23, Lang Yu wrote:
cyan_skillfish has problems with the other (non-legacy) flush types.

v2: fix incorrect ternary conditional operator usage. (Yifan)

Signed-off-by: Lang Yu <[email protected]>
Cc: <[email protected]> # v5.15+
---
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index d3da13f4c80e..c6d11047169a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -236,7 +236,8 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid,
 {
 	bool use_semaphore = gmc_v10_0_use_invalidate_semaphore(adev, vmhub);
 	struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
-	u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, flush_type);
+	u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid,
+		      (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0);
 	u32 tmp;
 	/* Use register 17 for GART */
 	const unsigned int eng = 17;
@@ -331,6 +332,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
 	int r;
+	flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0;
+
 	/* flush hdp cache */
 	adev->hdp.funcs->flush_hdp(adev, NULL);
@@ -426,6 +429,8 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
 	struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
+	flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0;
+
 	if (amdgpu_emu_mode == 0 && ring->sched.ready) {
 		spin_lock(&adev->gfx.kiq[0].ring_lock);
 		/* 2 dwords flush + 8 dwords fence */
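All three hunks apply the same ternary expression. A minimal sketch of it as a standalone helper, for illustration only. The helper name is hypothetical and is not something the patch adds. The enum is a stand-in for a subset of amdgpu's `enum amd_asic_type`, with illustrative values. The flush types are assumed to be 0..2.

```c
#include <assert.h>

/* Stand-in for a subset of amdgpu's enum amd_asic_type (values are
 * illustrative, not the kernel's). */
enum asic_type_stub { CHIP_NAVI10, CHIP_CYAN_SKILLFISH };

/*
 * The expression the patch adds at all three call sites: pass the
 * caller's flush_type through, except on cyan_skillfish, where 0
 * (legacy) is always used.
 */
static unsigned int effective_flush_type(enum asic_type_stub asic,
					 unsigned int flush_type)
{
	return (asic != CHIP_CYAN_SKILLFISH) ? flush_type : 0;
}

/* Self-check, assuming flush types 0..2: every requested type collapses
 * to legacy on cyan_skillfish and is untouched elsewhere. */
static void check_effective_flush_type(void)
{
	for (unsigned int t = 0; t <= 2; t++) {
		assert(effective_flush_type(CHIP_NAVI10, t) == t);
		assert(effective_flush_type(CHIP_CYAN_SKILLFISH, t) == 0);
	}
}
```

Keeping the condition in one place like this would be one way to keep the three call sites consistent if the list of affected ASICs ever grows.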
