On 17/09/2025 11:54, David Rosca wrote:
Hi,

On 17. 09. 25 12:15, Tvrtko Ursulin wrote:

Hi,

On 17/09/2025 10:59, David Rosca wrote:
drm_syncobj_find_fence returns fence chain for timeline syncobjs.
Scheduler expects normal fences as job dependencies to be able to
determine whether the fences come from the same entity or sched
and skip waiting on them.
With fence chain as job dependency, the fence will always be
waited on forcing CPU round-trip before starting the job.

Interesting! I was sending patches to fix this differently last year or so, by making the scheduler use dma_fence_array for tracking dependencies and relying on dma_fence_unwrap_merge to unwrap, coalesce contexts and only keep the latest fence for each. But I did not have a good story to show for which use cases it helped. So I am curious if you could share which scenario you found gets an improvement from your patch?

The scenario I am trying to fix is very simple to reproduce with Vulkan when using timeline semaphore to sync submissions on the same queue (eg. each submit waiting on value signaled by previous submit). I have noticed this issue with FFmpeg Vulkan video code, but it will happen with any Vulkan app using this pattern.

Still out of curiosity, is the performance loss from the CPU round-trip something you are able to measure?

Btw your patch is I think fine, so:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com>

But you will probably need Christian to ack it.

Regards,

Tvrtko

Signed-off-by: David Rosca <david.ro...@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 10 ++++++++--
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/ drm/amd/amdgpu/amdgpu_cs.c
index 2e93d570153c..779c11227a53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -29,6 +29,7 @@
  #include <linux/pagemap.h>
  #include <linux/sync_file.h>
  #include <linux/dma-buf.h>
+#include <linux/dma-fence-unwrap.h>
    #include <drm/amdgpu_drm.h>
  #include <drm/drm_syncobj.h>
@@ -450,7 +451,8 @@ static int amdgpu_syncobj_lookup_and_add(struct amdgpu_cs_parser *p,
                       uint32_t handle, u64 point,
                       u64 flags)
  {
-    struct dma_fence *fence;
+    struct dma_fence *fence, *f;
+    struct dma_fence_unwrap iter;
      int r;
        r = drm_syncobj_find_fence(p->filp, handle, point, flags, &fence); @@ -460,7 +462,11 @@ static int amdgpu_syncobj_lookup_and_add(struct amdgpu_cs_parser *p,
          return r;
      }
  -    r = amdgpu_sync_fence(&p->sync, fence, GFP_KERNEL);
+    dma_fence_unwrap_for_each(f, &iter, fence) {
+        if (!r)
+            r = amdgpu_sync_fence(&p->sync, f, GFP_KERNEL);
+    }
+
      dma_fence_put(fence);
      return r;
  }


Reply via email to