On 17/09/2025 11:54, David Rosca wrote:
Hi,
On 17. 09. 25 12:15, Tvrtko Ursulin wrote:
Hi,
On 17/09/2025 10:59, David Rosca wrote:
drm_syncobj_find_fence returns a fence chain for timeline syncobjs.
The scheduler expects normal fences as job dependencies so that it can
determine whether the fences come from the same entity or sched
and skip waiting on them.
With a fence chain as a job dependency, the fence will always be
waited on, forcing a CPU round-trip before starting the job.
Interesting! I was sending patches to fix this differently last year
or so, by making the scheduler use dma_fence_array for tracking
dependencies and relying on dma_fence_unwrap_merge to unwrap, coalesce
contexts and only keep the latest fence for each. But I did not have a
good story to show for which use cases it helped. So I am curious if
you could share which scenario you found gets an improvement from your
patch?
The scenario I am trying to fix is very simple to reproduce with Vulkan
when using a timeline semaphore to sync submissions on the same queue (e.g.
each submit waiting on the value signaled by the previous submit). I
noticed this issue with the FFmpeg Vulkan video code, but it will happen
with any Vulkan app using this pattern.
Still out of curiosity, is the performance loss from the CPU round-trip
something you are able to measure?
Btw your patch is I think fine, so:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com>
But you will probably need Christian to ack it.
Regards,
Tvrtko
Signed-off-by: David Rosca <david.ro...@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 2e93d570153c..779c11227a53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -29,6 +29,7 @@
#include <linux/pagemap.h>
#include <linux/sync_file.h>
#include <linux/dma-buf.h>
+#include <linux/dma-fence-unwrap.h>
#include <drm/amdgpu_drm.h>
#include <drm/drm_syncobj.h>
@@ -450,7 +451,8 @@ static int amdgpu_syncobj_lookup_and_add(struct amdgpu_cs_parser *p,
uint32_t handle, u64 point,
u64 flags)
{
- struct dma_fence *fence;
+ struct dma_fence *fence, *f;
+ struct dma_fence_unwrap iter;
int r;
r = drm_syncobj_find_fence(p->filp, handle, point, flags, &fence);
@@ -460,7 +462,11 @@ static int amdgpu_syncobj_lookup_and_add(struct amdgpu_cs_parser *p,
return r;
}
- r = amdgpu_sync_fence(&p->sync, fence, GFP_KERNEL);
+ dma_fence_unwrap_for_each(f, &iter, fence) {
+ if (!r)
+ r = amdgpu_sync_fence(&p->sync, f, GFP_KERNEL);
+ }
+
dma_fence_put(fence);
return r;
}
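
For readers following the thread, the effect of the unwrap loop in the
patch above can be illustrated with a small userspace toy model. This is
only a sketch under stated assumptions: toy_fence, toy_needs_wait,
toy_waits_wrapped and toy_waits_unwrapped are hypothetical names invented
for illustration, not kernel APIs, and the same-context comparison is a
simplification of the scheduler behavior described in the commit message.
Signaled fences and the same-sched case are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a fence; NOT the kernel's struct dma_fence. */
struct toy_fence {
	uint64_t context;              /* timeline the fence belongs to */
	uint64_t seqno;                /* position on that timeline */
	const struct toy_fence *inner; /* chain link only: the wrapped fence */
	const struct toy_fence *prev;  /* chain link only: the older link */
};

/*
 * Assumed, simplified scheduler rule: a dependency that comes from the
 * entity's own context can be skipped, anything else must be waited on.
 */
static bool toy_needs_wait(const struct toy_fence *dep, uint64_t entity_context)
{
	return dep->context != entity_context;
}

/*
 * Add the fence as a single dependency, container and all. A chain link
 * carries its own context, so the same-context shortcut never applies.
 */
static int toy_waits_wrapped(const struct toy_fence *fence,
			     uint64_t entity_context)
{
	assert(fence != NULL);
	return toy_needs_wait(fence, entity_context) ? 1 : 0;
}

/*
 * Walk the chain and add every wrapped fence individually, in the spirit
 * of the loop in the patch. A plain fence is handled as a chain of one.
 */
static int toy_waits_unwrapped(const struct toy_fence *fence,
			       uint64_t entity_context)
{
	const struct toy_fence *link;
	int waits = 0;

	assert(fence != NULL);
	for (link = fence; link; link = link->prev) {
		const struct toy_fence *f = link->inner ? link->inner : link;

		if (toy_needs_wait(f, entity_context))
			waits++;
	}
	return waits;
}
```

In this model, a chain whose wrapped fences all belong to the entity's own
context counts as one wait when added wrapped and as none when unwrapped,
which mirrors the same-queue timeline semaphore case discussed above. A
wrapped fence from a different context still counts as a wait either way.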