- explicit fencing: Userspace passes around distinct fence objects for
any work going on on the gpu. The kernel doesn't insert any stall of
its own (except for moving buffer objects around ofc). This is what
Android does. This also seems to be what amdgpu is doing within one
owner.

No, that is clearly not my understanding of explicit fencing.

Userspace doesn't necessarily need to pass around distinct fence objects
with all of its protocols, and the kernel is still responsible for inserting
stalls whenever a userspace protocol or application requires this.

Otherwise you will never be able to use explicit fencing on the Linux
desktop with protocols like DRI2/DRI3.

This is about mixing them. Explicit fencing still means userspace has
an explicit piece, separate from buffers (either a sync_file fd, a
driver-specific cookie, or similar).

I would expect that every driver in the system waits for all fences of a
reservation object as long as it isn't told otherwise by providing a
distinct fence object with the IOCTL in question.
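
To make that default rule concrete, here is a minimal user-space sketch in C. It is a toy model, not kernel code: `struct toy_fence`, `struct toy_resv` and `toy_may_start()` are hypothetical names, and the reservation object is reduced to one exclusive slot plus a fixed array of shared slots. Following the rule as stated, a submission without an explicit fence waits for every fence on the object; real drivers usually let readers wait only on the exclusive fence, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT the kernel dma_fence/reservation_object API. */
struct toy_fence {
    bool signaled;
};

#define TOY_MAX_SHARED 8

/* One exclusive slot plus a fixed number of shared slots. */
struct toy_resv {
    struct toy_fence *excl;
    struct toy_fence *shared[TOY_MAX_SHARED];
    size_t num_shared;
};

/*
 * Returns true if a job using the buffer may start now.
 * explicit_in == NULL: implicit fencing, so wait for every fence on resv.
 * explicit_in != NULL: userspace handed a distinct fence to the ioctl, so
 * only that fence matters and the reservation object is not consulted.
 */
static bool toy_may_start(const struct toy_resv *resv,
                          const struct toy_fence *explicit_in)
{
    if (explicit_in != NULL)
        return explicit_in->signaled;

    assert(resv->num_shared <= TOY_MAX_SHARED);
    if (resv->excl != NULL && !resv->excl->signaled)
        return false;
    for (size_t i = 0; i < resv->num_shared; i++)
        if (!resv->shared[i]->signaled)
            return false;
    return true;
}
```

With an exclusive fence that has signaled and one busy shared fence, the implicit path blocks while passing the signaled fence explicitly does not.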

Yup, agreed. This way, if your explicitly-fencing driver reads a shared
buffer passed over a protocol that does implicit fencing (like
DRI2/3), then it will work.

The other interop direction is when an explicitly-fencing driver passes a
buffer to a consumer which expects implicit fencing. In that case you
must attach the right fence to the exclusive slot, but _only_ in that
case.

OK, well, it sounds like you are close to understanding why I can't do exactly this: there simply is no right fence I could attach.

When amdgpu makes the command submissions, it doesn't necessarily know that the buffer will be exported and shared with another device later on.

So when the buffer is exported and given to the other device, you might have a whole bunch of fences which run concurrently and not in any serial order.

Otherwise you end up stalling your explicitly-fencing userspace,
since implicit fencing doesn't allow more than 1 writer. For amdgpu,
one possible way to implement this might be to count how many users a
dma-buf has, and if it's more than just the current context, set the
exclusive fence. Or do a uabi revision and let userspace decide (or
at least override it).
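
A minimal sketch of the user-counting idea, again as a hypothetical user-space toy model rather than amdgpu code. `struct toy_buf`, its `num_users` counter and `toy_attach_write_fence()` are invented for illustration; how a real driver would actually track the users of a dma-buf is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, NOT kernel code. */
struct toy_fence {
    int id;
};

#define TOY_MAX_SHARED 8

struct toy_resv {
    struct toy_fence *excl;
    struct toy_fence *shared[TOY_MAX_SHARED];
    size_t num_shared;
};

/* Hypothetical stand-in for a dma-buf; num_users counts contexts using it. */
struct toy_buf {
    struct toy_resv resv;
    unsigned int num_users;
};

/*
 * Attach the fence of a write job. If only the submitting context uses the
 * buffer, keep the fence in a shared slot so that context's writers can run
 * concurrently. If anyone else uses it, publish it as the exclusive fence so
 * implicit-fencing consumers wait for it.
 */
static void toy_attach_write_fence(struct toy_buf *buf, struct toy_fence *f)
{
    if (buf->num_users > 1) {
        /* Dropping the shared fences assumes the job already waited on them. */
        buf->resv.excl = f;
        buf->resv.num_shared = 0;
        return;
    }
    assert(buf->resv.num_shared < TOY_MAX_SHARED);
    buf->resv.shared[buf->resv.num_shared++] = f;
}
```

The design choice this makes visible: once a second user exists, every write is serialized behind a single exclusive fence, which is only correct if the new job was ordered after the fences it replaces.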

I mean, I can pick one fence and wait for the rest to finish manually, but that would certainly defeat the whole effort, wouldn't it?

I completely agree that you have only 1 writer with implicit fencing, but when you switch from explicit fencing back to implicit fencing, you can have multiple ones.
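
The multiple-writer problem can be sketched as follows, under the assumption that the only way to put N concurrent writers behind one exclusive slot is an aggregate fence that signals once all of them have. `struct toy_fence_all` and `toy_all_signaled()` are hypothetical names; the kernel's `dma_fence_array` is a comparable real building block, but this is not its API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT kernel code. */
struct toy_fence {
    bool signaled;
};

/* Aggregate over concurrent writer fences with no serial order among them. */
struct toy_fence_all {
    struct toy_fence *const *fences;
    size_t count;
};

/* The aggregate only counts as signaled once every component has signaled. */
static bool toy_all_signaled(const struct toy_fence_all *agg)
{
    assert(agg->fences != NULL || agg->count == 0);
    for (size_t i = 0; i < agg->count; i++)
        if (!agg->fences[i]->signaled)
            return false;
    return true;
}
```

An implicit-fencing consumer waiting on such an aggregate is held back by the slowest writer, which is the serialization the explicit-fencing side was trying to avoid.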

But the current approach in amdgpu_sync.c of declaring a fence as
exclusive after the fact (if owners don't match) just isn't how
reservation_object works. You can of course change that, but that
means you must change all drivers implementing support for implicit
fencing of dma-buf. Fixing amdgpu will be easier ;-)
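
A simplified model of the owner-based decision described above, where shared fences from a different owner are waited on as if they were exclusive. This is a toy sketch with hypothetical names (`toy_count_deps()`, an opaque `owner` pointer per fence), not the actual code in amdgpu_sync.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SHARED 8

/* Hypothetical fence tagged with the owner that submitted it. */
struct toy_fence {
    const void *owner;
    bool signaled;
};

struct toy_resv {
    struct toy_fence *excl;
    struct toy_fence *shared[TOY_MAX_SHARED];
    size_t num_shared;
};

/*
 * Count the fences a new job from `owner` must wait for. Unsignaled shared
 * fences from the same owner are skipped (concurrent access within one
 * owner); those from a different owner are treated as exclusive.
 */
static size_t toy_count_deps(const struct toy_resv *resv, const void *owner)
{
    size_t deps = 0;

    assert(resv->num_shared <= TOY_MAX_SHARED);
    if (resv->excl != NULL && !resv->excl->signaled)
        deps++;
    for (size_t i = 0; i < resv->num_shared; i++) {
        const struct toy_fence *f = resv->shared[i];
        if (!f->signaled && f->owner != owner)
            deps++;
    }
    return deps;
}
```

A receiving driver following the usual reservation_object rules for a reader would only look at the exclusive slot, so it would not wait on foreign shared fences at all; that is the mismatch being discussed.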

Well, as far as I can see, there is no way I can fix amdgpu in this case.

The handling clearly needs to be changed on the receiving side of the reservation objects if I don't want to completely disable concurrent access to BOs in amdgpu.



