On 22.09.2016 at 14:26, Daniel Vetter wrote:
On Thu, Sep 22, 2016 at 12:55 PM, Christian König
<> wrote:
On 22.09.2016 at 08:36, Daniel Vetter wrote:
On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote:
For a quick workaround I suggest just serializing all accesses to a BO
with different drivers, but essentially I think it is a perfectly valid
requirement to have multiple writers to one BO.
It is, but it's not possible with implicit sync. If you want parallel
write access to the same shared buffer, you _must_ carry around some
explicit fences. Within amdgpu you can use driver-specific cookies, for
shared buffer we now have sync_file. But multiple writers with implicit
sync simply fundamentally doesn't work. Because you have no idea which
writer is touching the same subrange you want to touch.

You don't need to separate the BO into subranges which are touched by
different engines for allowing multiple writers.

AMD hardware, and I'm pretty sure others as well, are perfectly capable of
writing to the same memory from multiple engines and even multiple GPUs at
the same time.

For a good hint of what is possible see the public AMD ISA documentation
about atomic operations, but that is only the start of it.

The crux here is that we need to assume that we will have implicit and
explicit sync mixed for backward compatibility.

This implies that we need some mechanism like amdgpu uses in its sync
implementation where every fence is associated with an owner which denotes
the domain in which implicit sync happens. If you leave this domain you will
automatically run into explicit sync.

Currently we define the borders of this domain in amdgpu at the process boundary
to keep things like DRI2/DRI3 working as expected.

I really don't see how you want to solve this with a single explicit fence
for each reservation object. As long as you have multiple concurrently
running operations accessing the same buffer you need to keep one fence for
each operation no matter what.
I can't make sense of what you're saying, and I suspect we attach
different meanings to the same words. So let me define here:

- implicit fencing: Userspace does not track read/writes to buffers,
but only the kernel does that. This is the assumption DRI2/3 has.
Since synchronization is by necessity on a per-buffer level you can
only have 1 writer. In the kernel the cross-driver interface for this
is struct reservation_object attached to dma-bufs. If you don't fill
out/wait for the exclusive fence in there, your driver is _not_
doing (cross-device) implicit fencing.

I can confirm that my understanding of implicit fencing is exactly the same as yours.

- explicit fencing: Userspace passes around distinct fence objects for
any work going on on the gpu. The kernel doesn't insert any stall of
its own (except for moving buffer objects around ofc). This is what
Android does. This also seems to be what amdgpu is doing within one

No, that is clearly not my understanding of explicit fencing.

Userspace doesn't necessarily need to pass around distinct fence objects with all of its protocols, and the kernel is still responsible for inserting stalls whenever a userspace protocol or application requires these semantics.

Otherwise you will never be able to use explicit fencing on the Linux desktop with protocols like DRI2/DRI3.

I would expect that every driver in the system waits for all fences of a reservation object as long as it isn't told otherwise by providing a distinct fence object with the IOCTL in question.
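
To make that expectation concrete, here is a minimal sketch in plain C. It is a toy model, not kernel code: struct toy_fence, struct toy_resv and toy_collect_waits() are made-up names for illustration, and the real struct reservation_object interface looks different. It only shows the rule I have in mind: without a distinct fence from userspace, wait for every pending fence of the object; with one, wait for that fence alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SHARED 8

/* Hypothetical stand-in for a fence: only tracks whether the work is done. */
struct toy_fence {
    bool signaled;
};

/* Hypothetical stand-in for a reservation object attached to a buffer. */
struct toy_resv {
    struct toy_fence *exclusive;              /* last writer, may be NULL */
    struct toy_fence *shared[TOY_MAX_SHARED]; /* other pending operations */
    size_t shared_count;
};

/*
 * Collect the fences a new submission has to wait for and return how many
 * were stored in out. out needs room for TOY_MAX_SHARED + 1 entries.
 */
static size_t toy_collect_waits(const struct toy_resv *resv,
                                struct toy_fence *explicit_fence,
                                struct toy_fence **out)
{
    size_t n = 0;

    assert(resv != NULL && out != NULL);
    assert(resv->shared_count <= TOY_MAX_SHARED);

    /* Userspace told us otherwise: wait only for the fence it handed in. */
    if (explicit_fence) {
        if (!explicit_fence->signaled)
            out[n++] = explicit_fence;
        return n;
    }

    /* Default: wait for all fences of the object that are still pending. */
    if (resv->exclusive && !resv->exclusive->signaled)
        out[n++] = resv->exclusive;
    for (size_t i = 0; i < resv->shared_count; i++) {
        if (!resv->shared[i]->signaled)
            out[n++] = resv->shared[i];
    }
    return n;
}
```

A caller that passes NULL as explicit_fence gets the DRI2/DRI3 style behaviour, while a caller that passes a fence opts out of the implicit waits for that submission.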
