On 03.05.21 at 16:59, Jason Ekstrand wrote:
Sorry for the top-post but there's no good thing to reply to here...

One of the things pointed out to me recently by Daniel Vetter that I
didn't fully understand before is that dma_buf has a very subtle
second requirement beyond finite time completion:  Nothing required
for signaling a dma-fence can allocate memory.  Why?  Because the act
of allocating memory may wait on your dma-fence.  This, as it turns
out, is a massively more strict requirement than finite time
completion and, I think, throws out all of the proposals we have so
far.
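
To make that concrete, here is a minimal kernel-side sketch using the
dma-fence signalling annotations (dma_fence_begin_signalling() /
dma_fence_end_signalling()); the job struct and handler are invented
purely for illustration:

#include <linux/dma-fence.h>
#include <linux/slab.h>

/* Hypothetical job; only the fence matters for this example. */
struct my_job {
	struct dma_fence *fence;
};

static void my_job_complete(struct my_job *job)
{
	bool cookie = dma_fence_begin_signalling();

	/*
	 * BAD: GFP_KERNEL can enter direct reclaim, and reclaim can
	 * wait on dma-fences (e.g. to evict a buffer) -- potentially
	 * the very fence we're about to signal.  Lockdep will flag
	 * this thanks to the annotation above.
	 */
	void *scratch = kmalloc(64, GFP_KERNEL);

	kfree(scratch);
	dma_fence_signal(job->fence);
	dma_fence_end_signalling(cookie);
}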

Take, for instance, Marek's proposal for userspace involvement with
dma-fence by asking the kernel for a next serial and the kernel
trusting userspace to signal it.  That doesn't work at all if
allocating memory to trigger a dma-fence can blow up.  There's simply
no way for the kernel to trust userspace to not do ANYTHING which
might allocate memory.  I don't even think there's a way userspace can
trust itself there.  It also blows up my plan of moving the fences to
transition boundaries.
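
As a purely hypothetical sketch of that flow (none of these ioctl
wrappers exist; the names are invented to illustrate the hazard):

#include <stdint.h>

/* Invented wrappers around hypothetical ioctls -- illustration only. */
uint64_t hypothetical_get_next_serial(int timeline_fd);
void hypothetical_signal_serial(int timeline_fd, uint64_t serial);
void record_and_kick_gpu_work(void);

void submit_with_user_fence(int timeline_fd)
{
	/* 1. Kernel hands out the next serial and creates a dma-fence
	 *    for it, which other drivers may immediately wait on. */
	uint64_t serial = hypothetical_get_next_serial(timeline_fd);

	/* 2. From here until step 3 the dma-fence is live, so nothing
	 *    on this path may allocate memory: allocation can enter
	 *    reclaim, and reclaim can wait on dma-fences -- possibly
	 *    this one.  Userspace can't guarantee that; even a page
	 *    fault allocates. */
	record_and_kick_gpu_work();

	/* 3. Userspace signals the serial. */
	hypothetical_signal_serial(timeline_fd, serial);
}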

Not sure where that leaves us.

Well, at least I was perfectly aware of that :)

I'm currently experimenting with some sample code which would allow implicit sync with user fences.

Not that I'm pushing hard in that direction, but I just want to make clear how simple or complex the whole thing would be.

Christian.


--Jason

On Mon, May 3, 2021 at 9:42 AM Alex Deucher <alexdeuc...@gmail.com> wrote:
On Sat, May 1, 2021 at 6:27 PM Marek Olšák <mar...@gmail.com> wrote:
On Wed, Apr 28, 2021 at 5:07 AM Michel Dänzer <mic...@daenzer.net> wrote:
On 2021-04-28 8:59 a.m., Christian König wrote:
Hi Dave,

On 27.04.21 at 21:23, Marek Olšák wrote:
Supporting interop with any device is always possible. It's a question of
which drivers we need to interoperate with, and of updating them. We've
already found the path forward for amdgpu. We just need to find out how many
other drivers need to be updated and evaluate the cost/benefit aspect.

Marek

On Tue, Apr 27, 2021 at 2:38 PM Dave Airlie <airl...@gmail.com> wrote:

     On Tue, 27 Apr 2021 at 22:06, Christian König
     <ckoenig.leichtzumer...@gmail.com> wrote:
     > Correct, we wouldn't have synchronization between devices with and
     > without user queues anymore.
     >
     > That could only be a problem for A+I laptops.

     Since I think you mentioned you'd only be enabling this on newer
     chipsets, won't it be a problem for A+A where one A is a generation
     behind the other?

Crap, that is a good point as well.

     I'm not really liking where this is going btw, seems like an
     ill-thought-out concept. If AMD is really going down the road of
     designing hw that is currently Linux-incompatible, you are going
     to have to accept a big part of the burden of bringing this
     support into more than just amd drivers for upcoming generations
     of gpu.

Well, we don't really like that either, but we have no other option as far as I
can see.
I don't really understand what "future hw may remove support for kernel queues" 
means exactly. While the per-context queues can be mapped to userspace directly, they 
don't *have* to be, do they? I.e. the kernel driver should be able to either intercept 
userspace access to the queues, or in the worst case do it all itself, and provide the 
existing synchronization semantics as needed?

Surely there are resource limits for the per-context queues, so the kernel driver
needs to do some kind of virtualization / multiplexing anyway, or we'll get sad user
faces when there's no queue available for <current hot game>.

I'm probably missing something though, awaiting enlightenment. :)

The hw interface for userspace is that the ring buffer is mapped to the process address 
space alongside a doorbell aperture (4K page) that isn't real memory, but when the CPU 
writes into it, it tells the hw scheduler that there are new GPU commands in the ring 
buffer. Userspace inserts all the wait, draw, and signal commands into the ring buffer 
and then "rings" the doorbell. It's my understanding that the ring buffer and 
the doorbell are always mapped in the same GPU address space as the process, which makes 
it very difficult to emulate the current protected ring buffers in the kernel. The VMID 
of the ring buffer is also not changeable.
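
A rough userspace-side sketch of what that submission looks like (all
names are invented; the real packet formats and doorbell layout are
hardware specific):

#include <stdatomic.h>
#include <stdint.h>

struct user_queue {
	uint32_t          *ring;       /* ring buffer, mapped into the process */
	uint32_t           ring_mask;  /* ring size in dwords, minus one (power of two) */
	volatile uint64_t *doorbell;   /* one slot in the 4K doorbell aperture */
	uint64_t           wptr;       /* write pointer, in dwords */
};

static void submit(struct user_queue *q, const uint32_t *pkts, unsigned count)
{
	/* Copy the wait/draw/signal packets into the ring. */
	for (unsigned i = 0; i < count; i++)
		q->ring[(q->wptr + i) & q->ring_mask] = pkts[i];
	q->wptr += count;

	/* The packets must be visible before the doorbell write; the
	 * doorbell write is what tells the hw scheduler that new
	 * commands are in the ring. */
	atomic_thread_fence(memory_order_release);
	*q->doorbell = q->wptr;
}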

The doorbell does not have to be mapped into the process's GPU virtual
address space.  The CPU could write to it directly.  Mapping it into
the GPU's virtual address space would allow a device, rather than the
CPU, to kick off work.  E.g., the GPU could kick off its own work, or
multiple devices could kick off work without CPU involvement.

Alex


The hw scheduler doesn't do any synchronization and it doesn't see any 
dependencies. It only chooses which queue to execute, so it's really just a 
simple queue manager handling the virtualization aspect and not much else.

Marek