Sorry for the top-post, but there's no single message here that makes sense to reply to...

One of the things Daniel Vetter pointed out to me recently, which I
didn't fully understand before, is that dma-fence has a very subtle
second requirement beyond finite-time completion:  Nothing required
for signaling a dma-fence may allocate memory.  Why?  Because the act
of allocating memory may itself wait on your dma-fence: an allocation
can enter direct reclaim, and reclaim can end up waiting on fences
(e.g. via a driver's shrinker evicting buffers).  This, as it turns
out, is a massively stricter requirement than finite-time completion
and, I think, throws out all of the proposals we have so far.
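
As an aside, the kernel already has lockdep annotations to enforce
exactly this rule on the kernel side.  A rough sketch of how a
driver's signaling path gets annotated --
dma_fence_begin_signalling()/dma_fence_end_signalling() are the real
helpers, but the surrounding driver structure (my_device, mdev->fence)
is made up for illustration:

    #include <linux/dma-fence.h>
    #include <linux/slab.h>

    static void my_driver_complete_work(struct my_device *mdev)
    {
            void *tmp;
            bool cookie;

            /* Everything between begin/end is a fence-signalling
             * critical section; lockdep tracks it. */
            cookie = dma_fence_begin_signalling();

            /* BAD: GFP_KERNEL can enter direct reclaim, and reclaim
             * may wait on dma-fences (e.g. a driver shrinker evicting
             * buffers) -- possibly the very fence we're about to
             * signal.  lockdep will flag this. */
            tmp = kmalloc(64, GFP_KERNEL);

            dma_fence_signal(mdev->fence);
            dma_fence_end_signalling(cookie);

            kfree(tmp);
    }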

Take, for instance, Marek's proposal for userspace involvement with
dma-fence: userspace asks the kernel for the next serial, and the
kernel trusts userspace to signal it.  That doesn't work at all once
anything that allocates memory on the path to signaling a dma-fence
can deadlock.  There's simply no way for the kernel to trust
userspace to not do ANYTHING which might allocate memory.  I don't
even think there's a way userspace can trust itself there.  It also
blows up my plan of moving the fences to transition boundaries.
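
To make the failure mode concrete, here is roughly the shape of that
userspace-signaled flow and where it deadlocks.  Every name below is
invented for illustration; none of this is a real API:

    /* 1. Userspace gets the next serial; the kernel publishes a
     *    dma-fence that signals when userspace reports that serial
     *    as complete. */
    uint64_t serial = hypothetical_get_next_serial(fd);

    /* 2. Other kernel paths (display, PRIME, eviction) can now wait
     *    on that fence -- and therefore, transitively, so can memory
     *    reclaim. */

    /* 3. If *anything* in here allocates memory (malloc, an internal
     *    BO allocation, even a page fault on a fresh mmap), the
     *    allocator can enter reclaim, reclaim can wait on dma-fences,
     *    and one of those may be the fence only we can signal:
     *    deadlock. */
    render_frame();

    /* 4. We may never get here. */
    hypothetical_signal_serial(fd, serial);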

Not sure where that leaves us.

--Jason

On Mon, May 3, 2021 at 9:42 AM Alex Deucher <alexdeuc...@gmail.com> wrote:
>
> On Sat, May 1, 2021 at 6:27 PM Marek Olšák <mar...@gmail.com> wrote:
> >
> > On Wed, Apr 28, 2021 at 5:07 AM Michel Dänzer <mic...@daenzer.net> wrote:
> >>
> >> On 2021-04-28 8:59 a.m., Christian König wrote:
> >> > Hi Dave,
> >> >
> >> > Am 27.04.21 um 21:23 schrieb Marek Olšák:
> >> >> Supporting interop with any device is always possible. It depends on 
> >> >> which drivers we need to interoperate with and update. We've 
> >> >> already found the path forward for amdgpu. We just need to find out how 
> >> >> many other drivers need to be updated and evaluate the cost/benefit 
> >> >> aspect.
> >> >>
> >> >> Marek
> >> >>
> >> >> On Tue, Apr 27, 2021 at 2:38 PM Dave Airlie <airl...@gmail.com> wrote:
> >> >>
> >> >>     On Tue, 27 Apr 2021 at 22:06, Christian König
> >> >>     <ckoenig.leichtzumer...@gmail.com> wrote:
> >> >>     >
> >> >>     > Correct, we wouldn't have synchronization between devices with
> >> >>     > and without user queues anymore.
> >> >>     >
> >> >>     > That could only be a problem for A+I laptops.
> >> >>
> >> >>     Since I think you mentioned you'd only be enabling this on newer
> >> >>     chipsets, won't it be a problem for A+A where one A is a generation
> >> >>     behind the other?
> >> >>
> >> >
> >> > Crap, that is a good point as well.
> >> >
> >> >>
> >> >>     I'm not really liking where this is going, btw; it seems like an
> >> >>     ill-thought-out concept. If AMD is really going down the road of
> >> >>     designing hw that is currently Linux-incompatible, you are going
> >> >>     to have to accept a big part of the burden of bringing this
> >> >>     support into more than just AMD drivers for upcoming generations
> >> >>     of GPU.
> >> >>
> >> >
> >> > Well, we don't really like that either, but we have no other option
> >> > as far as I can see.
> >>
> >> I don't really understand what "future hw may remove support for kernel 
> >> queues" means exactly. While the per-context queues can be mapped to 
> >> userspace directly, they don't *have* to be, do they? I.e. the kernel 
> >> driver should be able to either intercept userspace access to the queues, 
> >> or in the worst case do it all itself, and provide the existing 
> >> synchronization semantics as needed?
> >>
> >> Surely there are resource limits for the per-context queues, so the kernel 
> >> driver needs to do some kind of virtualization / multi-plexing anyway, or 
> >> we'll get sad user faces when there's no queue available for <current hot 
> >> game>.
> >>
> >> I'm probably missing something though, awaiting enlightenment. :)
> >
> >
> > The hw interface for userspace is that the ring buffer is mapped to the 
> > process address space alongside a doorbell aperture (4K page) that isn't 
> > real memory, but when the CPU writes into it, it tells the hw scheduler 
> > that there are new GPU commands in the ring buffer. Userspace inserts all 
> > the wait, draw, and signal commands into the ring buffer and then "rings" 
> > the doorbell. It's my understanding that the ring buffer and the doorbell 
> > are always mapped in the same GPU address space as the process, which makes 
> > it very difficult to emulate the current protected ring buffers in the 
> > kernel. The VMID of the ring buffer is also not changeable.
> >
>
> The doorbell does not have to be mapped into the process's GPU virtual
> address space.  The CPU could write to it directly.  Mapping it into
> the GPU's virtual address space would, however, allow you to have a
> device kick off work rather than the CPU.  E.g., the GPU could kick
> off its own work, or multiple devices could kick off work without CPU
> involvement.
>
> Alex
>
>
> > The hw scheduler doesn't do any synchronization and it doesn't see any 
> > dependencies. It only chooses which queue to execute, so it's really just a 
> > simple queue manager handling the virtualization aspect and not much else.
> >
> > Marek