Sorry for the top-post, but there's nothing good to reply to here...

One of the things Daniel Vetter pointed out to me recently, which I didn't
fully understand before, is that dma-buf has a very subtle second requirement
beyond finite-time completion: nothing required for signaling a dma-fence can
allocate memory. Why? Because the act of allocating memory may wait on your
dma-fence. This, as it turns out, is a massively stricter requirement than
finite-time completion and, I think, throws out all of the proposals we have
so far.
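To make that concrete, here's a minimal kernel-side sketch using the
fence-signalling lockdep annotations. dma_fence_begin_signalling(),
dma_fence_end_signalling(), and dma_fence_signal() are the real primitives;
the job structure and the completion function are hypothetical:

#include <linux/dma-fence.h>
#include <linux/slab.h>

struct my_job {
	struct dma_fence *fence;   /* fence others may already be waiting on */
};

static void my_job_complete(struct my_job *job)
{
	/* Everything between begin/end is on the fence-signalling path. */
	bool cookie = dma_fence_begin_signalling();

	/*
	 * BAD: GFP_KERNEL may enter direct reclaim, and reclaim may block
	 * on dma-fences (e.g. to evict buffers) -- including, transitively,
	 * the fence we are about to signal.  Lockdep will flag this.
	 */
	void *scratch = kmalloc(64, GFP_KERNEL);

	kfree(scratch);
	dma_fence_signal(job->fence);
	dma_fence_end_signalling(cookie);
}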
Take, for instance, Marek's proposal for userspace involvement with
dma-fences, where userspace asks the kernel for the next serial and the
kernel trusts userspace to signal it. That doesn't work at all if allocating
memory on the path to signaling a dma-fence can blow up. There's simply no
way for the kernel to trust userspace not to do ANYTHING that might allocate
memory; I don't even think there's a way userspace can trust itself there. It
also blows up my plan of moving the fences to transition boundaries.
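To illustrate the trust problem (the ioctl wrappers below are entirely
hypothetical; only the shape of the scheme is from the proposal):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical API, purely for illustration: */
uint64_t drm_next_serial(int fd);                 /* kernel hands out serial N  */
void     drm_signal_serial(int fd, uint64_t n);   /* userspace signals serial N */

static void submit_one_frame(int fd, const void *cmds, size_t len)
{
	uint64_t serial = drm_next_serial(fd);

	/*
	 * The kernel may already have exported a dma-fence for `serial`
	 * to other drivers.  From here until drm_signal_serial(), nothing
	 * in this process may allocate: malloc() can enter direct
	 * reclaim, reclaim can wait on that fence, and only we can
	 * signal it.  No kernel check can enforce this on userspace.
	 */
	void *staging = malloc(len);   /* potential deadlock right here */
	memcpy(staging, cmds, len);
	/* ... copy to the ring, kick the hw ... */
	drm_signal_serial(fd, serial);
	free(staging);
}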
Not sure where that leaves us.

--Jason

On Mon, May 3, 2021 at 9:42 AM Alex Deucher <alexdeuc...@gmail.com> wrote:
>
> On Sat, May 1, 2021 at 6:27 PM Marek Olšák <mar...@gmail.com> wrote:
> >
> > On Wed, Apr 28, 2021 at 5:07 AM Michel Dänzer <mic...@daenzer.net> wrote:
> >>
> >> On 2021-04-28 8:59 a.m., Christian König wrote:
> >> > Hi Dave,
> >> >
> >> > Am 27.04.21 um 21:23 schrieb Marek Olšák:
> >> >> Supporting interop with any device is always possible. It depends
> >> >> on which drivers we need to interoperate with and update them.
> >> >> We've already found the path forward for amdgpu. We just need to
> >> >> find out how many other drivers need to be updated and evaluate
> >> >> the cost/benefit aspect.
> >> >>
> >> >> Marek
> >> >>
> >> >> On Tue, Apr 27, 2021 at 2:38 PM Dave Airlie <airl...@gmail.com> wrote:
> >> >>
> >> >>     On Tue, 27 Apr 2021 at 22:06, Christian König
> >> >>     <ckoenig.leichtzumer...@gmail.com> wrote:
> >> >>     >
> >> >>     > Correct, we wouldn't have synchronization between devices
> >> >>     > with and without user queues any more.
> >> >>     >
> >> >>     > That could only be a problem for A+I laptops.
> >> >>
> >> >>     Since I think you mentioned you'd only be enabling this on
> >> >>     newer chipsets, won't it be a problem for A+A where one A is
> >> >>     a generation behind the other?
> >> >>
> >> >
> >> > Crap, that is a good point as well.
> >> >
> >> >>
> >> >>     I'm not really liking where this is going, btw; it seems like
> >> >>     an ill-thought-out concept. If AMD is really going down the
> >> >>     road of designing hw that is currently Linux-incompatible, you
> >> >>     are going to have to accept a big part of the burden of
> >> >>     bringing this support to more than just amd drivers for
> >> >>     upcoming generations of gpu.
> >> >>
> >> >
> >> > Well, we don't really like that either, but we have no other option
> >> > as far as I can see.
> >>
> >> I don't really understand what "future hw may remove support for
> >> kernel queues" means exactly. While the per-context queues can be
> >> mapped to userspace directly, they don't *have* to be, do they? I.e.
> >> the kernel driver should be able to either intercept userspace
> >> access to the queues, or in the worst case do it all itself, and
> >> provide the existing synchronization semantics as needed?
> >>
> >> Surely there are resource limits for the per-context queues, so the
> >> kernel driver needs to do some kind of virtualization / multiplexing
> >> anyway, or we'll get sad user faces when there's no queue available
> >> for <current hot game>.
> >>
> >> I'm probably missing something though, awaiting enlightenment. :)
> >
> > The hw interface for userspace is that the ring buffer is mapped to
> > the process address space alongside a doorbell aperture (4K page)
> > that isn't real memory, but when the CPU writes into it, it tells
> > the hw scheduler that there are new GPU commands in the ring buffer.
> > Userspace inserts all the wait, draw, and signal commands into the
> > ring buffer and then "rings" the doorbell. It's my understanding
> > that the ring buffer and the doorbell are always mapped in the same
> > GPU address space as the process, which makes it very difficult to
> > emulate the current protected ring buffers in the kernel. The VMID
> > of the ring buffer is also not changeable.
>
> The doorbell does not have to be mapped into the process's GPU virtual
> address space; the CPU could write to it directly. Mapping it into the
> GPU's virtual address space would allow a device, rather than the CPU,
> to kick off work. E.g., the GPU could kick off its own work, or
> multiple devices could kick off work without CPU involvement.
>
> Alex
>
> > The hw scheduler doesn't do any synchronization and it doesn't see
> > any dependencies. It only chooses which queue to execute, so it's
> > really just a simple queue manager handling the virtualization
> > aspect and not much else.
> >
> > Marek
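For reference, the user-mode submission model described above reduces to
roughly the sketch below. The ring layout, packet encoding, and doorbell
offset are all made up; only the mechanism (write wait/draw/signal packets
into a ring mapped into the process, barrier, then bump the write pointer
through the doorbell page) is from the thread:

#include <stdatomic.h>
#include <stdint.h>

struct user_ring {
	uint32_t          *ring;      /* ring buffer, mapped into the process */
	uint32_t           size_dw;   /* ring size in dwords, power of two */
	uint32_t           wptr;      /* software copy of the write pointer */
	volatile uint32_t *doorbell;  /* 4K doorbell aperture, not real memory */
};

/* Copy wait/draw/signal packets into the ring at the write pointer. */
static void ring_emit(struct user_ring *r, const uint32_t *pkt, uint32_t ndw)
{
	for (uint32_t i = 0; i < ndw; i++)
		r->ring[(r->wptr + i) & (r->size_dw - 1)] = pkt[i];
	r->wptr += ndw;
}

/* "Ring" the doorbell: tell the hw scheduler there is new work. */
static void ring_submit(struct user_ring *r)
{
	/* The packets must be visible before the doorbell write lands. */
	atomic_thread_fence(memory_order_release);

	/*
	 * From here the hw scheduler only picks which queue to execute;
	 * it does no synchronization and sees no dependencies.  All
	 * ordering is whatever wait/signal packets userspace emitted
	 * above -- which is exactly why the kernel can no longer vouch
	 * for when, or whether, anything signals.
	 */
	*r->doorbell = r->wptr;
}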