On Thu, May 15, 2025 at 11:56 AM Danilo Krummrich <d...@kernel.org> wrote:
>
> On Thu, May 15, 2025 at 10:40:15AM -0700, Rob Clark wrote:
> > On Thu, May 15, 2025 at 10:30 AM Danilo Krummrich <d...@kernel.org> wrote:
> > >
> > > (Cc: Boris)
> > >
> > > On Thu, May 15, 2025 at 12:22:18PM -0400, Connor Abbott wrote:
> > > > For some context, other drivers have the concept of a "synchronous"
> > > > VM_BIND ioctl which completes immediately, and drivers implement it by
> > > > waiting for the whole thing to finish before returning.
> > >
> > > Nouveau implements sync by issuing a normal async VM_BIND and subsequently
> > > waiting for the out-fence synchronously.
> >
> > As Connor mentioned, we'd prefer it to be async rather than blocking in
> > the normal case; otherwise, with drm native context (using the native UMD
> > in a guest VM), you'd be blocking the single host/VMM virglrenderer
> > thread.
> >
> > The key is we want to keep it async in the normal cases, and not have
> > weird edge case CTS tests blow up from being _too_ async ;-)
>
> I really wonder why they don't blow up in Nouveau, which also supports fully
> asynchronous VM_BIND. Mind sharing which tests blow up? :)

Maybe it was
dEQP-VK.sparse_resources.buffer.ssbo.sparse_residency.buffer_size_2_24,
but I might be mixing that up; I'd have to back out this patch and see
where things blow up, which would take many hours.

There definitely was one where I was seeing >5k VM_BIND jobs pile up,
so throttling like this is absolutely needed.
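
Roughly the shape of the throttling (a hypothetical sketch just to
illustrate the idea, not the actual msm patch; the struct/field names
and the limit below are made up):

/*
 * Hypothetical sketch: block new VM_BIND submissions in the ioctl path
 * once too much pgtable memory is preallocated for jobs still pending
 * on the VM_BIND queue.  prealloc_bytes is bumped at submit time and
 * dropped (plus a wake_up()) when a job completes.
 */
#include <linux/atomic.h>
#include <linux/sizes.h>
#include <linux/wait.h>

#define VM_BIND_PREALLOC_LIMIT	(8 * SZ_1M)	/* arbitrary example limit */

struct example_vm {
	atomic64_t prealloc_bytes;
	wait_queue_head_t prealloc_wait;
};

static int example_vm_bind_throttle(struct example_vm *vm)
{
	/* Sleep (interruptibly) until enough pending jobs have retired. */
	return wait_event_interruptible(vm->prealloc_wait,
		atomic64_read(&vm->prealloc_bytes) < VM_BIND_PREALLOC_LIMIT);
}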

Part of the VM_BIND for msm series adds some tracepoints for the amount
of memory preallocated vs. used for each job.  That plus the scheduler
tracepoints should let you see how much memory is tied up in
prealloc'd pgtables.  You might not be noticing only because you are
running on a big desktop with lots of RAM ;-)
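
Something along these lines (again a hypothetical sketch, assuming the
usual TRACE_EVENT header boilerplate around it; the event and field
names are made up rather than taken from the series):

#include <linux/tracepoint.h>

/*
 * Hypothetical tracepoint: report how much pgtable memory a VM_BIND
 * job preallocated vs. how much it actually used.
 */
TRACE_EVENT(example_vm_bind_prealloc,
	TP_PROTO(u32 preallocated, u32 used),
	TP_ARGS(preallocated, used),
	TP_STRUCT__entry(
		__field(u32, preallocated)
		__field(u32, used)
	),
	TP_fast_assign(
		__entry->preallocated = preallocated;
		__entry->used = used;
	),
	TP_printk("preallocated=%u used=%u",
		  __entry->preallocated, __entry->used)
);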

> > > > But this
> > > > doesn't work for native context, where everything has to be
> > > > asynchronous, so we're trying a new approach where we instead submit
> > > > an asynchronous bind for "normal" (non-sparse/driver internal)
> > > > allocations and only attach its out-fence to the in-fence of
> > > > subsequent submits to other queues.
> > >
> > > This is what nouveau does, and I think other drivers like Xe and panthor
> > > do this as well.
> >
> > No one has added native context support for these drivers yet.
>
> Huh? What exactly do you mean with "native context" then?

It is a way to use the native usermode driver in a guest VM, by remoting
at the UAPI level as opposed to the vk or gl API level.  You can
generally get performance equal to native, but the guest/host boundary
strongly encourages being asynchronous to hide the guest->host latency.

https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/693
https://indico.freedesktop.org/event/2/contributions/53/attachments/76/121/XDC2022_%20virtgpu%20drm%20native%20context.pdf

So far there is (merged) support for msm + freedreno/turnip, amdgpu +
radeonsi/radv, with MRs in-flight for i915 and asahi.

BR,
-R

> > > > Once you do this, you need a
> > > > limit like this to prevent the memory usage from pending page table
> > > > updates from getting out of control. Other drivers haven't needed this
> > > > yet, but they will when they get native context support.
> > >
> > > What are the cases where you did run into this, i.e. which application in
> > > userspace hit this? Was it the CTS, some game, something else?
> >
> > CTS tests that do weird things with massive # of small bind/unbind.  I
> > wouldn't expect to hit the blocking case in the real world.
>
> As mentioned above, can you please share them? I'd like to play around a bit. 
> :)
>
> - Danilo
