On Tue, May 20, 2025 at 12:54 PM Danilo Krummrich <d...@kernel.org> wrote:
>
> On Tue, May 20, 2025 at 09:07:05AM -0700, Rob Clark wrote:
> > On Tue, May 20, 2025 at 12:06 AM Danilo Krummrich <d...@kernel.org> wrote:
> > >
> > > On Thu, May 15, 2025 at 12:56:38PM -0700, Rob Clark wrote:
> > > > On Thu, May 15, 2025 at 11:56 AM Danilo Krummrich <d...@kernel.org> wrote:
> > > > >
> > > > > On Thu, May 15, 2025 at 10:40:15AM -0700, Rob Clark wrote:
> > > > > > On Thu, May 15, 2025 at 10:30 AM Danilo Krummrich <d...@kernel.org> wrote:
> > > > > > >
> > > > > > > (Cc: Boris)
> > > > > > >
> > > > > > > On Thu, May 15, 2025 at 12:22:18PM -0400, Connor Abbott wrote:
> > > > > > > > For some context, other drivers have the concept of a "synchronous"
> > > > > > > > VM_BIND ioctl which completes immediately, and drivers implement it
> > > > > > > > by waiting for the whole thing to finish before returning.
> > > > > > >
> > > > > > > Nouveau implements sync by issuing a normal async VM_BIND and
> > > > > > > subsequently waits for the out-fence synchronously.
> > > > > >
> > > > > > As Connor mentioned, we'd prefer it to be async rather than blocking,
> > > > > > in normal cases, otherwise with drm native context for using native
> > > > > > UMD in guest VM, you'd be blocking the single host/VMM virglrender
> > > > > > thread.
> > > > > >
> > > > > > The key is we want to keep it async in the normal cases, and not have
> > > > > > weird edge case CTS tests blow up from being _too_ async ;-)
> > > > >
> > > > > I really wonder why they don't blow up in Nouveau, which also supports
> > > > > full asynchronous VM_BIND. Mind sharing which tests blow up? :)
> > > >
> > > > Maybe it was
> > > > dEQP-VK.sparse_resources.buffer.ssbo.sparse_residency.buffer_size_2_24,
> > >
> > > The test above is part of the smoke testing I do for nouveau, but I
> > > haven't seen such issues yet for nouveau.
> >
> > nouveau is probably not using async binds for everything? Or maybe
> > I'm just pointing to the wrong test.
>
> Let me double check later on.
>
> > > > but I might be mixing that up, I'd have to back out this patch and see
> > > > where things blow up, which would take many hours.
> > >
> > > Well, you said that you never had this issue with "real" workloads, but
> > > only with VK CTS, so I really think we should know what we are trying to
> > > fix here.
> > >
> > > We can't just add new generic infrastructure without reasonable and *well
> > > understood* justification.
> >
> > What is not well understood about this? We need to pre-allocate
> > memory that we likely don't need for pagetables.
> >
> > In the worst case, a large # of async PAGE_SIZE binds, you end up
> > needing to pre-allocate 3 pgtable pages (4 lvl pgtable) per one page
> > of mapping. Queue up enough of those and you can explode your memory
> > usage.
>
> Well, the general principle of how this can OOM is well understood, sure.
> What's not well understood is how we run into this case. I think we should
> also understand what test causes the issue and why other drivers are not
> affected (yet).
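To put a number on the worst case Rob describes above, here is a minimal
sketch with assumed parameters (4 KiB pages, 4-level page table, 512 entries
per level); the helper name is illustrative, not taken from the msm driver,
and it ignores the extra table needed when a range straddles a table boundary:

#include <stdio.h>
#include <stddef.h>

#define PAGE_SHIFT     12    /* 4 KiB pages (assumed) */
#define PTRS_PER_TABLE 512   /* 9 VA bits per level (assumed) */
#define PT_LEVELS      4

/*
 * Worst-case number of page-table pages that must be pre-allocated for a
 * mapping of 'size' bytes, assuming no intermediate tables exist yet and
 * only the root table is already present.
 */
static size_t worst_case_pt_pages(size_t size)
{
	size_t entries = (size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
	size_t total = 0;

	/* Each table at a level covers PTRS_PER_TABLE entries of the level below. */
	for (int lvl = 0; lvl < PT_LEVELS - 1; lvl++) {
		entries = (entries + PTRS_PER_TABLE - 1) / PTRS_PER_TABLE;
		total += entries;
	}
	return total;
}

int main(void)
{
	/* A single PAGE_SIZE bind can need 3 page-table pages... */
	printf("1 page        -> %zu pgtable pages\n",
	       worst_case_pt_pages(1UL << PAGE_SHIFT));
	/* ...so 1024 queued single-page binds can pin ~3072 pages (~12 MiB). */
	printf("1024 x 1 page -> %zu pgtable pages\n",
	       1024 * worst_case_pt_pages(1UL << PAGE_SHIFT));
	return 0;
}

With these assumptions, 1024 queued single-page binds can pin roughly 12 MiB
of pre-allocated page tables, which is the kind of pile-up the throttling is
meant to bound.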
Once again, it's well understood why other drivers aren't affected. They
have both synchronous and asynchronous VM_BINDs in the uabi, and the
userspace driver uses synchronous VM_BIND for everything except sparse
mappings. For freedreno we tried to change that because async works better
for native context, and since more mappings started being async, that
exposed the pre-existing issue where async VM_BINDs can hang the whole
system when we run out of memory.

I think it would be possible in theory for other drivers to forward
synchronous VM_BINDs asynchronously to the host as long as the host kernel
executes them synchronously, so maybe other drivers won't have a problem
with native context support. But it will still be possible to make them
fall over if you poke them the right way.

Connor

> > > > There definitely was one where I was seeing >5k VM_BIND jobs pile up,
> > > > so absolutely throttling like this is needed.
> > >
> > > I still don't understand why the kernel must throttle this? If userspace
> > > uses async VM_BIND, it obviously can't spam the kernel infinitely without
> > > running into an OOM case.
> >
> > It is a valid question about whether the kernel or userspace should be
> > the one to do the throttling.
> >
> > I went for doing it in the kernel because the kernel has better
> > knowledge of how much it needs to pre-allocate.
> >
> > (There is also the side point that this pre-allocated memory is not
> > charged to the calling process from a PoV of memory accounting. So
> > with that in mind it seems like a good idea for the kernel to throttle
> > memory usage.)
>
> That's a very valid point, maybe we should investigate in the direction of
> addressing this, rather than trying to work around it in the scheduler,
> where we can only set an arbitrary credit limit.
>
> > > But let's assume we agree that we want to avoid that userspace can ever
> > > OOM itself through async VM_BIND; then the proposed solution seems wrong:
> > >
> > > Do we really want the driver developer to set an arbitrary boundary of a
> > > number of jobs that can be submitted before *async* VM_BIND blocks and
> > > becomes semi-sync?
> > >
> > > How do we choose this number of jobs? A very small number to be safe,
> > > which scales badly on powerful machines? A large number that scales well
> > > on powerful machines, but OOMs on weaker ones?
> >
> > The way I am using it in msm, the credit amount and limit are in units
> > of pre-allocated pages in-flight. I set the enqueue_credit_limit to
> > 1024 pages; once there are jobs queued up exceeding that limit, they
> > start blocking.
> >
> > The number of _jobs_ is irrelevant, it is the # of pre-alloc'd pages in
> > flight.
>
> That doesn't make a difference for my question. How do you know 1024 pages
> is a good value? How do we scale for different machines with different
> capabilities?
>
> If you have a powerful machine with lots of memory, we might throttle
> userspace for no reason, no?
>
> If the machine has very limited resources, it might already be too much?
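For reference, a minimal sketch of the credit-based throttling being
discussed; the struct and function names are hypothetical, not the proposed
drm_sched interface, and each job's credit count is assumed to equal the
number of page-table pages it pre-allocates:

#include <linux/atomic.h>
#include <linux/wait.h>

struct bind_queue {
	unsigned int      credit_limit;      /* e.g. 1024 pre-allocated pages */
	atomic_t          credits_in_flight; /* pages pinned by queued jobs   */
	wait_queue_head_t wait;
};

static void bind_queue_init(struct bind_queue *q, unsigned int limit)
{
	q->credit_limit = limit;
	atomic_set(&q->credits_in_flight, 0);
	init_waitqueue_head(&q->wait);
}

/* Atomically reserve 'credits' if they still fit under the limit. */
static bool bind_queue_try_reserve(struct bind_queue *q, unsigned int credits)
{
	int cur = atomic_read(&q->credits_in_flight);

	do {
		if (cur + credits > q->credit_limit)
			return false;
	} while (!atomic_try_cmpxchg(&q->credits_in_flight, &cur, cur + credits));

	return true;
}

/*
 * Enqueue blocks (interruptibly) once too many pre-allocated pages are in
 * flight. Note: a single job larger than the limit would never fit; a real
 * implementation would have to special-case or clamp that.
 */
static int bind_queue_enqueue(struct bind_queue *q, unsigned int credits)
{
	return wait_event_interruptible(q->wait,
					bind_queue_try_reserve(q, credits));
}

/* Called once the bind job has consumed or returned its pre-allocation. */
static void bind_queue_job_done(struct bind_queue *q, unsigned int credits)
{
	atomic_sub(credits, &q->credits_in_flight);
	wake_up(&q->wait);
}

This keeps the limit in units of pre-allocated pages rather than jobs, which
is the distinction Rob draws above; whether 1024 is the right value, or
whether it should scale with the machine's memory, is exactly the open
question Danilo raises.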