On Tue, May 20, 2025 at 10:22:54AM -0700, Rob Clark wrote:
> On Tue, May 20, 2025 at 9:54 AM Danilo Krummrich <d...@kernel.org> wrote:
> > On Tue, May 20, 2025 at 09:07:05AM -0700, Rob Clark wrote:
> > > On Tue, May 20, 2025 at 12:06 AM Danilo Krummrich <d...@kernel.org> wrote:
> > > > But let's assume we agree that we want to avoid that userspace can
> > > > ever OOM itself through async VM_BIND, then the proposed solution
> > > > seems wrong:
> > > >
> > > > Do we really want the driver developer to set an arbitrary boundary
> > > > of a number of jobs that can be submitted before *async* VM_BIND
> > > > blocks and becomes semi-sync?
> > > >
> > > > How do we choose this number of jobs? A very small number to be
> > > > safe, which scales badly on powerful machines? A large number that
> > > > scales well on powerful machines, but OOMs on weaker ones?
> > >
> > > The way I am using it in msm, the credit amount and limit are in units
> > > of pre-allocated pages in-flight. I set the enqueue_credit_limit to
> > > 1024 pages, once there are jobs queued up exceeding that limit, they
> > > start blocking.
> > >
> > > The number of _jobs_ is irrelevant, it is # of pre-alloc'd pages in
> > > flight.
> >
> > That doesn't make a difference for my question. How do you know 1024
> > pages is a good value? How do we scale for different machines with
> > different capabilities?
> >
> > If you have a powerful machine with lots of memory, we might throttle
> > userspace for no reason, no?
> >
> > If the machine has very limited resources, it might already be too much?
>
> It may be a bit arbitrary, but then again I'm not sure that userspace
> is in any better position to pick an appropriate limit.
>
> 4MB of in-flight pages isn't going to be too much for anything that is
> capable enough to run vk, but still allows for a lot of in-flight
> maps.
Ok, but what about the other way around? What's the performance impact if
the limit is chosen rather small, but we're running on a very powerful
machine?

Since you already have the implementation for hardware you have access to,
can you please check if and how performance degrades when you use a very
small threshold?

Also, I think we should probably put this throttle mechanism in a separate
component that just wraps a counter of bytes (or rather pages), which can be
increased and decreased through an API, where the increase blocks once a
certain threshold is exceeded.

This component can then be called by a driver from the job submit IOCTL and
from the corresponding place where the pre-allocated memory is actually used
or freed. Depending on the driver, this is not necessarily the scheduler's
run_job() callback.

We could call the component something like drm_throttle or
drm_submit_throttle.
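
Roughly, I'm thinking of something like the sketch below. Just for
illustration, the names (drm_throttle_charge() / drm_throttle_uncharge())
are made up, and the counter is only a soft cap, i.e. concurrent submitters
may overshoot the threshold a bit:

#include <linux/atomic.h>
#include <linux/wait.h>

struct drm_throttle {
	/* Upper bound of in-flight units, e.g. pre-allocated pages. */
	long threshold;
	/* Currently accounted units. */
	atomic_long_t count;
	/* Submitters sleep here until enough units are released. */
	wait_queue_head_t wq;
};

static inline void drm_throttle_init(struct drm_throttle *t, long threshold)
{
	t->threshold = threshold;
	atomic_long_set(&t->count, 0);
	init_waitqueue_head(&t->wq);
}

/*
 * Called from the submit IOCTL: block (interruptibly) while the accounted
 * amount exceeds the threshold, then add this job's units.
 */
static inline int drm_throttle_charge(struct drm_throttle *t, long units)
{
	int ret;

	ret = wait_event_interruptible(t->wq,
			atomic_long_read(&t->count) < t->threshold);
	if (ret)
		return ret;

	atomic_long_add(units, &t->count);
	return 0;
}

/*
 * Called from wherever the pre-allocated memory is actually used up or
 * freed again, which, depending on the driver, is not necessarily the
 * scheduler's run_job() callback.
 */
static inline void drm_throttle_uncharge(struct drm_throttle *t, long units)
{
	atomic_long_sub(units, &t->count);
	wake_up(&t->wq);
}

A driver would then charge the number of pre-allocated pages of a job from
its submit IOCTL and uncharge once those pages have actually been consumed
or released again.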