On Wed, Jun 18, 2025 at 3:19 PM Danilo Krummrich <d...@redhat.com> wrote:
>
> On Wed, Jun 18, 2025 at 02:56:37PM -0700, Rob Clark wrote:
> > On Wed, Jun 18, 2025 at 2:23 PM Danilo Krummrich <d...@redhat.com> wrote:
> > >
> > > On Tue, Jun 17, 2025 at 06:43:21AM -0700, Rob Clark wrote:
> > > > On Tue, Jun 17, 2025 at 5:48 AM Rob Clark <rob.cl...@oss.qualcomm.com>
> > > > wrote:
> > > > >
> > > > > On Tue, Jun 17, 2025 at 2:51 AM Danilo Krummrich <d...@redhat.com>
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Jun 16, 2025 at 03:25:08PM -0700, Rob Clark wrote:
> > > > > > > On Mon, Jun 16, 2025 at 2:39 PM Danilo Krummrich
> > > > > > > <d...@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Sat, Jun 14, 2025 at 08:03:20AM -0700, Rob Clark wrote:
> > > > > > > > > On Sat, Jun 14, 2025 at 3:39 AM Danilo Krummrich
> > > > > > > > > <d...@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 13, 2025 at 04:57:03PM -0700, Rob Clark wrote:
> > > > > > > > > > > For UNMAP/REMAP steps we could be needing to lock
> > > > > > > > > > > objects that are not explicitly listed in the VM_BIND
> > > > > > > > > > > ioctl in order to tear-down unmapped VAs. These helpers
> > > > > > > > > > > handle locking/preparing the needed objects.
> > > > > > > > > >
> > > > > > > > > > Yes, that's a common use-case. I think drivers typically
> > > > > > > > > > iterate through their drm_gpuva_ops to lock those objects.
> > > > > > > > > >
> > > > > > > > > > I had a look at your link [1] and it seems that you keep
> > > > > > > > > > a list of ops as well by calling vm_op_enqueue() with a
> > > > > > > > > > new struct msm_vm_op from the callbacks.
> > > > > > > > > >
> > > > > > > > > > Please note that for exactly this case there is the
> > > > > > > > > > op_alloc callback in struct drm_gpuvm_ops, such that you
> > > > > > > > > > can allocate a custom op type (i.e. struct msm_vm_op)
> > > > > > > > > > that embeds a struct drm_gpuva_op.
> > > > > > > > >
> > > > > > > > > I did use drm_gpuvm_sm_xyz_ops_create() in an earlier
> > > > > > > > > iteration of my VM_BIND series, but it wasn't quite what I
> > > > > > > > > was after. I wanted to apply the VM updates immediately to
> > > > > > > > > avoid issues with a later map/unmap overlapping an earlier
> > > > > > > > > map, which drm_gpuvm_sm_xyz_ops_create() doesn't really
> > > > > > > > > handle. I'm not even sure why this isn't a problem for
> > > > > > > > > other drivers unless userspace is providing some guarantees.
> > > > > > > >
> > > > > > > > The drm_gpuva_ops are usually used in a pattern like this.
> > > > > > > >
> > > > > > > > vm_bind {
> > > > > > > >         for_each_vm_bind_operation {
> > > > > >                 drm_gpuvm_sm_xyz_ops_create();
> > > > > > > >
> > > > > > > >                 drm_gpuva_for_each_op {
> > > > > > > >                         // modify drm_gpuvm's interval tree
> > > > > > > >                         // pre-allocate memory
> > > > > > > >                         // lock and prepare objects
> > > > > > > >                 }
> > > > > > > >         }
> > > > > > > >
> > > > > > > >         drm_sched_entity_push_job();
> > > > > > > > }
> > > > > > > >
> > > > > > > > run_job {
> > > > > > > >         for_each_vm_bind_operation {
> > > > > > > >                 drm_gpuva_for_each_op {
> > > > > > > >                         // modify page tables
> > > > > > > >                 }
> > > > > > > >         }
> > > > > > > > }
> > > > > > > >
> > > > > > > > run_job {
> > > > > > > >         for_each_vm_bind_operation {
> > > > > > > >                 drm_gpuva_for_each_op {
> > > > > > > >                         // free page table structures, if any
> > > > > > > >                         // free unused pre-allocated memory
> > > > > > > >                 }
> > > > > > > >         }
> > > > > > > > }
> > > > > > > >
> > > > > > > > What did you do instead to get map/unmap overlapping? Even
> > > > > > > > more interesting, what are you doing now?
> > > > > > >
> > > > > > > From what I can tell, the drivers using
> > > > > > > drm_gpuva_for_each_op()/etc are doing drm_gpuva_remove() while
> > > > > > > iterating the ops list.. drm_gpuvm_sm_xyz_ops_create() itself
> > > > > > > does not modify the VM. So this can only really work if you
> > > > > > > perform one MAP or UNMAP at a time. Or at least if you process
> > > > > > > the VM modifying part of the ops list before proceeding to the
> > > > > > > next op.
> > > > > >
> > > > > > (Added the drm_gpuvm_sm_xyz_ops_create() step above.)
> > > > > >
> > > > > > I went through the code you posted [1] and conceptually you're
> > > > > > implementing exactly the pattern I described above, i.e. you do:
> > > > > >
> > > > > > vm_bind {
> > > > > >         for_each_vm_bind_operation {
> > > > > >                 drm_gpuvm_sm_xyz_exec_lock();
> > > > > >         }
> > > > > >
> > > > > >         for_each_vm_bind_operation {
> > > > > >                 drm_gpuvm_sm_xyz() {
> > > > > >                         // modify drm_gpuvm's interval tree
> > > > > >                         // create custom ops
> > > > > >                 }
> > > > > >         }
> > > > > >
> > > > > >         drm_sched_entity_push_job();
> > > > > > }
> > > > > >
> > > > > > run_job {
> > > > > >         for_each_vm_bind_operation {
> > > > > >                 for_each_custom_op() {
> > > > > >                         // do stuff
> > > > > >                 }
> > > > > >         }
> > > > > > }
> > > > >
> > > > > Close, but by the time we get to run_job there is just a single
> > > > > list of ops covering all the vm_bind operations:
> > > > >
> > > > > run_job {
> > > > >         for_each_custom_op() {
> > > > >                 // do stuff
> > > > >         }
> > > > > }
> > > > >
> > > > > rather than a list of va ops per vm_bind op.
> > > > >
> > > > > > However, GPUVM intends to solve your use-case with the following,
> > > > > > semantically identical, approach.
> > > > > >
> > > > > > vm_bind {
> > > > > >         for_each_vm_bind_operation {
> > > > > >                 drm_gpuvm_sm_xyz_ops_create();
> > > > > >
> > > > > >                 drm_gpuva_for_each_op {
> > > > > >                         // modify drm_gpuvm's interval tree
> > > > > >                         // lock and prepare objects (1)
> > > > >
> > > > > I currently decouple lock+pin from VM modification to avoid an
> > > > > error path that leaves the VM partially modified. Once you add
> > > > > this back in, the va-ops approach isn't simpler, IMHO.
> > > >
> > > > Oh, actually scratch that.. using va-ops, it is not even possible to
> > > > decouple locking/prepare from VM modifications. So using
> > > > DRM_EXEC_INTERRUPTIBLE_WAIT, for ex, with va-ops list would be an
> > > > actively bad idea.
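
For reference, the decoupled flow I'm talking about looks roughly like the
below. This is just a hand-wavy sketch: msm_vm_bind_job/msm_vm_bind_op and
their fields are stand-in names, and the exact drm_gpuvm_sm_map_exec_lock()
signature is an approximation of what the series adds:

static int vm_bind_job_lock_objects(struct msm_vm_bind_job *job,
                                    struct drm_exec *exec)
{
        struct msm_vm_bind_op *op;
        int ret = 0;

        /*
         * Pass 1: lock/prepare every object the bind ops will touch,
         * including objects of VAs that only get unmapped or remapped.
         * Nothing has touched the GPUVM interval tree yet, so an
         * interrupted or contended lock can bail out with no unwinding.
         * (Caller calls drm_exec_fini() on error; UNMAP ops would use
         * drm_gpuvm_sm_unmap_exec_lock() in the same way.)
         */
        drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
        drm_exec_until_all_locked(exec) {
                list_for_each_entry(op, &job->ops, node) {
                        ret = drm_gpuvm_sm_map_exec_lock(job->vm, exec, 1,
                                                         op->iova, op->range,
                                                         op->obj, op->obj_offset);
                        drm_exec_retry_on_contention(exec);
                        if (ret)
                                break;
                }
                if (ret)
                        break;
        }

        return ret;
}

Pass 2 runs only once pass 1 has fully succeeded: apply the updates to the
interval tree with drm_gpuvm_sm_map()/_unmap(), queueing up custom ops for
run_job to later apply to the page tables.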

> > > Well, you would need to unwind the VM modifications. I think so far
> > > this hasn't been an issue for drivers, since they have to unwind VM
> > > modifications for other reasons anyways.
> >
> > Only thing that can fail at this point are the drm_gpuvm_sm_xyz()
> > calls[1], which should only be small kmalloc's that should not fail.
>
> But what happens *if* they fail? How do you handle this? If you don't
> unwind all preceding changes to the GPUVM's interval tree at this point
> your VM is broken.

Small GFP_KERNEL allocations will recurse into shrinker to reclaim memory
before failing. _If_ they fail, things are in a pretty bad shape already,
and the best thing to do is mark the VM unusable to signal mesa to close
the dev file to tear everything down.
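
Concretely, the handling I have in mind looks something like this (just a
sketch; msm_gem_vm embedding a drm_gpuvm as ->base, msm_vm_bind_op, and the
"unusable" flag are stand-in names rather than the literal code from my
series):

static int vm_bind_job_apply_op(struct msm_gem_vm *vm,
                                struct msm_vm_bind_op *op)
{
        int ret;

        /*
         * All the "usual" failures (arg validation, interrupted locking)
         * happened before we started modifying the VM, so the only thing
         * that can fail here is a small GFP_KERNEL allocation inside
         * drm_gpuvm_sm_map().
         */
        ret = drm_gpuvm_sm_map(&vm->base, vm, op->iova, op->range,
                               op->obj, op->obj_offset);
        if (ret) {
                /*
                 * The interval tree may be partially updated at this
                 * point, so don't try to unwind.. just poison the VM so
                 * further submits/binds are rejected and userspace tears
                 * everything down, same as for an unrecoverable GPU fault.
                 */
                vm->unusable = true;
        }

        return ret;
}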

> Glancing at the code, it seems that you're allocating the GPUVA ops. And
> if that fails you just return the error, leaving the VM in a broken state.
>
> What we could do is to implement a helper that calculates the worst case
> number of ops and pre-allocate them, but that's not exactly trivial.

No, we shouldn't add this complexity for something that cannot happen (or,
if it does happen, you are in a state where nothing you do, other than
tearing it all down, can improve things).

> drm_gpuvm_sm_{map,unmap}_exec_lock() only really makes sense if we can
> prevent the need to ever unwind, so that's a precondition.
>
> Alternatively, you can also decide to accept that your VM is dead if you
> can't allocate the GPUVA ops, I guess. In this case you'd really have to
> shut it down though, otherwise you likely risk faults in your PT
> management code.

Correct, we never free backing pages while there is still a mapping to the
GPU.. that is the golden rule!

> > But all the "usual" errors (bad params from userspace, interrupted
> > locking, etc) are dealt with before we begin to modify the VM. If
> > anything does fail after we start modifying the VM we mark the vm as
> > unusable, similar to a gpu fault.
> >
> > [1] ok, I should put some drm_gpuvm_range_valid() checks earlier in
> > the ioctl, while parsing out and validating args/flags/etc. I
> > overlooked this.
> >
> > > Do you never need to unwind for other reasons than locking dma_resv
> > > and preparing GEM objects? Are you really sure there's nothing else
> > > in the critical path?
> > >
> > > If there really isn't anything, I agree that those helpers have value
> > > and we should add them. So, if we do so, please document in detail
> > > the conditions under which drm_gpuvm_sm_{map,unmap}_exec_lock() can
> > > be called for multiple VM_BIND ops *without* updating GPUVM's
> > > interval tree intermediately, including an example.
> >
> > Ok.. in the function headerdoc comment block? Or did you have somewhere
> > else in mind?
>
> Yeah, in the function doc-comment.

ack

BR,
-R