On Thu, May 15, 2025 at 07:59:16AM -0700, Rob Clark wrote:

Thanks for the detailed explanation!

> On Thu, May 15, 2025 at 2:00 AM Danilo Krummrich <d...@kernel.org> wrote:
> >
> > On Wed, May 14, 2025 at 10:53:16AM -0700, Rob Clark wrote:
> > > From: Rob Clark <robdcl...@chromium.org>
> > >
> > > Eases migration for drivers where VAs don't hold hard references to
> > > their associated BO, avoiding reference loops.
> > >
> > > In particular, msm uses soft references to optimistically keep around
> > > mappings until the BO is distroyed.  Which obviously won't work if the
> > > VA (the mapping) is holding a reference to the BO.
> >
> > Ick! This is all complicated enough. Allow drivers to bypass the proper
> > reference counting for GEM objects in the context of VM_BO structures seems 
> > like
> > an insane footgun.
> >
> > I don't understand why MSM would need weak references here. Why does msm 
> > need
> > that, but nouveau, Xe, panthor, PowerVR do not?
> 
> Most of those drivers were designed (and had their UABI designed) with
> gpuvm, or at least sparse, in mind from the get go.  I'm not sure
> about nouveau, but I guess it just got lucky that it's UABI semantics
> fit having the VMA hold a reference to the BO.
> 
> Unfortunately, msm pre-dates sparse.. and in the beginning there was
> only a single global VM, multiple VMs was something retrofitted ~6yrs
> (?) back.  For existing msm, the VMA(s) are implicitly torn down when
> the GEM obj is freed.  This won't work with the VMA(s) holding hard
> references to the BO.

Ok, that makes sense to me, but why can't this be changed? I don't see how the
uAPI would be affected, this is just an implementation detail, no?

> When userspace opts-in to "VM_BIND" mode, which it has to do before
> the VM is created, then we don't set this flag, the VMA holds a hard
> reference to the BO as it does with other drivers.  But consider this
> use-case, which is perfectly valid for old (existing) userspace:
> 
> 1) Userspace creates a BO
> 2) Submits rendering referencing the BO
> 3) Immediately closes the BO handle, without waiting for the submit to 
> complete
> 
> In this case, the submit holds a reference to the BO which holds a
> reference to the VMA. 

Can't you just instead create the VMAs, which hold a reference to the VM_BO,
which holds a reference to the BO, then drop the drop the original BO reference
and finally, when everything is completed, remove all VMAs of the VM_BO?

This should do exactly the same *and* be conformant with GPUVM design.

> Everything is torn down gracefully when the
> submit completes.  But if the VMA held a hard reference to the BO then
> you'd have a reference loop.
> 
> So there really is no other way to use gpuvm _and_ maintain backwards
> compatibility with the semantics of the pre-VM_BIND UAPI without this
> flag.

Again, how is this important for maintaining backwards compatibility with the
uAPI? This all seems like a driver internal implementation detail to me.

So, is there a technical reason, or is it more that it would be more effort on
the driver end to rework things accordingly?

> Fortunately DRM_GPUVM_VA_WEAK_REF is minimally intrusive.  Otherwise I
> probably would have had to fork my own copy of gpuvm.
> 
> BR,
> -R

Reply via email to