On Thu, May 15, 2025 at 07:59:16AM -0700, Rob Clark wrote: Thanks for the detailed explanation!
> On Thu, May 15, 2025 at 2:00 AM Danilo Krummrich <d...@kernel.org> wrote: > > > > On Wed, May 14, 2025 at 10:53:16AM -0700, Rob Clark wrote: > > > From: Rob Clark <robdcl...@chromium.org> > > > > > > Eases migration for drivers where VAs don't hold hard references to > > > their associated BO, avoiding reference loops. > > > > > > In particular, msm uses soft references to optimistically keep around > > > mappings until the BO is distroyed. Which obviously won't work if the > > > VA (the mapping) is holding a reference to the BO. > > > > Ick! This is all complicated enough. Allow drivers to bypass the proper > > reference counting for GEM objects in the context of VM_BO structures seems > > like > > an insane footgun. > > > > I don't understand why MSM would need weak references here. Why does msm > > need > > that, but nouveau, Xe, panthor, PowerVR do not? > > Most of those drivers were designed (and had their UABI designed) with > gpuvm, or at least sparse, in mind from the get go. I'm not sure > about nouveau, but I guess it just got lucky that it's UABI semantics > fit having the VMA hold a reference to the BO. > > Unfortunately, msm pre-dates sparse.. and in the beginning there was > only a single global VM, multiple VMs was something retrofitted ~6yrs > (?) back. For existing msm, the VMA(s) are implicitly torn down when > the GEM obj is freed. This won't work with the VMA(s) holding hard > references to the BO. Ok, that makes sense to me, but why can't this be changed? I don't see how the uAPI would be affected, this is just an implementation detail, no? > When userspace opts-in to "VM_BIND" mode, which it has to do before > the VM is created, then we don't set this flag, the VMA holds a hard > reference to the BO as it does with other drivers. But consider this > use-case, which is perfectly valid for old (existing) userspace: > > 1) Userspace creates a BO > 2) Submits rendering referencing the BO > 3) Immediately closes the BO handle, without waiting for the submit to > complete > > In this case, the submit holds a reference to the BO which holds a > reference to the VMA. Can't you just instead create the VMAs, which hold a reference to the VM_BO, which holds a reference to the BO, then drop the drop the original BO reference and finally, when everything is completed, remove all VMAs of the VM_BO? This should do exactly the same *and* be conformant with GPUVM design. > Everything is torn down gracefully when the > submit completes. But if the VMA held a hard reference to the BO then > you'd have a reference loop. > > So there really is no other way to use gpuvm _and_ maintain backwards > compatibility with the semantics of the pre-VM_BIND UAPI without this > flag. Again, how is this important for maintaining backwards compatibility with the uAPI? This all seems like a driver internal implementation detail to me. So, is there a technical reason, or is it more that it would be more effort on the driver end to rework things accordingly? > Fortunately DRM_GPUVM_VA_WEAK_REF is minimally intrusive. Otherwise I > probably would have had to fork my own copy of gpuvm. > > BR, > -R