On Thu, May 15, 2025 at 8:30 AM Danilo Krummrich <d...@kernel.org> wrote: > > On Thu, May 15, 2025 at 07:59:16AM -0700, Rob Clark wrote: > > Thanks for the detailed explanation! > > > On Thu, May 15, 2025 at 2:00 AM Danilo Krummrich <d...@kernel.org> wrote: > > > > > > On Wed, May 14, 2025 at 10:53:16AM -0700, Rob Clark wrote: > > > > From: Rob Clark <robdcl...@chromium.org> > > > > > > > > Eases migration for drivers where VAs don't hold hard references to > > > > their associated BO, avoiding reference loops. > > > > > > > > In particular, msm uses soft references to optimistically keep around > > > > mappings until the BO is distroyed. Which obviously won't work if the > > > > VA (the mapping) is holding a reference to the BO. > > > > > > Ick! This is all complicated enough. Allow drivers to bypass the proper > > > reference counting for GEM objects in the context of VM_BO structures > > > seems like > > > an insane footgun. > > > > > > I don't understand why MSM would need weak references here. Why does msm > > > need > > > that, but nouveau, Xe, panthor, PowerVR do not? > > > > Most of those drivers were designed (and had their UABI designed) with > > gpuvm, or at least sparse, in mind from the get go. I'm not sure > > about nouveau, but I guess it just got lucky that it's UABI semantics > > fit having the VMA hold a reference to the BO. > > > > Unfortunately, msm pre-dates sparse.. and in the beginning there was > > only a single global VM, multiple VMs was something retrofitted ~6yrs > > (?) back. For existing msm, the VMA(s) are implicitly torn down when > > the GEM obj is freed. This won't work with the VMA(s) holding hard > > references to the BO. > > Ok, that makes sense to me, but why can't this be changed? I don't see how the > uAPI would be affected, this is just an implementation detail, no?
It's about the behaviour of the API, there is no explicit VMA creation/destruction in the uAPI. > > When userspace opts-in to "VM_BIND" mode, which it has to do before > > the VM is created, then we don't set this flag, the VMA holds a hard > > reference to the BO as it does with other drivers. But consider this > > use-case, which is perfectly valid for old (existing) userspace: > > > > 1) Userspace creates a BO > > 2) Submits rendering referencing the BO > > 3) Immediately closes the BO handle, without waiting for the submit to > > complete > > > > In this case, the submit holds a reference to the BO which holds a > > reference to the VMA. > > Can't you just instead create the VMAs, which hold a reference to the VM_BO, > which holds a reference to the BO, then drop the drop the original BO > reference > and finally, when everything is completed, remove all VMAs of the VM_BO? Perhaps the submit could hold a ref to the VM_BO instead of the BO to cover that particular case. But for the legacy world, the VMA is implicitly torn down when the BO is freed. Which will never happen if the VM_BO holds a reference to the BO. > This should do exactly the same *and* be conformant with GPUVM design. > > > Everything is torn down gracefully when the > > submit completes. But if the VMA held a hard reference to the BO then > > you'd have a reference loop. > > > > So there really is no other way to use gpuvm _and_ maintain backwards > > compatibility with the semantics of the pre-VM_BIND UAPI without this > > flag. > > Again, how is this important for maintaining backwards compatibility with the > uAPI? This all seems like a driver internal implementation detail to me. > > So, is there a technical reason, or is it more that it would be more effort on > the driver end to rework things accordingly? If there were a way to work without WEAK_REF, it seems like it would be harder and much less of a drop in change. BR, -R > > Fortunately DRM_GPUVM_VA_WEAK_REF is minimally intrusive. Otherwise I > > probably would have had to fork my own copy of gpuvm. > > > > BR, > > -R