On 6/8/26 21:25, Danilo Krummrich wrote: > On Mon Jun 8, 2026 at 8:47 PM CEST, Christian König wrote: >> On 6/8/26 20:39, Danilo Krummrich wrote: >>> On Mon Jun 8, 2026 at 8:32 PM CEST, Christian König wrote: >>>> On 6/8/26 19:59, Danilo Krummrich wrote: >>>>> On Mon Jun 8, 2026 at 7:34 PM CEST, Christian König wrote: >>>>>> That's why we need the RCU grace period to make sure that nobody is >>>>>> referencing the driver stuff any more. >>>>> >>>>> Right, and that's what Philipp tries to address, the requirement to wait >>>>> for an >>>>> RCU grace period is perfectly fine if it is only about freeing memory, >>>>> but it >>>>> can become painful if the fence private data contains data also needs to >>>>> be >>>>> destructed in some way. >>>> >>>> Yeah that makes sense. >>>> >>>>> IOW, if a driver signals a fence, it is lifecycle-wise reasonable to >>>>> destruct >>>>> the private data that is no longer needed (remaining users only deal with >>>>> struct >>>>> dma_fence) and having to wait for a full grace period adds sublety and >>>>> complication that can be avoided with the proposed approach. >>>> >>>> Yeah, I've run into that when I tried to make the amdgpu fences >>>> independent as well. >>>>> That said, I'd like to ask the opposite question: What are the concerns >>>>> with the >>>>> proposed approach over (pure) RCU? >>>> >>>> Well a) locking inversions and b) performance. >>>> >>>> For example the reason why we have the dma_fence_is_signaled() and >>>> dma_fence_is_signaled_locked() variants is because there is a measurable >>>> difference in some specific use cases for not grabbing the locks. >>> >>> I checked for this as well, but couldn't find a case where >>> dma_fence_is_signaled() is used in a way where it would be performance >>> critical >>> to avoid the lock in any way. >>> >>> Note that the lock is only bypassed when the fence is signaled already (this >>> would be preserved) and if signaled() returns false, i.e. dma_fence_signal() >>> will take the lock anyways. >>> >>>> I personally find those micro-optimizations rather questionable, but the >>>> community agreement is that we should have them. >>> >>> I agree, it is rather questionable. So, I wouldn't make this the deciding >>> factor >>> unless someone can present a valid case where it actually matters. >>> >>>> So my take would rather be that the dma_fence_is_signaled_locked() variant >>>> goes away and we consistently call the ops pointers without holding the >>>> dma_fence lock and the driver implementations can then optionally take it >>>> if >>>> necessary. >>> >>> How did you get to this conclusion considering that you run into what I >>> mentioned above as well and the fact that we seem to agree that the >>> performance >>> concern is rather questionable? >> >> Quite simple, it's the cleaner approach. > > I would maybe agree iff the RCU read side critical section wouldn't be needed > and we wouldn't need to deal with the consequences of having to defer > everything. > > And so far it seems to me that there isn't really any other reason that the > performance concern we both don't buy into. > >> Calling callbacks with locks held is rather questionable even putting the >> performance issue aside. > > In general, I don't think that more flexibility for drivers is automatically > always superior. > > Also, before we keep calling it a performance issue, I'd really love to know > where dma_fence_is_signaled() is called in a case where it returns false and > the > spinlock causes such an overhead that it actually matters. > > (As mentioned above, none of the cases where it returns true would change.) > >> In detail calling the callbacks without holding locks allows all >> implementations who need it to explicitly take locks in the order they want. > > I don't think this is true in this case. > > 1) The existence of dma_fence_is_signaled_locked() already mandates that all > such callbacks must work properly if called with the fence lock held.
For the deadline callback it is mandatory to *not* hold the lock. The signaled callback can be called with and without holding the lock. The enable_signaled callback is always called while holding the lock. > > 2) The RCU read side critical section already mandates that driver must not > sleep within the callback. > >> If you call it with the lock held you enforce the fence lock the be the >> outermost lock. > > That's practically already the case, due to dma_fence_is_signaled_locked(). Yeah, but we can fix this and the enable signaling callback. But the other way around isn't easily possible as far as I can see. Regards, Christian.
