On 6/8/26 21:25, Danilo Krummrich wrote:
> On Mon Jun 8, 2026 at 8:47 PM CEST, Christian König wrote:
>> On 6/8/26 20:39, Danilo Krummrich wrote:
>>> On Mon Jun 8, 2026 at 8:32 PM CEST, Christian König wrote:
>>>> On 6/8/26 19:59, Danilo Krummrich wrote:
>>>>> On Mon Jun 8, 2026 at 7:34 PM CEST, Christian König wrote:
>>>>>> That's why we need the RCU grace period to make sure that nobody is
>>>>>> referencing the driver stuff any more.
>>>>>
>>>>> Right, and that's what Philipp tries to address, the requirement to wait 
>>>>> for an
>>>>> RCU grace period is perfectly fine if it is only about freeing memory, 
>>>>> but it
>>>>> can become painful if the fence private data contains data also needs to 
>>>>> be
>>>>> destructed in some way.
>>>>
>>>> Yeah that makes sense.
>>>>
>>>>> IOW, if a driver signals a fence, it is lifecycle-wise reasonable to 
>>>>> destruct
>>>>> the private data that is no longer needed (remaining users only deal with 
>>>>> struct
>>>>> dma_fence) and having to wait for a full grace period adds sublety and
>>>>> complication that can be avoided with the proposed approach.
>>>>
>>>> Yeah, I've run into that when I tried to make the amdgpu fences 
>>>> independent as well.
>>>>> That said, I'd like to ask the opposite question: What are the concerns 
>>>>> with the
>>>>> proposed approach over (pure) RCU?
>>>>
>>>> Well a) locking inversions and b) performance.
>>>>
>>>> For example the reason why we have the dma_fence_is_signaled() and
>>>> dma_fence_is_signaled_locked() variants is because there is a measurable
>>>> difference in some specific use cases for not grabbing the locks.
>>>
>>> I checked for this as well, but couldn't find a case where
>>> dma_fence_is_signaled() is used in a way where it would be performance 
>>> critical
>>> to avoid the lock in any way.
>>>
>>> Note that the lock is only bypassed when the fence is signaled already (this
>>> would be preserved) and if signaled() returns false, i.e. dma_fence_signal()
>>> will take the lock anyways.
>>>
>>>> I personally find those micro-optimizations rather questionable, but the
>>>> community agreement is that we should have them.
>>>
>>> I agree, it is rather questionable. So, I wouldn't make this the deciding 
>>> factor
>>> unless someone can present a valid case where it actually matters.
>>>
>>>> So my take would rather be that the dma_fence_is_signaled_locked() variant
>>>> goes away and we consistently call the ops pointers without holding the
>>>> dma_fence lock and the driver implementations can then optionally take it 
>>>> if
>>>> necessary.
>>>
>>> How did you get to this conclusion considering that you run into what I
>>> mentioned above as well and the fact that we seem to agree that the 
>>> performance
>>> concern is rather questionable?
>>
>> Quite simple, it's the cleaner approach.
> 
> I would maybe agree iff the RCU read side critical section wouldn't be needed
> and we wouldn't need to deal with the consequences of having to defer
> everything.
> 
> And so far it seems to me that there isn't really any other reason that the
> performance concern we both don't buy into.
> 
>> Calling callbacks with locks held is rather questionable even putting the
>> performance issue aside.
> 
> In general, I don't think that more flexibility for drivers is automatically
> always superior.
> 
> Also, before we keep calling it a performance issue, I'd really love to know
> where dma_fence_is_signaled() is called in a case where it returns false and 
> the
> spinlock causes such an overhead that it actually matters.
> 
> (As mentioned above, none of the cases where it returns true would change.)
> 
>> In detail calling the callbacks without holding locks allows all
>> implementations who need it to explicitly take locks in the order they want.
> 
> I don't think this is true in this case.
> 
>   1) The existence of dma_fence_is_signaled_locked() already mandates that all
>      such callbacks must work properly if called with the fence lock held.

For the deadline callback it is mandatory to *not* hold the lock.

The signaled callback can be called with and without holding the lock.

The enable_signaled callback is always called while holding the lock.

> 
>   2) The RCU read side critical section already mandates that driver must not
>      sleep within the callback.
> 
>> If you call it with the lock held you enforce the fence lock the be the
>> outermost lock.
> 
> That's practically already the case, due to dma_fence_is_signaled_locked().

Yeah, but we can fix this and the enable signaling callback.

But the other way around isn't easily possible as far as I can see.

Regards,
Christian.


Reply via email to