On 2/24/26 11:32, Philipp Stanner wrote:
> On Mon, 2026-02-23 at 12:42 +0100, Christian König wrote:
>> Hi Philip,
>>
I only found this message by coincidence, please make sure to always CC my 
AMD work email address as well.
> 
> You've been the direct recipient, in the To: header field :)

Yeah, but this is just the Gmail account I use for mailing lists, as a 
workaround because AMD's IT mangles mails and thereby makes patch review 
impossible.

>>
>> On 2/19/26 12:06, Philipp Stanner wrote:
>>> Yo Christian,
>>>
>>> I'd like to discuss the dma_fence fast path optimization
>>> (ops.is_signaled) again.
>>>
>>> As far as I understand by now, the use case is that some drivers will
>>> never signal fences; but the consumer of the fence actively polls
>>> whether a fence is signaled or not.
>>>
>>> Right?
>>
>> Close but not 100% right. The semantics are that enable_signaling is only 
>> called when somebody actively waits for the dma_fence to finish.
>>
>> So as long as both userspace and the kernel only poll for the fence status, 
>> enable_signaling is never called and only is_signaled is used.
> 
> So you're telling me that enable_signaling typically enables
> interrupt-driven signaling. IOW, in some cases you can request that a
> specific fence gets signaled the expensive way (interrupt) while
> polling on the others.

Correct, yes.

> What is the hw->hw signaling that the documentation details?

Oh, do we still have references to that in the framework documentation?

Initially we tried to make hw->hw signaling a general framework as well, but 
that quickly turned out to be really problematic, so it was removed/never 
fully merged.

HW->HW signaling is still used by a bunch of DMA-buf implementations, but that 
is then implementation-specific.

> hw->sw signaling seems to refer to interrupts.

HW->SW signaling is both interrupt and polling driven.

>>
>> What drivers/fence implementations do with that is up to them. For example 
>> userqueues use it as preemption signaling, but most drivers simply try to 
>> avoid waking up the system with IRQs.
>>
>>> I have a bunch of questions regarding that:
>>>
>>>    1. What does the party polling the fence typically look like? I bet
>>>       it's not userspace, is it? Userspace I'd expect to use poll() on
>>>       a FD, thus an underlying driver has to check the fence somehow.
>>
>> No no, that is indeed userspace.
> 
> Userspace has no direct access to a fence. It's ultimately a kernel
> ioctl through which userspace can check a fence. That's what I meant:
> it's kernel code implemented in the driver [but running in the user's
> process context].

What I meant is that the polling is triggered by userspace.

In other words, the kernel doesn't care whether fences signal or not as long 
as it doesn't wait for them, e.g. in case of memory pressure.

>>> So it circles around the question why ops.is_signaled is supposedly
>>> unavoidable.
>>
>> In addition to the interrupt/poll handling it is also a really important 
>> optimization for multicore systems; e.g. it makes the signaling state 
>> visible to other CPU cores even when the core handling the IRQ is still busy.
> 
> What is the "signaling state"?
> 
> A fence's signaled status is indicated through an atomic flag which
> becomes visible globally once someone, like said interrupt, has
> signaled the fence.

Well, not quite. The atomic flag is the generalized, coherent signaled state 
of the dma_fence framework across all CPU cores.

But the state inside the dma_fence implementation, which you can query 
directly with the is_signaled callback, can be quite different.

For example we have seen the following:
1. MMIO HW registers
2. Local device memory
3. System memory coherent to all CPU cores
4. System memory coherent to only some CPU cores
5. System memory not coherent at all and you need explicit memory barriers

So it is perfectly possible that the is_signaled callback returns true on one 
CPU core and false on another.

And for some use cases you don't want to wait for the CPU-to-CPU 
synchronization of the atomic flag, but rather go ahead and push the dependent 
work etc.

Regards,
Christian.

> 
> 
> P.
> 
>>
>> That is also really important for some use cases as far as I know. Keep in 
>> mind that this framework drives everything from Android phones all the way 
>> up to supercomputers.
>>
>> I mean, what we could potentially do is fix the locking invariant of the 
>> is_signaled callback, but that is probably the only simplification possible 
>> without breaking tons of use cases.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards
>>> P.
>>
> 
