Hi Philipp,

I only found this message by coincidence; please make sure to always CC my AMD 
work email address as well.

On 2/19/26 12:06, Philipp Stanner wrote:
> Yo Christian,
> 
> I'd like to discuss the dma_fence fast path optimization
> (ops.is_signaled) again.
> 
> As far as I understand by now, the use case is that some drivers will
> never signal fences; but the consumer of the fence actively polls
> whether a fence is signaled or not.
> 
> Right?

Close, but not 100% right. The semantic is that enable_signaling is only called 
when somebody actively waits for the dma_fence to finish.

So as long as both userspace and kernel only poll for the fence status, 
enable_signaling is never called and only is_signaled is called.
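
A rough user-space sketch of that split, just to illustrate the semantic. 
Everything named mock_* is made up for this example; it is not the real 
struct dma_fence or its ops, and the wait path is heavily simplified:

```c
/* User-space model only; all mock_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct mock_fence {
	unsigned int seqno;            /* value the hardware counter must reach */
	const unsigned int *hw_seqno;  /* counter the (pretend) hardware advances */
	bool signaled;                 /* cached signaled state */
	bool irq_enabled;              /* set once somebody really waits */
	int enable_signaling_calls;
	int is_signaled_calls;
};

/* Models ops.is_signaled: a cheap read of the hardware counter. */
static bool mock_is_signaled(struct mock_fence *f)
{
	f->is_signaled_calls++;
	return *f->hw_seqno >= f->seqno;
}

/* Models ops.enable_signaling: arm the interrupt for this fence. */
static void mock_enable_signaling(struct mock_fence *f)
{
	f->enable_signaling_calls++;
	f->irq_enabled = true;
}

/* Poll path: only ever touches is_signaled, never the interrupt. */
static bool mock_fence_poll(struct mock_fence *f)
{
	if (!f->signaled && mock_is_signaled(f))
		f->signaled = true;
	return f->signaled;
}

/*
 * Start of the wait path: the first real waiter arms the interrupt.
 * Returns true if the fence was already known to be signaled.
 */
static bool mock_fence_begin_wait(struct mock_fence *f)
{
	if (f->signaled)
		return true;
	if (!f->irq_enabled)
		mock_enable_signaling(f);
	return false;
}

/* Pure polling: returns how often enable_signaling was called. */
static int demo_poll_only(void)
{
	unsigned int hw = 0;
	struct mock_fence f = { .seqno = 1, .hw_seqno = &hw };
	bool before, after;

	before = mock_fence_poll(&f);  /* work not done yet */
	hw = 1;                        /* pretend the GPU finished */
	after = mock_fence_poll(&f);   /* observed without any IRQ */
	assert(!before && after);
	return f.enable_signaling_calls;
}
```

In this model demo_poll_only() never reaches mock_enable_signaling(); only a 
call to mock_fence_begin_wait() on an unsignaled fence does.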

What drivers/fence implementations do with that is up to them. For example 
userqueues use it as preemption signaling, but most drivers simply try to avoid 
waking up the system with IRQs.

> I have a bunch of questions regarding that:
> 
>    1. What does the party polling the fence typically look like? I bet
>       it's not userspace, is it? Userspace I'd expect to use poll() on
>       a FD, thus an underlying driver has to check the fence somehow.

No no, that is indeed userspace.

As soon as the kernel starts to call dma_fence_wait() (for example), we have the 
normal guaranteed-to-signal semantics we always have.

>    2. What if that party checks the fence, determines it is unsignaled?
>       Will it then again try later?

I have no idea, that depends on how the userspace component is implemented.

>    3. If it tries again later anyways, then what is the problem with
>       the fence-issuing driver itself checking every 5, 10 or 50
>       milliseconds what the counter in the GPU ring buffer is, and then
>       signals all those fences?

That you need to wake up for that, which costs quite a lot of power.

There are two different approaches:

1. Interrupt driven, e.g. somebody says signal me as soon as possible when the 
work is done.

2. Poll driven, e.g. userspace wakes up every N milliseconds anyway and it 
doesn't matter if the status changes a bit later.
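
As a back-of-the-envelope model of the wakeup cost (the function names are 
invented for this sketch, and it only counts wakeups, not actual power):

```c
/* Back-of-the-envelope model; all names here are hypothetical. */
#include <assert.h>

/*
 * Driver-side timer that checks the ring counter every period_ms:
 * it wakes the CPU on every tick, whether or not anybody cares.
 */
static unsigned int timer_poll_wakeups(unsigned int window_ms,
				       unsigned int period_ms)
{
	assert(period_ms > 0);
	return window_ms / period_ms;
}

/* Interrupt driven: one wakeup per fence somebody actually waits on. */
static unsigned int irq_wakeups(unsigned int waited_fences)
{
	return waited_fences;
}

/*
 * Poll driven: the caller is awake anyway, so reading the counter
 * through is_signaled adds no wakeups of its own.
 */
static unsigned int piggyback_poll_wakeups(unsigned int polls)
{
	(void)polls;
	return 0;
}
```

With the 5 ms interval from your question, timer_poll_wakeups(1000, 5) is 200 
extra wakeups per second, even while nobody looks at the fences at all.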

> So it circles around the question why ops.is_signaled is supposedly
> unavoidable.

In addition to the interrupt/poll handling, it is also a really important 
optimization for multicore systems, e.g. it makes the signaling state visible 
to other CPU cores even when the core handling the IRQ is still busy.

That is also really important for some use cases as far as I know. Keep in mind 
that this framework drives everything from Android mobiles all the way up to 
supercomputers.

I mean what we could potentially do is to fix the locking invariant of the 
is_signaled callback, but that is probably the only simplification possible 
without breaking tons of use cases.

Regards,
Christian.

> 
> Regards
> P.
