Hi Philipp,

I only found this message by coincidence; please make sure to always CC
my AMD work email address as well.

On 2/19/26 12:06, Philipp Stanner wrote:
> Yo Christian,
>
> I'd like to discuss the dma_fence fast path optimization
> (ops.is_signaled) again.
>
> As far as I understand by now, the use case is that some drivers will
> never signal fences; but the consumer of the fence actively polls
> whether a fence is signaled or not.
>
> Right?

Close, but not 100% right.

The semantic is that enable_signaling is only called when somebody
actively waits for the dma_fence to finish.

So as long as both userspace and kernel only poll for the fence status,
enable_signaling is never called and only is_signaled is called.

What drivers/fence implementations do with that is up to them. For
example, userqueues use it as preemption signaling, but most drivers
simply try to avoid waking up the system with IRQs.

> I have a bunch of questions regarding that:
>
> 1. What does the party polling the fence typically look like? I bet
>    it's not userspace, is it? Userspace I'd expect to use poll() on
>    a FD, thus an underlying driver has to check the fence somehow.

No no, that is indeed userspace.

As soon as the kernel starts to call dma_fence_wait() (for example), we
have the normal guaranteed-to-signal semantics we always have.

> 2. What if that party checks the fence, determines it is unsignaled?
>    Will it then again try later?

I have no idea; that depends on how the userspace component is
implemented.

> 3. If it tries again later anyways, then what is the problem with
>    the fence-issuing driver itself checking every 5, 10 or 50
>    milliseconds what the counter in the GPU ring buffer is, and then
>    signals all those fences?

That you need to wake up for that, which costs quite a lot of power.

There are two different approaches:
1. Interrupt driven, e.g. somebody says signal me as soon as possible
   when the work is done.
2. Poll driven, e.g. userspace wakes up every N milliseconds anyway and
   it doesn't matter if the status changes a bit later.

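To make the difference between the poll path and the wait path concrete,
here is a rough userspace sketch of the semantic described above. It is
only a toy model, not the kernel's struct dma_fence or its locking; all
names in it (model_fence, model_fence_poll, model_fence_begin_wait,
ring_ops) are made up for illustration, and the "hardware" is just an
integer counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT the real struct dma_fence. */
struct model_fence;

struct model_fence_ops {
	/* Arm whatever is needed (e.g. an IRQ) so the fence signals by itself. */
	bool (*enable_signaling)(struct model_fence *f);
	/* Cheap status check without arming anything. */
	bool (*is_signaled)(struct model_fence *f);
};

struct model_fence {
	const struct model_fence_ops *ops;
	unsigned int seqno;          /* sequence number this fence stands for */
	const unsigned int *hw_seq;  /* counter written by the "hardware" */
	bool signaled;               /* cached signaled state */
	bool irq_enabled;            /* set once enable_signaling was called */
};

/* Poll path: only asks is_signaled, never calls enable_signaling. */
static bool model_fence_poll(struct model_fence *f)
{
	if (f->signaled)
		return true;
	if (f->ops->is_signaled && f->ops->is_signaled(f))
		f->signaled = true;
	return f->signaled;
}

/*
 * Wait path: somebody actively waits, so signaling gets enabled.
 * Returns true if the fence is already signaled, false if the caller
 * would now have to sleep until the (modelled) interrupt fires.
 */
static bool model_fence_begin_wait(struct model_fence *f)
{
	if (model_fence_poll(f))
		return true;
	if (!f->irq_enabled)
		f->irq_enabled = f->ops->enable_signaling(f);
	return false;
}

/* Example backend: compare the ring counter against the fence seqno. */
static bool ring_is_signaled(struct model_fence *f)
{
	/* Wrap-safe "hw_seq >= seqno" comparison. */
	return (int)(*f->hw_seq - f->seqno) >= 0;
}

static bool ring_enable_signaling(struct model_fence *f)
{
	(void)f;
	return true; /* pretend the interrupt was armed successfully */
}

static const struct model_fence_ops ring_ops = {
	.enable_signaling = ring_enable_signaling,
	.is_signaled = ring_is_signaled,
};
```

In this model a caller that only uses model_fence_poll() never causes
irq_enabled to be set, which corresponds to the poll driven case; the
first model_fence_begin_wait() on an unsignaled fence arms the
interrupt, which corresponds to the interrupt driven case.
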
> So it circles around the question why ops.is_signaled is supposedly
> unavoidable.

In addition to the interrupt/poll handling, it is also a really
important optimization for multicore systems, e.g. it makes the
signaling state visible to other CPU cores even when the core handling
the IRQ is still busy.

That is also really important for some use cases as far as I know.

Keep in mind that this framework drives everything from Android mobiles
all the way up to supercomputers.

I mean what we could potentially do is to fix the locking invariant of
the is_signaled callback, but that is probably the only simplification
possible without breaking tons of use cases.

Regards,
Christian.

>
> Regards
> P.
