On 3/16/26 22:26, Christian Loehle wrote:
> On 3/16/26 17:46, Tejun Heo wrote:
>> Hello,
>>
>> On Mon, Mar 16, 2026 at 10:02:48AM +0000, Christian Loehle wrote:
>>> @@ -5686,11 +5718,20 @@ static void kick_cpus_irq_workfn(struct irq_work *irq_work)
>>>  	 * task is picked subsequently. The latter is necessary to break
>>>  	 * the wait when $cpu is taken by a higher sched class.
>>>  	 */
>>> -	if (cpu != cpu_of(this_rq))
>>> +	if (cpu != this_cpu)
>>>  		smp_cond_load_acquire(wait_kick_sync, VAL != ksyncs[cpu]);
>>
>> Given that irq_work is executed at the end of IRQ handling, we can just
>> reschedule the irq work when the condition is not met (or separate that out
>> into its own irq_work). That way, I think we can avoid the global lock.
>>
> I'll go poke at it some more, but I think it's not guaranteed that B actually
> advances kick_sync if A keeps kicking. At least not if the handling is in
> HARD irqwork?
> Or what would the separated out irq work do differently?
So in my particular example I do the SCX_KICK_WAIT in ops.enqueue(), which is
fair, but I don't think we can delay handling the kick until we've advanced
our local kick_sync, and if we don't we end up in the deadlock even if we
separate out the retry (and make it lazy): the local CPU can then continuously
issue new kicks (which have to be handled by the non-retry path) without ever
advancing its own kick_sync.

The closest thing to that I can get working is separating out the
SCX_KICK_WAIT handling entirely and making it lazy. In practice, though, that
would most likely make the SCX_KICK_WAIT latency a lot higher than with the
global lock. Is that what you had in mind, or am I missing something here?

