On Mon, May 18, 2020 at 10:57:58AM +0200, Peter Zijlstra wrote:
> On Fri, May 15, 2020 at 02:34:29AM +0200, Frederic Weisbecker wrote:
> > So far, setting a tick dependency on any task, including current,
> > triggered an IPI to all CPUs. That is of course suboptimal, but it
> > wasn't an issue as long as it was only used by posix-cpu-timers on
> > nohz_full, a combo that nobody seemed to use in real life.
> > 
> > But RCU started to use the task tick dependency on the current task
> > to fix stall issues in callback processing. This triggers regular
> > and undesired system-wide IPIs on nohz_full.
> > 
> > The fix is easy: when setting a tick dependency on the current
> > task, only its CPU needs an IPI.
> > 
> > Fixes: 6a949b7af82d ("rcu: Force on tick when invoking lots of callbacks")
> > Reported-by: Matt Fleming <m...@codeblueprint.co.uk>
> > Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
> > Cc: sta...@kernel.org
> > Cc: Paul E. McKenney <paul...@kernel.org>
> > Cc: Thomas Gleixner <t...@linutronix.de>
> > Cc: Peter Zijlstra <pet...@infradead.org>
> > Cc: Ingo Molnar <mi...@kernel.org>
> > ---
> >  kernel/time/tick-sched.c | 22 +++++++++++++++-------
> >  1 file changed, 15 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 3e2dc9b8858c..f0199a4ba1ad 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -351,16 +351,24 @@ void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
> >  EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
> >  
> >  /*
> > - * Set a per-task tick dependency. Posix CPU timers need this in order to elapse
> > - * per task timers.
> > + * Set a per-task tick dependency. RCU needs this. Also posix CPU timers
> > + * in order to elapse per-task timers.
> >   */
> >  void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
> >  {
> > -        /*
> > -         * We could optimize this with just kicking the target running the task
> > -         * if that noise matters for nohz full users.
> > -         */
> > -        tick_nohz_dep_set_all(&tsk->tick_dep_mask, bit);
> > +        if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
> 
> So why not simply:
> 
>         tick_nohz_full_kick_cpu(task_cpu(tsk)); ?
> 
> If it got preempted, the scheduling involved would already have observed
> the bit we just set and kept the tick on anyway, same for migration.
> 
> Or am I missing something?
Hmm, I guess we would then need some sort of ordering like this:

        CPU 0                            CPU 1
        -----                            -----
        p->cpu = smp_processor_id()      atomic_fetch_or(bit, p->tick_dep_mask)
        smp_mb();                        smp_mb(); // actually implied by atomic_fetch_or()
        READ p->tick_dep_mask            irq_work_on(...., p->cpu)

And since p->cpu is already set and visible on context switch, it should
indeed work.

Now if CPU 1 reads a stale task_cpu(), that's fine since CPU 0 sees the
new tick_dep_mask, but CPU 1 might then be kicking an offlined CPU,
right? So I guess I should still protect against hotplug with
cpus_read_lock(), but tick_nohz_dep_set_task() isn't supposed to
sleep... Or could preempt_disable() help us with that somehow? I'm
always confused about the guarantees that disabled preemption offers
against hotplug.

> > +                if (tsk == current) {
> > +                        preempt_disable();
> > +                        tick_nohz_full_kick();
> > +                        preempt_enable();
> > +                } else {
> > +                        /*
> > +                         * Some future tick_nohz_full_kick_task()
> > +                         * should optimize this.
> > +                         */
> > +                        tick_nohz_full_kick_all();
> > +                }
> > +        }
> 
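Just so we're looking at the same thing, below is the shape I have in
mind: your tick_nohz_full_kick_cpu() suggestion for the remote case,
with preempt_disable() around the kick as a stand-in for hotplug
protection. Untested sketch against kernel/time/tick-sched.c, and
whether disabled preemption really keeps the target CPU from completing
its offlining here is exactly the part I'm unsure about:

        void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
        {
                if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
                        if (tsk == current) {
                                preempt_disable();
                                tick_nohz_full_kick();
                                preempt_enable();
                        } else {
                                /*
                                 * atomic_fetch_or() above implies a full barrier:
                                 * either we see the CPU the task is running on
                                 * and the IPI reaches it, or the task sees the
                                 * new dependency bit when it gets scheduled in
                                 * and keeps the tick on by itself.
                                 */
                                preempt_disable();
                                tick_nohz_full_kick_cpu(task_cpu(tsk));
                                preempt_enable();
                        }
                }
        }

If preempt_disable() turns out not to be enough against hotplug there,
the fallback would be to keep tick_nohz_full_kick_all() for the remote
case, as in the patch above.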