On Wed, May 22, 2019 at 04:14:23PM +0100, Andrew Murray wrote:
> On Wed, May 22, 2019 at 04:49:18PM +0200, Peter Zijlstra wrote:
> > On Wed, May 22, 2019 at 03:37:11PM +0100, Andrew Murray wrote:
> > > > Is perhaps the problem that on_each_cpu_cond() uses cpu_online_mask
> > > > without protection?
> > >
> > > Does this prevent racing with a CPU going offline? I guess this prevents
> > > the warning at the expense of a lock - but is only beneficial in the
> > > unlikely path. (In the likely path this prevents new CPUs going offline
> > > but we don't care because we don't WARN if they aren't there when we
> > > attempt to call functions).
> > >
> > > At least this is my limited understanding.
> >
> > Hmm.. I don't think it could matter, we only use the mask when
> > preempt_disable(), which would already block offline, due to it using
> > stop_machine().
>
> OK, so as long as all arches use stop_machine to bring down a CPU then
> preempt_disable will stop racing with CPUs going offline (I don't know
> if they all do or not).
kernel/cpu.c:takedown_cpu() is generic code :-)

> > So the patch is a no-op.
> >
> > What's the WARN you see? TLB invalidation should pass mm_cpumask(),
> > which similarly should not contain offline CPUs I'm thinking.
>
> So the only remaining issue is if callers pass offline CPUs to the
> function (I guess there is still the chance of a race between calling
> on_each_cpu_cond_mask and entering the preempt-disabled bit). As you
> suggest they probably don't.

Yes, it is possible to construct a failure that way.
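
For reference, a minimal sketch (illustration only, not a patch; the helper
name is made up) of the pattern the stop_machine() argument above covers:
takedown_cpu() tears a CPU down via stop_machine(), and stop_machine() cannot
run until every online CPU has scheduled into its stopper thread, so no CPU
can go offline while anything sits in a region like the one below.

#include <linux/cpumask.h>
#include <linux/preempt.h>
#include <linux/smp.h>

/*
 * Sketch: IPI all other online CPUs.  Holding preemption off pins the
 * set of online CPUs: takedown_cpu() has to go through stop_machine(),
 * which needs every CPU to reach its stopper thread, and that cannot
 * happen while we are in the region below.  (CPUs coming *online* are
 * not blocked, but that is harmless for the WARN being discussed.)
 */
static void example_ipi_online_cpus(smp_call_func_t func, void *info)
{
	preempt_disable();
	/* no CPU can disappear from cpu_online_mask in here */
	smp_call_function_many(cpu_online_mask, func, info, true);
	preempt_enable();
}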

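And the caller-side window mentioned at the end, sketched with made-up
helpers; this roughly mirrors what on_each_cpu_cond() does internally
(build a mask, then IPI it under preempt_disable()).  The cpus_read_lock()
variant is just one way to close the window, not a claim that any in-tree
caller needs it.

#include <linux/cpu.h>		/* cpus_read_lock()/cpus_read_unlock() */
#include <linux/cpumask.h>
#include <linux/gfp.h>
#include <linux/preempt.h>
#include <linux/smp.h>

/* Racy shape: the snapshot can go stale before preemption is disabled. */
static void example_racy(smp_call_func_t func, void *info)
{
	cpumask_var_t mask;

	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
		return;

	cpumask_copy(mask, cpu_online_mask);	/* nothing held ... */

	/*
	 * ... so a CPU in @mask can be offlined right here, before we
	 * get to disable preemption below, which is the kind of
	 * situation the WARN discussed above is about.
	 */

	preempt_disable();
	smp_call_function_many(mask, func, info, true);
	preempt_enable();

	free_cpumask_var(mask);
}

/* One way to close the window: hold the hotplug read lock throughout. */
static void example_not_racy(smp_call_func_t func, void *info)
{
	cpumask_var_t mask;

	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
		return;

	cpus_read_lock();
	cpumask_copy(mask, cpu_online_mask);
	preempt_disable();
	smp_call_function_many(mask, func, info, true);
	preempt_enable();
	cpus_read_unlock();

	free_cpumask_var(mask);
}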
