On 16/01/21 12:30, Peter Zijlstra wrote:
> @@ -1796,13 +1796,28 @@ static inline bool rq_has_pinned_tasks(s
>   */
>  static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
>  {
> +	/* When not in the task's cpumask, no point in looking further. */
>  	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>  		return false;
>
> -	if (is_per_cpu_kthread(p) || is_migration_disabled(p))
> +	/* migrate_disabled() must be allowed to finish. */
> +	if (is_migration_disabled(p))
>  		return cpu_online(cpu);
>
> -	return cpu_active(cpu);
> +	/* Non kernel threads are not allowed during either online or offline. */
> +	if (!(p->flags & PF_KTHREAD))
> +		return cpu_active(cpu);
> +
> +	/* KTHREAD_IS_PER_CPU is always allowed. */
> +	if (kthread_is_per_cpu(p))
> +		return cpu_online(cpu);
> +
> +	/* Regular kernel threads don't get to stay during offline. */
> +	if (cpu_rq(cpu)->balance_callback == &balance_push_callback)
> +		return cpu_active(cpu);

is_cpu_allowed(p, cpu) isn't guaranteed to have cpu_rq(cpu)'s rq_lock held,
so this can race with balance_push_set(cpu, true). This shouldn't matter under
normal circumstances as we'll have sched_cpu_wait_empty() further down the
line.

This might get ugly with the rollback faff - this is jumping the gun a bit,
but that's something we'll have to address, and I think what I'm concerned
about is close to what you mentioned in

  http://lore.kernel.org/r/yam1t2qzr7rib...@hirez.programming.kicks-ass.net

Here's what I'm thinking of:

  _cpu_up()                            ttwu()
                                         select_task_rq()
                                           is_cpu_allowed()
                                             rq->balance_callback != balance_push_callback
    smpboot_unpark_threads() // FAIL
    (now going down, set push here)
    sched_cpu_wait_empty()
    ...                                  ttwu_queue()
    sched_cpu_dying()
    *ARGH*

I've written some horrors on top of this series here:

  https://gitlab.arm.com/linux-arm/linux-vs/-/commits/mainline/migrate_disable/stragglers/

Also, my TX2 is again in need of CPR, so in the meantime I'm running tests
on a (much) smaller machine...

> +
> +	/* But are allowed during online. */
> +	return cpu_online(cpu);
>  }
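
To make the window above concrete, here is a minimal userspace sketch of the
decision order in the quoted hunk. It is not kernel code: struct cpu_model,
struct task_model, model_is_cpu_allowed() and model_stale_vs_fresh() are
hypothetical names invented for illustration, and the push_set flag merely
stands in for "rq->balance_callback == &balance_push_callback". It assumes a
regular (non per-CPU) kthread waking towards a CPU that is online but not
active, and samples the decision once before and once after the push callback
is set.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these types are kernel APIs. */
struct cpu_model {
	bool online;    /* models cpu_online(cpu) */
	bool active;    /* models cpu_active(cpu) */
	bool push_set;  /* models rq->balance_callback == &balance_push_callback */
};

struct task_model {
	bool in_cpumask;          /* models cpumask_test_cpu(cpu, p->cpus_ptr) */
	bool migration_disabled;  /* models is_migration_disabled(p) */
	bool is_kthread;          /* models p->flags & PF_KTHREAD */
	bool per_cpu_kthread;     /* models kthread_is_per_cpu(p) */
};

/* Same order of checks as the quoted hunk, on the model state. */
static bool model_is_cpu_allowed(const struct task_model *p,
				 const struct cpu_model *c)
{
	if (!p->in_cpumask)
		return false;
	if (p->migration_disabled)
		return c->online;
	if (!p->is_kthread)
		return c->active;
	if (p->per_cpu_kthread)
		return c->online;
	if (c->push_set)
		return c->active;
	return c->online;
}

struct model_result {
	bool stale;  /* decision sampled before the push callback is set */
	bool fresh;  /* decision sampled after the push callback is set */
};

/*
 * Sequential stand-in for the race: the waker samples the state without
 * the rq lock, then the failed bring-up sets the push callback.
 */
static struct model_result model_stale_vs_fresh(void)
{
	struct task_model kthread = { .in_cpumask = true, .is_kthread = true };
	struct cpu_model cpu = { .online = true, .active = false, .push_set = false };
	struct model_result r;

	r.stale = model_is_cpu_allowed(&kthread, &cpu);
	cpu.push_set = true;
	r.fresh = model_is_cpu_allowed(&kthread, &cpu);
	return r;
}
```

In this model the stale sample allows the CPU while the fresh one rejects it,
which is the gap the ttwu_queue() in the diagram falls into. The real code
paths are concurrent; the sketch only replays one interleaving in order.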