On 16/01/21 12:30, Peter Zijlstra wrote:
> @@ -1796,13 +1796,28 @@ static inline bool rq_has_pinned_tasks(s
>   */
>  static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
>  {
> +     /* When not in the task's cpumask, no point in looking further. */
>       if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>               return false;
>
> -     if (is_per_cpu_kthread(p) || is_migration_disabled(p))
> +     /* migrate_disabled() must be allowed to finish. */
> +     if (is_migration_disabled(p))
>               return cpu_online(cpu);
>
> -     return cpu_active(cpu);
> +     /* Non kernel threads are not allowed during either online or offline. */
> +     if (!(p->flags & PF_KTHREAD))
> +             return cpu_active(cpu);
> +
> +     /* KTHREAD_IS_PER_CPU is always allowed. */
> +     if (kthread_is_per_cpu(p))
> +             return cpu_online(cpu);
> +
> +     /* Regular kernel threads don't get to stay during offline. */
> +     if (cpu_rq(cpu)->balance_callback == &balance_push_callback)
> +             return cpu_active(cpu);

Callers of is_cpu_allowed(, cpu) aren't guaranteed to hold cpu_rq(cpu)'s
rq_lock, so this check can race with balance_push_set(, true). This
shouldn't matter under normal circumstances, as we'll have
sched_cpu_wait_empty() further down the line.

This might get ugly with the rollback faff - this is jumping the gun a
bit, but that's something we'll have to address, and I think what I'm
concerned about is close to what you mentioned in

  http://lore.kernel.org/r/yam1t2qzr7rib...@hirez.programming.kicks-ass.net

Here's what I'm thinking of:

_cpu_up()                            ttwu()
                                       select_task_rq()
                                         is_cpu_allowed()
                                           rq->balance_callback != balance_push_callback
  smpboot_unpark_threads() // FAIL
  (now going down, set push here)
  sched_cpu_wait_empty()
  ...                                  ttwu_queue()
  sched_cpu_dying()
  *ARGH*

I've written some horrors on top of this series here:

  https://gitlab.arm.com/linux-arm/linux-vs/-/commits/mainline/migrate_disable/stragglers/

Also, my TX2 is again in need of CPR, so in the meantime I'm running
tests on a (much) smaller machine...

> +
> +     /* But are allowed during online. */
> +     return cpu_online(cpu);
>  }
