On 09/07/2017 11:21 PM, Peter Zijlstra wrote:
> On Thu, Sep 07, 2017 at 07:26:23PM +0530, Prateek Sood wrote:
>> Remove circular dependency deadlock in a scenario where hotplug of CPU is
>> being done while there is updation in cgroup and cpuset triggered from
>> userspace.
> 
> You've forgotten to mention your solution to the deadlock, namely
> inverting cpuset_mutex and cpu_hotplug_lock.
> 
>> Signed-off-by: Prateek Sood <[email protected]>
>> ---
>>  kernel/cgroup/cpuset.c | 32 +++++++++++++++++++-------------
>>  1 file changed, 19 insertions(+), 13 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 2f4039b..60dc0ac 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -816,16 +816,15 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>   * 'cpus' is removed, then call this routine to rebuild the
>>   * scheduler's dynamic sched domains.
>>   *
>> - * Call with cpuset_mutex held.  Takes get_online_cpus().
>>   */
>> -static void rebuild_sched_domains_locked(void)
>> +static void rebuild_sched_domains_cpuslocked(void)
>>  {
>>      struct sched_domain_attr *attr;
>>      cpumask_var_t *doms;
>>      int ndoms;
>>  
>> +    lockdep_assert_cpus_held();
>>      lockdep_assert_held(&cpuset_mutex);
>> -    get_online_cpus();
>>  
>>      /*
>>       * We have raced with CPU hotplug. Don't do anything to avoid
>> @@ -833,27 +832,27 @@ static void rebuild_sched_domains_locked(void)
>>       * Anyways, hotplug work item will rebuild sched domains.
>>       */
>>      if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>> -            goto out;
>> +            return;
>>  
>>      /* Generate domain masks and attrs */
>>      ndoms = generate_sched_domains(&doms, &attr);
>>  
>>      /* Have scheduler rebuild the domains */
>>      partition_sched_domains(ndoms, doms, attr);
>> -out:
>> -    put_online_cpus();
>>  }
>>  #else /* !CONFIG_SMP */
>> -static void rebuild_sched_domains_locked(void)
>> +static void rebuild_sched_domains_cpuslocked(void)
>>  {
>>  }
>>  #endif /* CONFIG_SMP */
>>  
>>  void rebuild_sched_domains(void)
>>  {
>> +    get_online_cpus();
>>      mutex_lock(&cpuset_mutex);
>> -    rebuild_sched_domains_locked();
>> +    rebuild_sched_domains_cpuslocked();
>>      mutex_unlock(&cpuset_mutex);
>> +    put_online_cpus();
>>  }
> 
> But if you invert these locks, the need for cpuset_hotplug_workfn() goes
> away, at least for the CPU part, and we can make in synchronous again.
> Yay!!
> 
> Also, I think new code should use cpus_read_lock() instead of
> get_online_cpus().
> 

Thanks for the review comments, Peter.
I will send an updated version of the patch fixing the circular deadlock.

The callback that invokes cpuset_hotplug_workfn() in the hotplug path is:
        [CPUHP_AP_ACTIVE] = {
                .name                   = "sched:active",
                .startup.single         = sched_cpu_activate,
                .teardown.single        = sched_cpu_deactivate,
        },

If we make cpuset_hotplug_workfn() synchronous, the following deadlock can happen:
_cpu_down()
   cpus_write_lock()  //held
      cpuhp_kick_ap_work()
        cpuhp_kick_ap()
           __cpuhp_kick_ap()
              wake_up_process()     // wakes cpuhp_thread_fun()
                wait_for_ap_thread() // waits for completion from cpuhp_thread_fun()

cpuhp_thread_fun()
   cpuhp_invoke_callback()
     sched_cpu_deactivate()
       cpuset_cpu_inactive()
          cpuset_update_active_cpus()
             cpuset_hotplug_workfn() // now called synchronously
                rebuild_sched_domains()
                   cpus_read_lock() // blocks: write lock already held by _cpu_down()
                  

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project
