On 2026/1/10 4:15, Waiman Long wrote:
> On 1/5/26 1:29 AM, Chen Ridong wrote:
>>
>> On 2026/1/5 12:06, Waiman Long wrote:
>>> On 1/4/26 10:58 PM, Chen Ridong wrote:
>>>> On 2026/1/5 11:50, Waiman Long wrote:
>>>>> On 1/4/26 8:15 PM, Chen Ridong wrote:
>>>>>> On 2026/1/5 5:25, Waiman Long wrote:
>>>>>>> On 1/3/26 9:48 PM, Chen Ridong wrote:
>>>>>>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>>>>>>> Since commit f62a5d39368e ("cgroup/cpuset: Remove 
>>>>>>>>> remote_partition_check()
>>>>>>>>> & make update_cpumasks_hier() handle remote partition"), the
>>>>>>>>> compute_effective_exclusive_cpumask() helper was extended to
>>>>>>>>> strip exclusive CPUs from siblings when computing effective_xcpus
>>>>>>>>> (cpuset.cpus.exclusive.effective). This helper was later renamed to
>>>>>>>>> compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
>>>>>>>>> CPU mask computation logic").
>>>>>>>>>
>>>>>>>>> This helper is supposed to be used consistently to compute
>>>>>>>>> effective_xcpus. However, there is an exception within the callback
>>>>>>>>> critical section in update_cpumasks_hier() when exclusive_cpus of a
>>>>>>>>> valid partition root is empty. This can cause effective_xcpus value to
>>>>>>>>> differ depending on where exactly it is last computed. Fix this by 
>>>>>>>>> using
>>>>>>>>> compute_excpus() in this case to give a consistent result.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Waiman Long <[email protected]>
>>>>>>>>> ---
>>>>>>>>>      kernel/cgroup/cpuset.c | 14 +++++---------
>>>>>>>>>      1 file changed, 5 insertions(+), 9 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>>>>>>> index da2b3b51630e..37d118a9ad4d 100644
>>>>>>>>> --- a/kernel/cgroup/cpuset.c
>>>>>>>>> +++ b/kernel/cgroup/cpuset.c
>>>>>>>>> @@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct 
>>>>>>>>> cpuset *cs, struct tmpmasks
>>>>>>>>> *tmp,
>>>>>>>>>              spin_lock_irq(&callback_lock);
>>>>>>>>>              cpumask_copy(cp->effective_cpus, tmp->new_cpus);
>>>>>>>>>              cp->partition_root_state = new_prs;
>>>>>>>>> -        if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
>>>>>>>>> -            compute_excpus(cp, cp->effective_xcpus);
>>>>>>>>> -
>>>>>>>>>              /*
>>>>>>>>> -         * Make sure effective_xcpus is properly set for a valid
>>>>>>>>> -         * partition root.
>>>>>>>>> +         * Need to compute effective_xcpus if either exclusive_cpus
>>>>>>>>> +         * is non-empty or it is a valid partition root.
>>>>>>>>>               */
>>>>>>>>> -        if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
>>>>>>>>> -            cpumask_and(cp->effective_xcpus,
>>>>>>>>> -                    cp->cpus_allowed, parent->effective_xcpus);
>>>>>>>>> -        else if (new_prs < 0)
>>>>>>>>> +        if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
>>>>>>>>> +            compute_excpus(cp, cp->effective_xcpus);
>>>>>>>>> +        if (new_prs < 0)
>>>>>>>>>                  reset_partition_data(cp);
>>>>>>>>>              spin_unlock_irq(&callback_lock);
>>>>>>>>>      
>>>>>>>> The code resets partition data only for new_prs < 0. My understanding 
>>>>>>>> is that a partition is
>>>>>>>> invalid
>>>>>>>> when new_prs <= 0. Shouldn't reset_partition_data() also be called 
>>>>>>>> when new_prs = 0? Is there a
>>>>>>>> specific reason to skip the reset in that case?
>>>>>>> update_cpumasks_hier() is called when changes in a cpuset or hotplug 
>>>>>>> affects other cpusets in
>>>>>>> the
>>>>>>> hierarchy. With respect to changes in partition state, it is either 
>>>>>>> from valid to invalid or
>>>>>>> vice
>>>>>>> versa. It will not change from a valid partition to member. The only 
>>>>>>> way new_prs = 0 is when
>>>>>>> old_prs
>>>>>>> = 0. Even if the affected cpuset is processed again in 
>>>>>>> update_cpumask_hier(), any state change
>>>>>>> from
>>>>>>> valid partition to member (update_prstate()), reset_partition_data() 
>>>>>>> should have been called
>>>>>>> there.
>>>>>>> That is why we only care about when new_prs != 0.
>>>>>>>
>>>>>> Thank you for your patience.
>>>>>>
>>>>>>> The code isn't wrong here. However I can change the condition to 
>>>>>>> (new_prs <= 0) if it makes it
>>>>>>> easier to understand.
>>>>>>>
>>>>>> I agree there's nothing wrong with the current logic. However, for 
>>>>>> clarity, I suggest changing
>>>>>> the
>>>>>> condition to (new_prs <= 0). This allows the function's logic to be 
>>>>>> fully self-consistent and
>>>>>> focused on a single responsibility. This approach would allow us to 
>>>>>> simplify the code to:
>>>>>>
>>>>>>       if (new_prs > 0)
>>>>>>           compute_excpus(cp, cp->effective_xcpus);
>>>>>>       else
>>>>>>           reset_partition_data(cp);
>>>>>>
>>>>>> Since reset_partition_data() already handles cases whether 
>>>>>> cp->exclusive_cpus is empty or not,
>>>>>> this
>>>>>> implementation would be more concise while correctly covering all 
>>>>>> scenarios.
>>>>> effective_xcpus should be set when exclusive_cpus is not empty or when 
>>>>> the cpuset is a valid
>>>>> partition root. So just checking new_prs for compute_excpus() is not 
>>>>> enough.
>>>>>
>>>> If we change the condition to (new_prs <= 0), it will reset the partition 
>>>> data even when we call
>>>> compute_excpus (for !cpumask_empty(cp->exclusive_cpus)), so we should 
>>>> still get the same result,
>>>> right?
>>> Changing the condition to (new_prs <= 0) won't affect the result except for 
>>> a bit of wasted cpu
>>> cycles. That is why I am planning to make the change in the next version to 
>>> make it easier to
>>> understand.
>>>
>> Sorry, I should have been clearer. If we change the condition, the code 
>> would essentially be:
>>
>>     if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
>>         compute_excpus(cp, cp->effective_xcpus);
>>          if (new_prs <= 0)
>>         reset_partition_data(cp);
>>
>> For cases where new_prs <= 0 && !cpumask_empty(cp->exclusive_cpus), both 
>> compute_excpus() and
>> reset_partition_data() would be called.
>>
>> Is this functionally equivalent to:
>>
>>     if (new_prs > 0)
>>         compute_excpus(cp, cp->effective_xcpus);
>>          else (new_prs <= 0)
>>         reset_partition_data(cp);
> 
> They are not equivalent because reset_partition_data() won't do a 
> compute_excpus(). In fact, one of
> the tests in test_cpuset_prs.sh will fail if we make this change.
> 

Understood. If exclusive_cpus is non-empty, the effective_exclusive_cpus of a 
cpuset may remain
non-empty even when the cpuset itself is not a valid root. reset_partition_data 
will only reset this
effective list if the exclusive_cpus set is empty.

Thank you very much.

-- 
Best regards,
Ridong


Reply via email to