Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

Li, Aubrey Thu, 23 Jul 2020 19:05:26 -0700

On 2020/7/24 9:26, benbjiang(蒋彪) wrote:
> Hi,
> 
>> On Jul 24, 2020, at 7:43 AM, Aubrey Li <[email protected]> wrote:
>>
>> On Thu, Jul 23, 2020 at 4:28 PM benbjiang(蒋彪) <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>>> On Jul 23, 2020, at 4:06 PM, Li, Aubrey <[email protected]> wrote:
>>>>
>>>> On 2020/7/23 15:47, benbjiang(蒋彪) wrote:
>>>>> Hi,
>>>>>
>>>>>> On Jul 23, 2020, at 1:39 PM, Li, Aubrey <[email protected]> 
>>>>>> wrote:
>>>>>>
>>>>>> On 2020/7/23 12:23, benbjiang(蒋彪) wrote:
>>>>>>> Hi,
>>>>>>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey <[email protected]> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey <[email protected]> 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey 
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>> Hi, Aubrey,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From: Aubrey Li <[email protected]>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Don't migrate if there is a cookie mismatch
>>>>>>>>>>>>>> Load balance tries to move task from busiest CPU to the
>>>>>>>>>>>>>> destination CPU. When core scheduling is enabled, if the
>>>>>>>>>>>>>> task's cookie does not match with the destination CPU's
>>>>>>>>>>>>>> core cookie, this task will be skipped by this CPU. This
>>>>>>>>>>>>>> mitigates the forced idle time on the destination CPU.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Select cookie matched idle CPU
>>>>>>>>>>>>>> In the fast path of task wakeup, select the first cookie matched
>>>>>>>>>>>>>> idle CPU instead of the first idle CPU.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Find cookie matched idlest CPU
>>>>>>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core
>>>>>>>>>>>>>> cookie matches with task's cookie
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Don't migrate task if cookie not match
>>>>>>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose
>>>>>>>>>>>>>> core cookie does not match with task's cookie
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Aubrey Li <[email protected]>
>>>>>>>>>>>>>> Signed-off-by: Tim Chen <[email protected]>
>>>>>>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai <[email protected]>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> kernel/sched/fair.c  | 64 
>>>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++----
>>>>>>>>>>>>>> kernel/sched/sched.h | 29 ++++++++++++++++++++
>>>>>>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>>>>>>>>>>> index d16939766361..33dc4bf01817 100644
>>>>>>>>>>>>>> --- a/kernel/sched/fair.c
>>>>>>>>>>>>>> +++ b/kernel/sched/fair.c
>>>>>>>>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct 
>>>>>>>>>>>>>> task_numa_env *env,
>>>>>>>>>>>>>>            if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
>>>>>>>>>>>>>>                    continue;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>>>>>>>>> +           /*
>>>>>>>>>>>>>> +            * Skip this cpu if source task's cookie does not 
>>>>>>>>>>>>>> match
>>>>>>>>>>>>>> +            * with CPU's core cookie.
>>>>>>>>>>>>>> +            */
>>>>>>>>>>>>>> +           if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
>>>>>>>>>>>>>> +                   continue;
>>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>            env->dst_cpu = cpu;
>>>>>>>>>>>>>>            if (task_numa_compare(env, taskimp, groupimp, 
>>>>>>>>>>>>>> maymove))
>>>>>>>>>>>>>>                    break;
>>>>>>>>>>>>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group 
>>>>>>>>>>>>>> *group, struct task_struct *p, int this
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    /* Traverse only the allowed CPUs */
>>>>>>>>>>>>>>    for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
>>>>>>>>>>>>>> +           struct rq *rq = cpu_rq(i);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>>>>>>>>> +           if (!sched_core_cookie_match(rq, p))
>>>>>>>>>>>>>> +                   continue;
>>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>            if (sched_idle_cpu(i))
>>>>>>>>>>>>>>                    return i;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>            if (available_idle_cpu(i)) {
>>>>>>>>>>>>>> -                   struct rq *rq = cpu_rq(i);
>>>>>>>>>>>>>>                    struct cpuidle_state *idle = 
>>>>>>>>>>>>>> idle_get_state(rq);
>>>>>>>>>>>>>>                    if (idle && idle->exit_latency < 
>>>>>>>>>>>>>> min_exit_latency) {
>>>>>>>>>>>>>>                            /*
>>>>>>>>>>>>>> @@ -6224,8 +6239,18 @@ static int select_idle_cpu(struct 
>>>>>>>>>>>>>> task_struct *p, struct sched_domain *sd, int t
>>>>>>>>>>>>>>    for_each_cpu_wrap(cpu, cpus, target) {
>>>>>>>>>>>>>>            if (!--nr)
>>>>>>>>>>>>>>                    return -1;
>>>>>>>>>>>>>> -           if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
>>>>>>>>>>>>>> -                   break;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +           if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) {
>>>>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>>>>>>>>> +                   /*
>>>>>>>>>>>>>> +                    * If Core Scheduling is enabled, select 
>>>>>>>>>>>>>> this cpu
>>>>>>>>>>>>>> +                    * only if the process cookie matches core 
>>>>>>>>>>>>>> cookie.
>>>>>>>>>>>>>> +                    */
>>>>>>>>>>>>>> +                   if (sched_core_enabled(cpu_rq(cpu)) &&
>>>>>>>>>>>>>> +                       p->core_cookie == 
>>>>>>>>>>>>>> cpu_rq(cpu)->core->core_cookie)
>>>>>>>>>>>>> Why not also add similar logic in select_idle_smt to reduce 
>>>>>>>>>>>>> forced-idle? :)
>>>>>>>>>>>> We hit select_idle_smt after we scaned the entire LLC domain for 
>>>>>>>>>>>> idle cores
>>>>>>>>>>>> and idle cpus and failed,so IMHO, an idle smt is probably a good 
>>>>>>>>>>>> choice under
>>>>>>>>>>>> this scenario.
>>>>>>>>>>>
>>>>>>>>>>> AFAIC, selecting idle sibling with unmatched cookie will cause 
>>>>>>>>>>> unnecessary fored-idle, unfairness and latency, compared to 
>>>>>>>>>>> choosing *target* cpu.
>>>>>>>>>> Choosing target cpu could increase the runnable task number on the 
>>>>>>>>>> target runqueue, this
>>>>>>>>>> could trigger busiest->nr_running > 1 logic and makes the idle 
>>>>>>>>>> sibling trying to pull but
>>>>>>>>>> not success(due to cookie not match). Putting task to the idle 
>>>>>>>>>> sibling is relatively stable IMHO.
>>>>>>>>>
>>>>>>>>> I’m afraid that *unsuccessful* pullings between smts would not result 
>>>>>>>>> in unstableness, because
>>>>>>>>> the load-balance always do periodicly , and unsuccess means nothing 
>>>>>>>>> happen.
>>>>>>>> unsuccess pulling means more unnecessary overhead in load balance.
>>>>>>>>
>>>>>>>>> On the contrary, unmatched sibling tasks running concurrently could 
>>>>>>>>> bring forced-idle to each other repeatedly,
>>>>>>>>> Which is more unstable, and more costly when pick_next_task for all 
>>>>>>>>> siblings.
>>>>>>>> Not worse than two tasks ping-pong on the same target run queue I 
>>>>>>>> guess, and better if
>>>>>>>> - task1(cookie A) is running on the target, and task2(cookie B) in the 
>>>>>>>> runqueue,
>>>>>>>> - task3(cookie B) coming
>>>>>>>>
>>>>>>>> If task3 chooses target's sibling, it could have a chance to run 
>>>>>>>> concurrently with task2.
>>>>>>>> But if task3 chooses target, it will wait for next pulling luck of 
>>>>>>>> load balancer
>>>>>>> That’s more interesting. :)
>>>>>>> Distributing different cookie tasks onto different cpus(or cpusets) 
>>>>>>> could be the *ideal stable status* we want, as I understood.
>>>>>>> Different cookie tasks running on sibling smts could hurt performance, 
>>>>>>> and that should be avoided with best effort.
>>>>>> We already tried to avoid when we scan idle cores and idle cpus in llc 
>>>>>> domain.
>>>>>
>>>>> I’m afraid that’s not enough either, :)
>>>>> 1. Scanning Idle cpus is not a full scan, there is limit according to 
>>>>> scan cost.
>>>>> 2. That's only trying at the *core/cpu* level, *SMT* level should be 
>>>>> considered too.
>>>>>
>>>>>>
>>>>>>> For above case, selecting idle sibling cpu can improve the concurrency 
>>>>>>> indeed, but it decrease the imbalance for load-balancer.
>>>>>>> In that case, load-balancer could not notice the imbalance, and would 
>>>>>>> do nothing to improve the unmatched situation.
>>>>>>> On the contrary, choosing the *target* cpu could enhance the imbalance, 
>>>>>>> and load-balancer could try to pull unmatched task away,
>>>>>> Pulling away to where needs another bunch of elaboration.
>>>>>
>>>>> Still with the SMT2+3tasks case,
>>>>> if *idle sibling* chosen,
>>>>> Smt1’s load = task1+task2, smt2’s load = task3. Task3 will run 
>>>>> intermittently because of forced-idle,
>>>>> so smt2’s real load could low enough, that it could not be pulled away 
>>>>> forever. That’s indeed a stable state,
>>>>> but with performance at a discount.
>>>>>
>>>>> If *target sibling* chose,
>>>>> Smt1’s load = task1+task2+task3, smt2’s load=0. It’s a obvious imbalance, 
>>>>> and load-balancer will pick a task to pull,
>>>>> 1. If task1(cookie A) picked, that’s done for good.
>>>>> 2. If task2(cookie B) or task3(cookie B) picked, that’s ok too, the rest 
>>>>> task(cookie B) could be pulled away at next balance(maybe need to improve 
>>>>> the pulling to tend to pull matched task more aggressively).
>>>>> And then, we may reach a more stable state *globally* without performance 
>>>>> discount.
>>>>
>>>> I'm not sure what you mean pulled away,
>>> I mean pulled away by other cpus, may be triggered by idle balance or 
>>> periodic balance on other cpus.
>>>
>>>> - if you mean pulled away from this core, cookieA in idle sibling case can 
>>>> be
>>>> pulled away too.
>>> Yep, cookieA(task1) in idle sibling case could be pulled away, but
>>> cookieB(task3) on the smt2 could never get the chance being pulled
>>> away(unless being waken up).
>>> If cookieA(task1) failed being pulled(cookieB(task2) on smt1 may be pulled,
>>> 50% chance), cookieA(task1) and cookieB(task3) would reach the stable state
>>> with performance discount.
>>>
>> If you meant pulled away from this core, I didn't see how two cases are
>> different either. For example, when task2(cookieB) runs on SMT1, task3
>> cookieb can be pulled to SMT2. and when task1(cookieA) switch onto SMT1,
>> task2(cookieB) can be pulled away by other cpus, too.
> That’s the case only if SMT2’s pulling happens when task2(cookieB) is running
> on SMT1, which depends on,
> 1. Smt2 not entering tickless or nohz_balancer_kick picks smt2 before other
> cpu’s pulling, may be unlikely. :)
> 2. Task1(cookieA) is not running on SMT1.
> otherwise it would be the case I described above.  
> 
> Besides, for other cases, like smt2+2task(CookieA+CookieB), picking *target*
> cpu instead of *idle sibling* could be more helpful to reach the global stable
> status(distribute different cookies onto different cores). 
>If the task number of two cookies has a big difference, then distributing
different cookies onto different cores leads to a big imbalance, that state may
be stable but not an optimal state, I guess that's why core run queue does
not refuse different cookies onto its rb tree.


I think I understand your concern but IMHO I'm not convinced adding cookie match
in idle SMT selection is a best choice, if you have some performance data of 
your
workload, that will be very helpful to understand the case.

If distributing different cookies onto different cores is a hard requirement 
from
your side, you are welcome to submit a patch to see others opinion.

Thanks,
-Aubrey

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

Reply via email to