On 2020/7/24 9:26, benbjiang(蒋彪) wrote: > Hi, > >> On Jul 24, 2020, at 7:43 AM, Aubrey Li <[email protected]> wrote: >> >> On Thu, Jul 23, 2020 at 4:28 PM benbjiang(蒋彪) <[email protected]> wrote: >>> >>> Hi, >>> >>>> On Jul 23, 2020, at 4:06 PM, Li, Aubrey <[email protected]> wrote: >>>> >>>> On 2020/7/23 15:47, benbjiang(蒋彪) wrote: >>>>> Hi, >>>>> >>>>>> On Jul 23, 2020, at 1:39 PM, Li, Aubrey <[email protected]> >>>>>> wrote: >>>>>> >>>>>> On 2020/7/23 12:23, benbjiang(蒋彪) wrote: >>>>>>> Hi, >>>>>>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey >>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote: >>>>>>>>>>>>> Hi, Aubrey, >>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai >>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> From: Aubrey Li <[email protected]> >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Don't migrate if there is a cookie mismatch >>>>>>>>>>>>>> Load balance tries to move task from busiest CPU to the >>>>>>>>>>>>>> destination CPU. When core scheduling is enabled, if the >>>>>>>>>>>>>> task's cookie does not match with the destination CPU's >>>>>>>>>>>>>> core cookie, this task will be skipped by this CPU. This >>>>>>>>>>>>>> mitigates the forced idle time on the destination CPU. >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Select cookie matched idle CPU >>>>>>>>>>>>>> In the fast path of task wakeup, select the first cookie matched >>>>>>>>>>>>>> idle CPU instead of the first idle CPU. >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Find cookie matched idlest CPU >>>>>>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core >>>>>>>>>>>>>> cookie matches with task's cookie >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Don't migrate task if cookie not match >>>>>>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose >>>>>>>>>>>>>> core cookie does not match with task's cookie >>>>>>>>>>>>>> >>>>>>>>>>>>>> Signed-off-by: Aubrey Li <[email protected]> >>>>>>>>>>>>>> Signed-off-by: Tim Chen <[email protected]> >>>>>>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai <[email protected]> >>>>>>>>>>>>>> --- >>>>>>>>>>>>>> kernel/sched/fair.c | 64 >>>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++---- >>>>>>>>>>>>>> kernel/sched/sched.h | 29 ++++++++++++++++++++ >>>>>>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-) >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>>>>>>>>>>>>> index d16939766361..33dc4bf01817 100644 >>>>>>>>>>>>>> --- a/kernel/sched/fair.c >>>>>>>>>>>>>> +++ b/kernel/sched/fair.c >>>>>>>>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct >>>>>>>>>>>>>> task_numa_env *env, >>>>>>>>>>>>>> if (!cpumask_test_cpu(cpu, env->p->cpus_ptr)) >>>>>>>>>>>>>> continue; >>>>>>>>>>>>>> >>>>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>>>>>> + /* >>>>>>>>>>>>>> + * Skip this cpu if source task's cookie does not >>>>>>>>>>>>>> match >>>>>>>>>>>>>> + * with CPU's core cookie. >>>>>>>>>>>>>> + */ >>>>>>>>>>>>>> + if (!sched_core_cookie_match(cpu_rq(cpu), env->p)) >>>>>>>>>>>>>> + continue; >>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>> + >>>>>>>>>>>>>> env->dst_cpu = cpu; >>>>>>>>>>>>>> if (task_numa_compare(env, taskimp, groupimp, >>>>>>>>>>>>>> maymove)) >>>>>>>>>>>>>> break; >>>>>>>>>>>>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group >>>>>>>>>>>>>> *group, struct task_struct *p, int this >>>>>>>>>>>>>> >>>>>>>>>>>>>> /* Traverse only the allowed CPUs */ >>>>>>>>>>>>>> for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) { >>>>>>>>>>>>>> + struct rq *rq = cpu_rq(i); >>>>>>>>>>>>>> + >>>>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>>>>>> + if (!sched_core_cookie_match(rq, p)) >>>>>>>>>>>>>> + continue; >>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>> + >>>>>>>>>>>>>> if (sched_idle_cpu(i)) >>>>>>>>>>>>>> return i; >>>>>>>>>>>>>> >>>>>>>>>>>>>> if (available_idle_cpu(i)) { >>>>>>>>>>>>>> - struct rq *rq = cpu_rq(i); >>>>>>>>>>>>>> struct cpuidle_state *idle = >>>>>>>>>>>>>> idle_get_state(rq); >>>>>>>>>>>>>> if (idle && idle->exit_latency < >>>>>>>>>>>>>> min_exit_latency) { >>>>>>>>>>>>>> /* >>>>>>>>>>>>>> @@ -6224,8 +6239,18 @@ static int select_idle_cpu(struct >>>>>>>>>>>>>> task_struct *p, struct sched_domain *sd, int t >>>>>>>>>>>>>> for_each_cpu_wrap(cpu, cpus, target) { >>>>>>>>>>>>>> if (!--nr) >>>>>>>>>>>>>> return -1; >>>>>>>>>>>>>> - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) >>>>>>>>>>>>>> - break; >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) { >>>>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>>>>>> + /* >>>>>>>>>>>>>> + * If Core Scheduling is enabled, select >>>>>>>>>>>>>> this cpu >>>>>>>>>>>>>> + * only if the process cookie matches core >>>>>>>>>>>>>> cookie. >>>>>>>>>>>>>> + */ >>>>>>>>>>>>>> + if (sched_core_enabled(cpu_rq(cpu)) && >>>>>>>>>>>>>> + p->core_cookie == >>>>>>>>>>>>>> cpu_rq(cpu)->core->core_cookie) >>>>>>>>>>>>> Why not also add similar logic in select_idle_smt to reduce >>>>>>>>>>>>> forced-idle? :) >>>>>>>>>>>> We hit select_idle_smt after we scaned the entire LLC domain for >>>>>>>>>>>> idle cores >>>>>>>>>>>> and idle cpus and failed,so IMHO, an idle smt is probably a good >>>>>>>>>>>> choice under >>>>>>>>>>>> this scenario. >>>>>>>>>>> >>>>>>>>>>> AFAIC, selecting idle sibling with unmatched cookie will cause >>>>>>>>>>> unnecessary fored-idle, unfairness and latency, compared to >>>>>>>>>>> choosing *target* cpu. >>>>>>>>>> Choosing target cpu could increase the runnable task number on the >>>>>>>>>> target runqueue, this >>>>>>>>>> could trigger busiest->nr_running > 1 logic and makes the idle >>>>>>>>>> sibling trying to pull but >>>>>>>>>> not success(due to cookie not match). Putting task to the idle >>>>>>>>>> sibling is relatively stable IMHO. >>>>>>>>> >>>>>>>>> I’m afraid that *unsuccessful* pullings between smts would not result >>>>>>>>> in unstableness, because >>>>>>>>> the load-balance always do periodicly , and unsuccess means nothing >>>>>>>>> happen. >>>>>>>> unsuccess pulling means more unnecessary overhead in load balance. >>>>>>>> >>>>>>>>> On the contrary, unmatched sibling tasks running concurrently could >>>>>>>>> bring forced-idle to each other repeatedly, >>>>>>>>> Which is more unstable, and more costly when pick_next_task for all >>>>>>>>> siblings. >>>>>>>> Not worse than two tasks ping-pong on the same target run queue I >>>>>>>> guess, and better if >>>>>>>> - task1(cookie A) is running on the target, and task2(cookie B) in the >>>>>>>> runqueue, >>>>>>>> - task3(cookie B) coming >>>>>>>> >>>>>>>> If task3 chooses target's sibling, it could have a chance to run >>>>>>>> concurrently with task2. >>>>>>>> But if task3 chooses target, it will wait for next pulling luck of >>>>>>>> load balancer >>>>>>> That’s more interesting. :) >>>>>>> Distributing different cookie tasks onto different cpus(or cpusets) >>>>>>> could be the *ideal stable status* we want, as I understood. >>>>>>> Different cookie tasks running on sibling smts could hurt performance, >>>>>>> and that should be avoided with best effort. >>>>>> We already tried to avoid when we scan idle cores and idle cpus in llc >>>>>> domain. >>>>> >>>>> I’m afraid that’s not enough either, :) >>>>> 1. Scanning Idle cpus is not a full scan, there is limit according to >>>>> scan cost. >>>>> 2. That's only trying at the *core/cpu* level, *SMT* level should be >>>>> considered too. >>>>> >>>>>> >>>>>>> For above case, selecting idle sibling cpu can improve the concurrency >>>>>>> indeed, but it decrease the imbalance for load-balancer. >>>>>>> In that case, load-balancer could not notice the imbalance, and would >>>>>>> do nothing to improve the unmatched situation. >>>>>>> On the contrary, choosing the *target* cpu could enhance the imbalance, >>>>>>> and load-balancer could try to pull unmatched task away, >>>>>> Pulling away to where needs another bunch of elaboration. >>>>> >>>>> Still with the SMT2+3tasks case, >>>>> if *idle sibling* chosen, >>>>> Smt1’s load = task1+task2, smt2’s load = task3. Task3 will run >>>>> intermittently because of forced-idle, >>>>> so smt2’s real load could low enough, that it could not be pulled away >>>>> forever. That’s indeed a stable state, >>>>> but with performance at a discount. >>>>> >>>>> If *target sibling* chose, >>>>> Smt1’s load = task1+task2+task3, smt2’s load=0. It’s a obvious imbalance, >>>>> and load-balancer will pick a task to pull, >>>>> 1. If task1(cookie A) picked, that’s done for good. >>>>> 2. If task2(cookie B) or task3(cookie B) picked, that’s ok too, the rest >>>>> task(cookie B) could be pulled away at next balance(maybe need to improve >>>>> the pulling to tend to pull matched task more aggressively). >>>>> And then, we may reach a more stable state *globally* without performance >>>>> discount. >>>> >>>> I'm not sure what you mean pulled away, >>> I mean pulled away by other cpus, may be triggered by idle balance or >>> periodic balance on other cpus. >>> >>>> - if you mean pulled away from this core, cookieA in idle sibling case can >>>> be >>>> pulled away too. >>> Yep, cookieA(task1) in idle sibling case could be pulled away, but >>> cookieB(task3) on the smt2 could never get the chance being pulled >>> away(unless being waken up). >>> If cookieA(task1) failed being pulled(cookieB(task2) on smt1 may be pulled, >>> 50% chance), cookieA(task1) and cookieB(task3) would reach the stable state >>> with performance discount. >>> >> If you meant pulled away from this core, I didn't see how two cases are >> different either. For example, when task2(cookieB) runs on SMT1, task3 >> cookieb can be pulled to SMT2. and when task1(cookieA) switch onto SMT1, >> task2(cookieB) can be pulled away by other cpus, too. > That’s the case only if SMT2’s pulling happens when task2(cookieB) is running > on SMT1, which depends on, > 1. Smt2 not entering tickless or nohz_balancer_kick picks smt2 before other > cpu’s pulling, may be unlikely. :) > 2. Task1(cookieA) is not running on SMT1. > otherwise it would be the case I described above. > > Besides, for other cases, like smt2+2task(CookieA+CookieB), picking *target* > cpu instead of *idle sibling* could be more helpful to reach the global stable > status(distribute different cookies onto different cores). >If the task number of two cookies has a big difference, then distributing different cookies onto different cores leads to a big imbalance, that state may be stable but not an optimal state, I guess that's why core run queue does not refuse different cookies onto its rb tree.
I think I understand your concern but IMHO I'm not convinced adding cookie match in idle SMT selection is a best choice, if you have some performance data of your workload, that will be very helpful to understand the case. If distributing different cookies onto different cores is a hard requirement from your side, you are welcome to submit a patch to see others opinion. Thanks, -Aubrey

