Hi, > On Jul 23, 2020, at 4:06 PM, Li, Aubrey <[email protected]> wrote: > > On 2020/7/23 15:47, benbjiang(蒋彪) wrote: >> Hi, >> >>> On Jul 23, 2020, at 1:39 PM, Li, Aubrey <[email protected]> wrote: >>> >>> On 2020/7/23 12:23, benbjiang(蒋彪) wrote: >>>> Hi, >>>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey <[email protected]> >>>>> wrote: >>>>> >>>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote: >>>>>> Hi, >>>>>> >>>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote: >>>>>>>>>> Hi, Aubrey, >>>>>>>>>> >>>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai >>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>> From: Aubrey Li <[email protected]> >>>>>>>>>>> >>>>>>>>>>> - Don't migrate if there is a cookie mismatch >>>>>>>>>>> Load balance tries to move task from busiest CPU to the >>>>>>>>>>> destination CPU. When core scheduling is enabled, if the >>>>>>>>>>> task's cookie does not match with the destination CPU's >>>>>>>>>>> core cookie, this task will be skipped by this CPU. This >>>>>>>>>>> mitigates the forced idle time on the destination CPU. >>>>>>>>>>> >>>>>>>>>>> - Select cookie matched idle CPU >>>>>>>>>>> In the fast path of task wakeup, select the first cookie matched >>>>>>>>>>> idle CPU instead of the first idle CPU. >>>>>>>>>>> >>>>>>>>>>> - Find cookie matched idlest CPU >>>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core >>>>>>>>>>> cookie matches with task's cookie >>>>>>>>>>> >>>>>>>>>>> - Don't migrate task if cookie not match >>>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose >>>>>>>>>>> core cookie does not match with task's cookie >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Aubrey Li <[email protected]> >>>>>>>>>>> Signed-off-by: Tim Chen <[email protected]> >>>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai <[email protected]> >>>>>>>>>>> --- >>>>>>>>>>> kernel/sched/fair.c | 64 >>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++---- >>>>>>>>>>> kernel/sched/sched.h | 29 ++++++++++++++++++++ >>>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>>>>>>>>>> index d16939766361..33dc4bf01817 100644 >>>>>>>>>>> --- a/kernel/sched/fair.c >>>>>>>>>>> +++ b/kernel/sched/fair.c >>>>>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct >>>>>>>>>>> task_numa_env *env, >>>>>>>>>>> if (!cpumask_test_cpu(cpu, env->p->cpus_ptr)) >>>>>>>>>>> continue; >>>>>>>>>>> >>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>>> + /* >>>>>>>>>>> + * Skip this cpu if source task's cookie does not match >>>>>>>>>>> + * with CPU's core cookie. >>>>>>>>>>> + */ >>>>>>>>>>> + if (!sched_core_cookie_match(cpu_rq(cpu), env->p)) >>>>>>>>>>> + continue; >>>>>>>>>>> +#endif >>>>>>>>>>> + >>>>>>>>>>> env->dst_cpu = cpu; >>>>>>>>>>> if (task_numa_compare(env, taskimp, groupimp, maymove)) >>>>>>>>>>> break; >>>>>>>>>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group >>>>>>>>>>> *group, struct task_struct *p, int this >>>>>>>>>>> >>>>>>>>>>> /* Traverse only the allowed CPUs */ >>>>>>>>>>> for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) { >>>>>>>>>>> + struct rq *rq = cpu_rq(i); >>>>>>>>>>> + >>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>>> + if (!sched_core_cookie_match(rq, p)) >>>>>>>>>>> + continue; >>>>>>>>>>> +#endif >>>>>>>>>>> + >>>>>>>>>>> if (sched_idle_cpu(i)) >>>>>>>>>>> return i; >>>>>>>>>>> >>>>>>>>>>> if (available_idle_cpu(i)) { >>>>>>>>>>> - struct rq *rq = cpu_rq(i); >>>>>>>>>>> struct cpuidle_state *idle = idle_get_state(rq); >>>>>>>>>>> if (idle && idle->exit_latency < >>>>>>>>>>> min_exit_latency) { >>>>>>>>>>> /* >>>>>>>>>>> @@ -6224,8 +6239,18 @@ static int select_idle_cpu(struct >>>>>>>>>>> task_struct *p, struct sched_domain *sd, int t >>>>>>>>>>> for_each_cpu_wrap(cpu, cpus, target) { >>>>>>>>>>> if (!--nr) >>>>>>>>>>> return -1; >>>>>>>>>>> - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) >>>>>>>>>>> - break; >>>>>>>>>>> + >>>>>>>>>>> + if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) { >>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>>> + /* >>>>>>>>>>> + * If Core Scheduling is enabled, select this >>>>>>>>>>> cpu >>>>>>>>>>> + * only if the process cookie matches core >>>>>>>>>>> cookie. >>>>>>>>>>> + */ >>>>>>>>>>> + if (sched_core_enabled(cpu_rq(cpu)) && >>>>>>>>>>> + p->core_cookie == >>>>>>>>>>> cpu_rq(cpu)->core->core_cookie) >>>>>>>>>> Why not also add similar logic in select_idle_smt to reduce >>>>>>>>>> forced-idle? :) >>>>>>>>> We hit select_idle_smt after we scaned the entire LLC domain for idle >>>>>>>>> cores >>>>>>>>> and idle cpus and failed,so IMHO, an idle smt is probably a good >>>>>>>>> choice under >>>>>>>>> this scenario. >>>>>>>> >>>>>>>> AFAIC, selecting idle sibling with unmatched cookie will cause >>>>>>>> unnecessary fored-idle, unfairness and latency, compared to choosing >>>>>>>> *target* cpu. >>>>>>> Choosing target cpu could increase the runnable task number on the >>>>>>> target runqueue, this >>>>>>> could trigger busiest->nr_running > 1 logic and makes the idle sibling >>>>>>> trying to pull but >>>>>>> not success(due to cookie not match). Putting task to the idle sibling >>>>>>> is relatively stable IMHO. >>>>>> >>>>>> I’m afraid that *unsuccessful* pullings between smts would not result in >>>>>> unstableness, because >>>>>> the load-balance always do periodicly , and unsuccess means nothing >>>>>> happen. >>>>> unsuccess pulling means more unnecessary overhead in load balance. >>>>> >>>>>> On the contrary, unmatched sibling tasks running concurrently could >>>>>> bring forced-idle to each other repeatedly, >>>>>> Which is more unstable, and more costly when pick_next_task for all >>>>>> siblings. >>>>> Not worse than two tasks ping-pong on the same target run queue I guess, >>>>> and better if >>>>> - task1(cookie A) is running on the target, and task2(cookie B) in the >>>>> runqueue, >>>>> - task3(cookie B) coming >>>>> >>>>> If task3 chooses target's sibling, it could have a chance to run >>>>> concurrently with task2. >>>>> But if task3 chooses target, it will wait for next pulling luck of load >>>>> balancer >>>> That’s more interesting. :) >>>> Distributing different cookie tasks onto different cpus(or cpusets) could >>>> be the *ideal stable status* we want, as I understood. >>>> Different cookie tasks running on sibling smts could hurt performance, and >>>> that should be avoided with best effort. >>> We already tried to avoid when we scan idle cores and idle cpus in llc >>> domain. >> >> I’m afraid that’s not enough either, :) >> 1. Scanning Idle cpus is not a full scan, there is limit according to scan >> cost. >> 2. That's only trying at the *core/cpu* level, *SMT* level should be >> considered too. >> >>> >>>> For above case, selecting idle sibling cpu can improve the concurrency >>>> indeed, but it decrease the imbalance for load-balancer. >>>> In that case, load-balancer could not notice the imbalance, and would do >>>> nothing to improve the unmatched situation. >>>> On the contrary, choosing the *target* cpu could enhance the imbalance, >>>> and load-balancer could try to pull unmatched task away, >>> Pulling away to where needs another bunch of elaboration. >> >> Still with the SMT2+3tasks case, >> if *idle sibling* chosen, >> Smt1’s load = task1+task2, smt2’s load = task3. Task3 will run >> intermittently because of forced-idle, >> so smt2’s real load could low enough, that it could not be pulled away >> forever. That’s indeed a stable state, >> but with performance at a discount. >> >> If *target sibling* chose, >> Smt1’s load = task1+task2+task3, smt2’s load=0. It’s a obvious imbalance, >> and load-balancer will pick a task to pull, >> 1. If task1(cookie A) picked, that’s done for good. >> 2. If task2(cookie B) or task3(cookie B) picked, that’s ok too, the rest >> task(cookie B) could be pulled away at next balance(maybe need to improve >> the pulling to tend to pull matched task more aggressively). >> And then, we may reach a more stable state *globally* without performance >> discount. > > I'm not sure what you mean pulled away, I mean pulled away by other cpus, may be triggered by idle balance or periodic balance on other cpus.
> - if you mean pulled away from this core, cookieA in idle sibling case can be > pulled away too. Yep, cookieA(task1) in idle sibling case could be pulled away, but cookieB(task3) on the smt2 could never get the chance being pulled away(unless being waken up). If cookieA(task1) failed being pulled(cookieB(task2) on smt1 may be pulled, 50% chance), cookieA(task1) and cookieB(task3) would reach the stable state with performance discount. Thx. Regards, Jiang > - and if you mean pulled away but within this core, I guess cookieB in target > sibling case can't be pulled away either, as nr_running difference = 1 > > Thanks, > -Aubrey

