* Srikar Dronamraju <[email protected]> wrote:

> Currently the task scan rate is reset when the numa balancer migrates
> the task to a different node. If the numa balancer initiates a swap, the
> reset applies only to the task that initiates the swap. Similarly, no
> scan rate reset is done if the task is migrated across nodes by the
> traditional load balancer.
> 
> Instead, move the scan reset to migrate_task_rq. This ensures that a
> task moved out of its preferred node either gets back to its preferred
> node quickly or finds a new preferred node. Doing so is fair to all
> tasks migrating across nodes.
> 
> specjbb2005 / bops/JVM / higher bops are better
> on 2 Socket/2 Node Intel
> JVMS  Prev    Current  %Change
> 4     210118  208862   -0.597759
> 1     313171  307007   -1.96825
> 
> 
> on 2 Socket/4 Node Power8 (PowerNV)
> JVMS  Prev     Current  %Change
> 8     91027.5  89911.4  -1.22611
> 1     216460   216176   -0.131202
> 
> 
> on 2 Socket/2 Node Power9 (PowerNV)
> JVMS  Prev    Current  %Change
> 4     191918  196078   2.16759
> 1     207043  214664   3.68088
> 
> 
> on 4 Socket/4 Node Power7
> JVMS  Prev     Current  %Change
> 8     58462.1  60719.2  3.86079
> 1     108334   112615   3.95167
> 
> 
> dbench / transactions / higher numbers are better
> on 2 Socket/2 Node Intel
> count  Min      Max      Avg      Variance  %Change
> 5      11851.8  11937.3  11890.9  33.5169
> 5      12511.7  12559.4  12539.5  15.5883   5.45459
> 
> 
> on 2 Socket/4 Node Power8 (PowerNV)
> count  Min      Max      Avg      Variance  %Change
> 5      4791     5016.08  4962.55  85.9625
> 5      4709.28  4979.28  4919.32  105.126   -0.871125
> 
> 
> on 2 Socket/2 Node Power9 (PowerNV)
> count  Min      Max      Avg     Variance  %Change
> 5      9353.43  9380.49  9369.6  9.04361
> 5      9388.38  9406.29  9395.1  5.98959   0.272157
> 
> 
> on 4 Socket/4 Node Power7
> count  Min      Max      Avg      Variance  %Change
> 5      149.518  215.412  179.083  21.5903
> 5      157.71   184.929  174.754  10.7275   -2.41731
> 
> Signed-off-by: Srikar Dronamraju <[email protected]>
> ---
>  kernel/sched/fair.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a5936ed..4ea0eff 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1837,12 +1837,6 @@ static int task_numa_migrate(struct task_struct *p)
>       if (env.best_cpu == -1)
>               return -EAGAIN;
>  
> -     /*
> -      * Reset the scan period if the task is being rescheduled on an
> -      * alternative node to recheck if the tasks is now properly placed.
> -      */
> -     p->numa_scan_period = task_scan_start(p);
> -
>       best_rq = cpu_rq(env.best_cpu);
>       if (env.best_task == NULL) {
>               ret = migrate_task_to(p, env.best_cpu);
> @@ -6361,6 +6355,19 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu __maybe_unus
>  
>       /* We have migrated, no longer consider this task hot */
>       p->se.exec_start = 0;
> +
> +#ifdef CONFIG_NUMA_BALANCING
> +     if (!p->mm || (p->flags & PF_EXITING))
> +             return;
> +
> +     if (p->numa_faults) {
> +             int src_nid = cpu_to_node(task_cpu(p));
> +             int dst_nid = cpu_to_node(new_cpu);
> +
> +             if (src_nid != dst_nid)
> +                     p->numa_scan_period = task_scan_start(p);
> +     }
> +#endif

Please don't add #ifdeffery inside functions, especially not if they do weird
flow control like a 'return' from the middle of a block.

A properly named inline helper would work I suppose.
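
Something along these lines, as a rough user-space sketch rather than a
tested kernel patch. The helper name update_scan_period() is only a
suggestion, and the struct plus the cpu_to_node()/task_cpu()/
task_scan_start() bodies below are mock stand-ins (assuming 4 CPUs per
node and a fixed start period) so that the fragment compiles on its own:

```c
#include <stddef.h>

/* Assumption: defined here so the sketch exercises the NUMA branch. */
#define CONFIG_NUMA_BALANCING 1
#define PF_EXITING 0x00000004

/* Minimal stand-in for task_struct; only the fields used below. */
struct task_struct {
	void *mm;
	unsigned int flags;
	unsigned long *numa_faults;
	unsigned int numa_scan_period;
	unsigned long long exec_start;	/* p->se.exec_start in the kernel */
	int cpu;
};

/* Mock helpers for the sketch: 4 CPUs per node, fixed start period. */
static int cpu_to_node(int cpu) { return cpu / 4; }
static int task_cpu(const struct task_struct *p) { return p->cpu; }
static unsigned int task_scan_start(struct task_struct *p) { (void)p; return 1000; }

#ifdef CONFIG_NUMA_BALANCING
/* Hypothetical helper name: reset the scan period on a cross-node move. */
static inline void update_scan_period(struct task_struct *p, int new_cpu)
{
	int src_nid, dst_nid;

	if (!p->mm || !p->numa_faults || (p->flags & PF_EXITING))
		return;

	src_nid = cpu_to_node(task_cpu(p));
	dst_nid = cpu_to_node(new_cpu);

	if (src_nid != dst_nid)
		p->numa_scan_period = task_scan_start(p);
}
#else
/* Empty stub, so the caller needs no #ifdef of its own. */
static inline void update_scan_period(struct task_struct *p, int new_cpu)
{
}
#endif

static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
{
	/* We have migrated, no longer consider this task hot */
	p->exec_start = 0;

	update_scan_period(p, new_cpu);
}
```

The early 'return' then lives in the helper, where it is ordinary flow
control, and migrate_task_rq_fair() stays free of conditional compilation.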

Thanks,

        Ingo
