Hi Hillf, Mel and all,

thanks for the patch! It has produced really GOOD results.

1) It has fixed the performance problems we saw with the vanilla 5.7
kernel for single-tenant workloads and low system load scenarios,
without any performance degradation for multi-tenant tasks. It produces
the same results as the previous proof-of-concept patch, where the
adjust_numa_imbalance function was modified to be a no-op (returning
the same imbalance value it receives as input).
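
For anyone who missed the earlier thread: that no-op variant simply makes
adjust_numa_imbalance() pass its input through unchanged. A minimal sketch
(roughly; the two-argument signature matches the call sites in Hillf's diff
below, and the upstream body it replaces is elided here):

    static inline long adjust_numa_imbalance(int imbalance, int src_nr_running)
    {
            /* proof of concept only: never filter out a small NUMA imbalance */
            return imbalance;
    }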

2) We have also added Mel's netperf-cstate-small-cross-socket test to
our test battery:
https://github.com/gormanm/mmtests/blob/master/configs/config-network-netperf-cstate-small-cross-socket

Mel told me that he had seen significant performance improvements with
5.7 over 5.6 for the netperf-cstate-small-cross-socket scenario.

Out of the 6 different patches we have tested, your patch has performed
the best for this scenario. Compared to vanilla, we see only minimal
performance degradation: 2.5% for UDP stream throughput and 0.6% for
TCP throughput. The testing was done on a dual-socket system with
Gold 6132 CPUs.

@Mel - could you please test Hillf's patch with your full testing
suite? So far, it looks very promising, but I would like to check the
patch thoroughly to make sure it does not hurt performance in other
areas.

Thanks a lot!
Jirka

On Tue, May 19, 2020 at 6:32 AM Hillf Danton <hdan...@sina.com> wrote:
>
>
> Hi Jirka
>
> On Mon, 18 May 2020 16:52:52 +0200 Jirka Hladky wrote:
> >
> > We have compared it against kernel with adjust_numa_imbalance disabled
> > [1], and both kernels perform at the same level for the single-tenant
> > jobs, but the proposed patch is bad for the multitenancy mode. The
> > kernel with adjust_numa_imbalance disabled is a clear winner here.
>
> Double thanks to you for the tests!
>
> > We would be very interested in what others think about disabling
> > adjust_numa_imbalance function. The patch is below. It would be great
>
> A minute...
>
> > to collect performance results for different scenarios to make sure
> > the results are objective.
>
> I don't have another test case but a diff trying to confine the tool
> in question back to the hard-coded 2's field.
>
> It's used in the first hunk below to detect imbalance before migrating
> a task, and a small churn of code is added at another call site when
> balancing idle CPUs.
>
> Thanks
> Hillf
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1916,20 +1916,26 @@ static void task_numa_find_cpu(struct ta
>          * imbalance that would be overruled by the load balancer.
>          */
>         if (env->dst_stats.node_type == node_has_spare) {
> -               unsigned int imbalance;
> -               int src_running, dst_running;
> +               unsigned int imbalance = 2;
>
> -               /*
> -                * Would movement cause an imbalance? Note that if src has
> -                * more running tasks that the imbalance is ignored as the
> -                * move improves the imbalance from the perspective of the
> -                * CPU load balancer.
> -                * */
> -               src_running = env->src_stats.nr_running - 1;
> -               dst_running = env->dst_stats.nr_running + 1;
> -               imbalance = max(0, dst_running - src_running);
> -               imbalance = adjust_numa_imbalance(imbalance, src_running);
> +               //No imbalance computed without spare capacity
> +               if (env->dst_stats.node_type != env->src_stats.node_type)
> +                       goto check_imb;
> +
> +               imbalance = adjust_numa_imbalance(imbalance,
> +                                               env->src_stats.nr_running);
> +
> +               //Do nothing without imbalance
> +               if (!imbalance) {
> +                       imbalance = 2;
> +                       goto check_imb;
> +               }
> +
> +               //Migrate task if it's likely to grow balance
> +               if (env->dst_stats.nr_running + 1 < env->src_stats.nr_running)
> +                       imbalance = 0;
>
> +check_imb:
>                 /* Use idle CPU if there is no imbalance */
>                 if (!imbalance) {
>                         maymove = true;
> @@ -9011,12 +9017,13 @@ static inline void calculate_imbalance(s
>                         env->migration_type = migrate_task;
>                         env->imbalance = max_t(long, 0, (local->idle_cpus -
>                                                  busiest->idle_cpus) >> 1);
> -               }
>
> -               /* Consider allowing a small imbalance between NUMA groups */
> -               if (env->sd->flags & SD_NUMA)
> -                       env->imbalance = adjust_numa_imbalance(env->imbalance,
> -                                               busiest->sum_nr_running);
> +                       /* Consider allowing a small imbalance between NUMA groups */
> +                       if (env->sd->flags & SD_NUMA &&
> +                           local->group_type == busiest->group_type)
> +                               env->imbalance = adjust_numa_imbalance(env->imbalance,
> +                                                       busiest->sum_nr_running);
> +               }
>
>                 return;
>         }
> --
>


-- 
-Jirka
