* Ingo Molnar <[email protected]> wrote:

> So my thinking here is: if the NUMA balancing code (which is node granular at
> the moment and uses node masks, etc.) is extended to be CPU granular (which is
> a big task in itself), then the two problems can be 'unified':
>
>   - the NUMA balancing code inputs arbitrary CPU (node) affinity masks from
>     the MM code into the scheduler.
>
>   - the scheduler syscall ABI (and other configuration sources) inputs
>     arbitrary CPU affinity masks into the scheduler.
>
> it's a similar problem, with two (minor looking) complications:

Btw., this highlights how hard the optimization problem is: the NUMA balancing
code is (at least ...) O(nr_nodes^2) complex - but we had O(nr_nodes^3) passes
too in some of the NUMA balancing submissions...

We'd upgrade that to O(nr_cpus^2), which is totally unrealistic with 16,000
CPUs even in a slowpath - but it would probably cause problems even with 120
CPUs. It will get quadratically worse as the number of CPUs in a system
increases on its current exponential trajectory ...
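
To put rough numbers on that, here is a back-of-the-envelope sketch of how many
pair evaluations a single quadratic pass would need at node granularity versus
CPU granularity. The helper names and the 8-CPUs-per-node figure are
illustrative assumptions, not anything taken from the scheduler code; only the
120 and 16,000 CPU counts come from the text above.

```python
def pairwise_cost(n: int) -> int:
    """Pair evaluations in one O(n^2) pass over n entities."""
    return n * n


def compare(nr_cpus: int, cpus_per_node: int) -> tuple[int, int]:
    """Return (node-granular cost, CPU-granular cost) for one pass.

    cpus_per_node is a hypothetical topology parameter chosen for
    illustration; real machines vary.
    """
    nr_nodes = nr_cpus // cpus_per_node
    return pairwise_cost(nr_nodes), pairwise_cost(nr_cpus)


if __name__ == "__main__":
    CPUS_PER_NODE = 8  # assumption, not from the original mail
    for nr_cpus in (120, 16_000):
        node_cost, cpu_cost = compare(nr_cpus, CPUS_PER_NODE)
        print(f"{nr_cpus:>6} CPUs: {node_cost:>11,} node pairs "
              f"vs {cpu_cost:>13,} CPU pairs")
```

Under these assumptions the CPU-granular pass costs cpus_per_node^2 times more
than the node-granular one, independent of machine size, and the absolute
count at 16,000 CPUs is 256 million pair evaluations per pass.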

So the safest bet would be to restrict any 'perfect' balancing attempts to node 
boundaries. Which won't solve the problem you outlined to begin with.

Thanks,

        Ingo
