* Paul Jackson <[EMAIL PROTECTED]> wrote:

> Three more observations.
> 
>  1) The slowest measure_one() calls are, not surprisingly, those for the
>     largest sizes.  At least on my test system of the moment, the plot
>     of cost versus size has one major maximum (a one hump camel, not two).
>     
>     Seems like if we computed from smallest size upward, instead of largest
>     downward, and stopped whenever two consecutive measurements were less
>     than say 70% of the max seen so far, then we could save a nice chunk
>     of the time.
> 
>     Of course, if two hump systems exist, this is not reliable on them.

yes, this is the approach i'm currently working on, but it's not 
reliable yet. (one of the systems i have drifts its cost into infinity 
after the hump, which shouldn't happen)
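
a rough sketch of that bottom-up early-stop idea (measure_cost() is a 
stand-in for the real measure_one() call; the geometric size steps and 
the 70% threshold are assumptions, not the actual tuning code):

/*
 * Walk the sizes from smallest to largest and stop once two
 * consecutive measurements fall below 70% of the maximum cost
 * seen so far - i.e. once we are clearly past the hump.
 */
#include <stddef.h>

extern unsigned long long measure_cost(size_t size);	/* stand-in */

static unsigned long long find_hump(size_t min_size, size_t max_size)
{
	unsigned long long cost, max_cost = 0;
	int below = 0;
	size_t size;

	/* sizes grow roughly 1.5x per step */
	for (size = min_size; size <= max_size; size += size / 2 + 1) {
		cost = measure_cost(size);
		if (cost > max_cost) {
			max_cost = cost;
			below = 0;
		} else if (cost * 10 < max_cost * 7) {
			/* below 70% of the max seen so far */
			if (++below == 2)
				break;	/* past the hump, stop early */
		} else {
			below = 0;
		}
	}
	return max_cost;
}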

>  2) Trivial warning fix for printf format mismatch:

thx.

>  3) I was noticing that my test system was only showing a couple of 
>     distinct values for cpu_distance, even though it has 4 distinct 
>     distances for values of node_distance.  So I coded up a variant of 
>     cpu_distance that converts the problem to a node_distance problem, 
>     and got the following cost matrix:
> 
> =================================== begin ===================================
> Total of 8 processors activated (15515.64 BogoMIPS).
> ---------------------
> migration cost matrix (max_cache_size: 0, cpu: -1 MHz):
> ---------------------
>           [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]
> [00]:     -     4.0(0) 21.7(1) 21.7(1) 25.2(2) 25.2(2) 25.3(3) 25.3(3)
> [01]:   4.0(0)    -    21.7(1) 21.7(1) 25.2(2) 25.2(2) 25.3(3) 25.3(3)
> [02]:  21.7(1) 21.7(1)    -     4.0(0) 25.3(3) 25.3(3) 25.2(2) 25.2(2)
> [03]:  21.7(1) 21.7(1)  4.0(0)    -    25.3(3) 25.3(3) 25.2(2) 25.2(2)
> [04]:  25.2(2) 25.2(2) 25.3(3) 25.3(3)    -     4.0(0) 21.7(1) 21.7(1)
> [05]:  25.2(2) 25.2(2) 25.3(3) 25.3(3)  4.0(0)    -    21.7(1) 21.7(1)
> [06]:  25.3(3) 25.3(3) 25.2(2) 25.2(2) 21.7(1) 21.7(1)    -     4.0(0)
> [07]:  25.3(3) 25.3(3) 25.2(2) 25.2(2) 21.7(1) 21.7(1)  4.0(0)    -
> ---------------------
> cacheflush times [4]: 4.0 (4080540) 21.7 (21781380) 25.2 (25259428) 25.3 (25372682)

i'll first try the bottom-up approach to speed up detection (getting to
the hump is very fast most of the time). The hard part was to create a
workload that generates the hump reliably on a number of boxes - i'm
happy it works on ia64 too.

then we can let the arch override the cpu_distance() method, although i
do think that _if_ there is a significant hierarchy between CPUs it
should be represented via a matching sched-domains hierarchy, and the
full hierarchy should be tuned accordingly.
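
a minimal sketch of the node_distance-based cpu_distance() variant 
described in observation 3 (cpu_to_node() and node_distance() are the 
usual topology helpers; the exact arch override hook is left open 
here):

/*
 * Map each CPU to its node and use the SLIT-style node_distance()
 * value, so a box with four distinct node distances also ends up
 * with four distinct CPU distance classes.
 */
static int cpu_distance(int cpu1, int cpu2)
{
	return node_distance(cpu_to_node(cpu1), cpu_to_node(cpu2));
}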

btw., we can later use the migration cost matrix to tune all the other 
sched-domains balancing related tunables as well - cache_hot_time is 
just the first obvious step. (which also happens to make the most 
difference.)
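
a hedged sketch of how the measured costs could feed back into that 
tunable (it assumes a migration_cost[] table with one averaged cost in 
nanoseconds per distance class, and that each sched-domain level maps 
to one class - both are assumptions for illustration, not the actual 
tuning code):

/*
 * Propagate one measured migration cost per domain level into the
 * cache_hot_time tunable of each sched-domain, walking up the
 * hierarchy from the base domain.
 */
static void tune_cache_hot_time(struct sched_domain *sd,
				const unsigned long long *migration_cost)
{
	int level;

	for (level = 0; sd; sd = sd->parent, level++)
		sd->cache_hot_time = migration_cost[level];
}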

        Ingo