Hello,

At [1] there is a chart of pgbench running on monster for a range of 5 to
100 threads/clients (at 5, there are 5 threads and 5 clients, so a total
of 10 new processes). As you can see, when the number of running processes
is greater than the number of cores, the cache-coherent heuristic gives much
better results (in some cases by more than 15%).
I modified my cache-coherent heuristic for monster so that it tries to
keep a process on the same socket, no matter which core. That way the L3
cache comes into play, and that's why we have these results.
Now I'm working on a tunable option for selecting the level at which you
want to stick a process (thread level, core level, socket/package level).
I will commit it by tomorrow.
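
To make the idea of a selectable stickiness level concrete, here is a rough
sketch in C. It is not the actual scheduler code: the names stick_level,
cpu_topo, cpu_matches and pick_cpu, the flat topology array and the
per-CPU load array are all hypothetical and only serve the illustration.

```c
#include <stdbool.h>

/* Hypothetical stickiness levels (illustrative names only). */
enum stick_level { STICK_THREAD, STICK_CORE, STICK_SOCKET };

/* Hypothetical description of where one logical CPU sits. */
struct cpu_topo {
	int socket_id;	/* physical package */
	int core_id;	/* core within the package */
	int thread_id;	/* hardware thread within the core */
};

/*
 * Return true if 'cand' is close enough to 'last' for the given level:
 * same socket for STICK_SOCKET, same socket and core for STICK_CORE,
 * the very same hardware thread for STICK_THREAD.
 */
static bool
cpu_matches(const struct cpu_topo *last, const struct cpu_topo *cand,
    enum stick_level level)
{
	if (last->socket_id != cand->socket_id)
		return false;
	if (level == STICK_SOCKET)
		return true;
	if (last->core_id != cand->core_id)
		return false;
	if (level == STICK_CORE)
		return true;
	return last->thread_id == cand->thread_id;
}

/*
 * Among the CPUs that match the process's last CPU at the chosen level,
 * pick the one with the lowest load. The last CPU always matches itself,
 * so a valid index is always returned.
 */
static int
pick_cpu(const struct cpu_topo *topo, const int *load, int ncpus,
    int last_cpu, enum stick_level level)
{
	int best = -1;

	for (int i = 0; i < ncpus; i++) {
		if (!cpu_matches(&topo[last_cpu], &topo[i], level))
			continue;
		if (best < 0 || load[i] < load[best])
			best = i;
	}
	return best;
}
```

With STICK_SOCKET, a process that last ran on CPU 0 may move to any less
loaded CPU on the same socket; with STICK_CORE or STICK_THREAD the set of
candidates shrinks accordingly.
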
Also, this week I cleaned up the code and introduced KTR debug options
instead of my old kprintfs. Next week I will come with the same kind of
charts for core i3, core i7 (alexh) and dual-xeon (ftigeot).

[1] http://leaf.dragonflybsd.org/~mihaic/pgbench_monster.pdf

Mihai
