:I modified my cache-coherent heuristic, for the monster, so that it tries
:to stick a process to a socket, no matter what core. The L3 cache comes to
:work, and that's why we have these results.
:...
:
:[1] http://leaf.dragonflybsd.org/~mihaic/pgbench_monster.pdf
:
:Mihai
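The heuristic is only described in one sentence above: keep a process on its socket, whichever core it lands on, so it keeps reusing that socket's shared L3 cache. The C sketch below shows one way such a policy could work. It is not the actual scheduler code. `struct cpu_info`, `pick_cpu`, the per-core `load` count and the `imbalance` threshold are all hypothetical names chosen for illustration. The sketch prefers the least-loaded core on the process's previous socket and migrates to another socket only when the local choice is noticeably busier than the globally least-loaded core.

```c
#include <assert.h>

/* Hypothetical per-core state: which socket the core is on and how many
 * runnable threads it currently has. */
struct cpu_info {
    int socket;
    int load;
};

/*
 * Hypothetical socket-affinity CPU selection for a thread that last ran
 * on last_cpu. Any core on the same socket is acceptable (the L3 cache is
 * shared), so pick the least-loaded one there. Leave the socket only if
 * that core is more than `imbalance` threads busier than the least-loaded
 * core in the whole machine.
 */
static int pick_cpu(const struct cpu_info *cpus, int ncpus, int last_cpu,
                    int imbalance)
{
    assert(last_cpu >= 0 && last_cpu < ncpus);

    int home = cpus[last_cpu].socket;
    int best_local = -1;   /* least-loaded core on the home socket */
    int best_global = 0;   /* least-loaded core anywhere */

    for (int i = 0; i < ncpus; i++) {
        if (cpus[i].load < cpus[best_global].load)
            best_global = i;
        if (cpus[i].socket == home &&
            (best_local < 0 || cpus[i].load < cpus[best_local].load))
            best_local = i;
    }

    /* best_local is always set: last_cpu itself is on the home socket. */
    if (cpus[best_local].load - cpus[best_global].load <= imbalance)
        return best_local;  /* stay on the socket, keep the warm L3 */
    return best_global;     /* socket is overloaded, migrate */
}
```

A larger `imbalance` makes the policy stickier (more cache reuse, more load skew); a value of 0 degenerates into plain global least-loaded selection whenever the home socket is not tied for least loaded.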
That's a lovely chart. There is a very clear distinction with your scheduler work on the tail end of the curve past 55, with big improvements where the caches are clearly being stressed. I also see a tiny improvement on the front end of the graph, where the algorithm is reducing unnecessary thread switches between cores.

The droop in the mid-section, from about 45 to 55, isn't that big a deal. It is probably due to edge conditions as the system transitions into batch operation, and figuring out precisely what the cause is might be difficult with so many moving parts. I'm not at all surprised to see some volatility there. Also, at that point, lock contention within the database asserts itself, which is probably a major component of the fall-off that both schedulers show past 45.

What's important is the improvement in the cache-stressed, overloaded case from 55 onward.

--

Last year we did a lot of work on concurrency within the kernel related to the first half of the graph. Just getting the curve to ramp up to ~48 cores (25 on the graph) and then flatten out (through ~45 on the graph) was a major accomplishment.

-Matt
Matthew Dillon
<dil...@backplane.com>