Hello,

On Sun, Jul 8, 2012 at 12:29 PM, Mihai Carabas <mihai.cara...@gmail.com> wrote:
> Hello,
>
> This week I worked on a heuristic that sounds like this: always schedule
> a process to the CPU closest to the one it ran on before (e.g. try to run
> it on the CPU it ran on before; if that isn't free, try its sibling at
> the current level, say the thread level; if no sibling is found, go up to
> the next level, the core level, and so on). In other words, the process
> is scheduled as close as possible to its warm caches (it may not land on
> the same core and so loses L1 cache hotness, but it can still use L3
> cache hotness). Unfortunately, I couldn't find a case where this made a
> difference on my Core i3. The likely reason is that a process's time
> quantum in DragonFly is big enough that a scheduled process can touch a
> large part of the L1/L2 cache, so on a context switch that cache is
> invalidated anyway. With the L3 cache shared among all cores, there is
> no gain. I am waiting for access to a multi-socket machine to test it
> there.

This week I haven't done much coding. I tested the heuristic above on
monster (thanks dillon) and on a dual-socket AMD machine (thanks ftigeot),
but the results weren't meaningful. I started adding debug output to see
what happens. It seems that processes bounce from one CPU to another when
there are more processes than CPUs: the setrunqueue path is the active
scheduling mechanism, but there is a second one, where a CPU whose process
finishes or blocks tries to pull a process from the bsd4 runqueue, and the
process it pulls may not have run on that CPU before. I have discussed the
issue with Alex and Matthew and am currently working on a solution.