Hello,

This week I tried to implement a new heuristic that exploits hot caches: schedule the threads belonging to a process on the same core, so that they make use of the L1 cache they share. To test it I built the following scenario: one process that runs two pthreads, each doing computations on the same region of memory (a matrix, with one thread working on the odd indexes and the other on the even ones).
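
To compare the "same core" and "different cores" placements, one first needs to know which logical CPUs are HT siblings. Below is a minimal sketch of that step only, in Python rather than the pthreads test code itself. It assumes Linux and the sysfs file thread_siblings_list; the function names (parse_cpu_list, group_by_core, pick_placements, read_sibling_lists) are made up for illustration.

```python
import glob
import re


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,2" or "0-1,4-5" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_core(sibling_lists):
    """Collapse per-CPU sibling strings into the distinct groups of CPUs sharing a core."""
    cores = {tuple(parse_cpu_list(s)) for s in sibling_lists.values()}
    return sorted(cores)


def pick_placements(cores):
    """Return (same_core_pair, different_core_pair); None where no such pair exists."""
    same = next((c[:2] for c in cores if len(c) >= 2), None)
    different = (cores[0][0], cores[1][0]) if len(cores) >= 2 else None
    return same, different


def read_sibling_lists():
    """Read thread_siblings_list for every CPU from sysfs (empty dict if unavailable)."""
    result = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as f:
            result[cpu] = f.read()
    return result


if __name__ == "__main__":
    cores = group_by_core(read_sibling_lists())
    print("cores:", cores)
    print("placements:", pick_placements(cores))
```

Each of the two test threads would then be pinned to one CPU of the chosen pair, for example with pthread_setaffinity_np in the C test program. The actual CPU numbering depends on the machine.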
The hardware was a Core i3 (dual-core with HT). With the two threads running on different cores, the computation takes 9 sec. With both threads on the same core, it takes almost double (17 sec).

Another test case was to occupy all 4 logical CPUs. For this I ran two processes, each with two threads. The time was ~17 sec, regardless of whether the two threads of a process were running on the same core or not. So apparently there is no benefit from this heuristic. The Linux SMT paper doesn't discuss this case (where to schedule the threads belonging to the same process)... they probably knew the results.

That said, I will continue with scheduling heuristics regarding the cache. The next one I will try: schedule the process as close as possible to the cache it last ran on (first try the same core, no matter which hardware thread, for the L1 cache; then try the same chip, no matter which core, for the L2/L3 cache; etc.). Also, the heuristic that gave no results on HT could still be worth applying at the package/core level (always schedule on the same package, no matter which core, to use the hot caches).

Another piece of work is fixing the bug regarding the CPU topology on the monster. There seems to be a problem there.

Mihai