== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> Yes, but why does multiple cores make the problem worse? If it's the
> lock, then I'd expect just locking in multiple threads without any
> appending does worse on multiple cores than on a single core.
It does.

import std.stdio, std.perf, core.thread;

void main() {
    writeln("Set affinity, then press enter.");
    readln();

    auto pc = new PerformanceCounter;
    pc.start;

    enum nThreads = 4;
    auto threads = new Thread[nThreads];
    foreach(ref thread; threads) {
        thread = new Thread(&doStuff);
        thread.start();
    }
    foreach(thread; threads) {
        thread.join();
    }

    pc.stop;
    writeln(pc.milliseconds);
}

void doStuff() {
    // Acquire and immediately release a shared lock, one million times.
    foreach(i; 0..1_000_000) {
        synchronized {}
    }
}

Timing with affinity for all CPUs: 20772 ms.
Timing with affinity for 1 CPU: 156 ms.

Heavy lock contention **kills** multithreaded code because not only does everything get serialized, but the OS has to perform a context switch on every contended acquisition. I posted about a year ago that using spinlocks in the GC massively sped things up, at least in synthetic benchmarks, when there's heavy contention and multiple cores. See http://www.digitalmars.com/d/archives/digitalmars/D/More_on_GC_Spinlocks_80485.html . However, that post largely got ignored.

> If it's the lookup, why does it take longer to lookup on multiple cores?

Because appending to multiple arrays simultaneously (whether on a single core or multiple cores) causes each array's append to evict the other array's append from the GC block info cache. If you set the affinity to only one CPU, this eviction can only happen once per context switch.