== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> Yes, but why does multiple cores make the problem worse?  If it's the
> lock, then I'd expect just locking in multiple threads without any
> appending does worse on multiple cores than on a single core.

It does.

import std.stdio, std.perf, core.thread;

void main() {
    writeln("Set affinity, then press enter.");
    readln();

    auto pc = new PerformanceCounter;
    pc.start;

    enum nThreads = 4;
    auto threads = new Thread[nThreads];
    foreach(ref thread; threads) {
        thread = new Thread(&doStuff);
        thread.start();
    }

    foreach(thread; threads) {
        thread.join();
    }

    pc.stop;
    writeln(pc.milliseconds);
}

void doStuff() {
    // Acquire and release an uncontested-work lock a million times.
    // The empty synchronized block does no work, so this measures
    // pure locking overhead.
    foreach(i; 0..1_000_000) {
        synchronized {}
    }
}

Timing with affinity for all CPUs:  20772 ms.
Timing with affinity for 1 CPU:  156 ms.

Heavy lock contention **kills** multithreaded code because not only does the lock
serialize everything, but the OS has to perform a context switch on every
contended acquisition.

I posted about a year ago that using spinlocks in the GC massively sped things
up, at least in synthetic benchmarks, if you have heavy contention and multiple
cores.  See
http://www.digitalmars.com/d/archives/digitalmars/D/More_on_GC_Spinlocks_80485.html
However, that post was largely ignored.
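The idea behind a spinlock is to retry a user-space atomic compare-and-swap instead of parking the thread in the kernel, so a briefly contended acquisition never triggers a context switch. Here is a minimal sketch using druntime's core.atomic (the SpinLock name and layout are mine, not the GC's actual implementation; a production spinlock would also want a pause instruction and backoff):

```d
import core.atomic;

struct SpinLock
{
    shared bool locked;

    void lock()
    {
        // Spin in user space until the CAS flips locked from
        // false to true; no syscall, no context switch.
        while (!cas(&locked, false, true)) {}
    }

    void unlock()
    {
        // Release by atomically clearing the flag.
        atomicStore(locked, false);
    }
}

void main()
{
    SpinLock sl;
    sl.lock();
    assert(atomicLoad(sl.locked));
    sl.unlock();
    assert(!atomicLoad(sl.locked));
}
```

This only pays off when critical sections are short and contention is brief; under long holds, spinning burns CPU that a blocking mutex would yield.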

> If it's the
> lookup, why does it take longer to lookup on multiple cores?

Because appending to multiple arrays simultaneously (whether on one core or
several) causes each array's append to evict the other array's entry from the
GC's block info cache.  If you set the affinity to only 1 CPU, that eviction
happens only once per context switch rather than on every interleaved append.
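A toy model makes the eviction pattern concrete. Assume (for illustration only; the names and single-entry structure are mine, not druntime's actual code) a one-entry cache keyed on the block's base pointer: alternating appends to two arrays then miss on every single lookup.

```d
struct BlockInfoCache
{
    void* cachedBase;   // block the cache currently describes
    size_t misses;      // lookups that had to take the slow path

    void lookup(void* base)
    {
        if (base !is cachedBase)
        {
            // Miss: in the real GC this is where the global lock
            // is taken and the block metadata looked up.
            misses++;
            cachedBase = base;
        }
    }
}

void main()
{
    BlockInfoCache cache;
    int[] a = [1];
    int[] b = [2];   // a and b live in distinct GC blocks

    // Alternating appends: each lookup evicts the other's entry.
    foreach (i; 0 .. 100)
    {
        cache.lookup(a.ptr);
        cache.lookup(b.ptr);
    }
    assert(cache.misses == 200); // 100% miss rate
}
```

On a single CPU the threads' appends don't interleave within a timeslice, so each thread enjoys cache hits until the next context switch.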
