I tried a shared-memory parallel increment. Yes, it's basically
a cache-line thrasher, but I wanted to see what's involved in
shared-memory programming. Even though I tried to follow all the
rules to make the counter truly shared (not thread-local), it
appears I failed: the wait loop at the end only ever sees the main
thread's own 250 million increments, so it never terminates.

import core.atomic : atomicFetchAdd;
import std.stdio : writeln;
import std.concurrency : spawn;
import core.time : msecs;
import core.thread : Thread;

const uint NSWEPT = 1_000_000_000;
const uint NCPU = 4;

void
doadd(ref shared(uint) val)
{
    for (uint count = 0; count < NSWEPT/NCPU; ++count) {
        atomicFetchAdd(val, 1);
    }
}

void
main()
{
    shared(uint) val = 0;

    for (int x = 0; x < NCPU-1; ++x) {
        spawn(&doadd, val);
    }
    doadd(val);
    while (val != NSWEPT) {
        Thread.sleep(1.msecs);
    }
}
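
For comparison, here is a minimal sketch of a variant that I would expect to share the counter. It assumes the cause is that spawn copies its arguments, so each spawned thread's ref parameter binds to a private copy rather than to main's val. Passing a shared(uint)* sidesteps that. The helper name parallelIncrement and the smaller NSWEPT are my own choices for illustration, not part of the program above.

```d
import core.atomic : atomicFetchAdd, atomicLoad;
import core.thread : Thread;
import core.time : msecs;
import std.concurrency : spawn;
import std.stdio : writeln;

// Smaller total than the original so the sketch finishes quickly (assumption).
enum uint NSWEPT = 4_000_000;
enum uint NCPU = 4;

// Takes a pointer so every thread increments the same counter.
void
doadd(shared(uint)* val)
{
    for (uint count = 0; count < NSWEPT/NCPU; ++count) {
        atomicFetchAdd(*val, 1);
    }
}

// Hypothetical helper: runs NCPU workers and returns the final count.
uint
parallelIncrement()
{
    shared(uint) val = 0;

    for (int x = 0; x < NCPU-1; ++x) {
        spawn(&doadd, &val);    // pass the address, not a copy of the value
    }
    doadd(&val);

    // val lives on this stack frame, so do not return until all adds landed.
    while (atomicLoad(val) != NSWEPT) {
        Thread.sleep(1.msecs);
    }
    return atomicLoad(val);
}

void
main()
{
    writeln(parallelIncrement());
}
```

The read in the wait loop goes through atomicLoad rather than touching the shared variable directly, which keeps it consistent with the atomic writes. The counter is a stack variable, so this only holds together because the function waits for the full count before returning.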