I have no idea why the Euclidean benchmark shows a superlinear speedup
without -release (6298 ms serial vs. 567 ms on 4 cores is roughly 11x),
though I can reproduce it on my box. It must have something to do with
std.algorithm's use of asserts.
As far as operating systems go, I'm glad you tested on XP32. One thing
that can make a **huge** difference is that, on XP, synchronized blocks
immediately hit kernel calls and context switches unless you use the
Windows API directly to override this behavior. On Vista and 7, the
default behavior (which D uses) is to spin for a short period of time
before context switching when waiting on a lock. This is usually vastly
more efficient for heavily contended, fine-grained locking. I tested on
Windows 7, so I'm very happy that none of the numbers blew up on XP
because of this issue.
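To illustrate the spin-then-block strategy described above, here is a minimal, portable C++ sketch. It is not how D's runtime or Windows actually implements its locks; the class name SpinThenBlockMutex, the countWithThreads helper, and the spin count of 4000 are all hypothetical. (On Windows specifically, the real Win32 call InitializeCriticalSectionAndSpinCount is one way to ask a critical section to spin before blocking.)

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin-then-block lock: try to grab the mutex a bounded number
// of times in user space before falling back to a blocking lock(), which may
// put the waiting thread to sleep via the kernel.
class SpinThenBlockMutex {
public:
    // 4000 is an arbitrary, assumed spin count, not a tuned value.
    explicit SpinThenBlockMutex(int spinCount = 4000) : spinCount_(spinCount) {}

    void lock() {
        // Spin phase: cheap user-space attempts. With fine-grained locking
        // the current holder may release the lock very soon.
        for (int i = 0; i < spinCount_; ++i) {
            if (m_.try_lock()) {
                return;
            }
        }
        // Block phase: stop spinning and wait, possibly with a context switch.
        m_.lock();
    }

    bool try_lock() { return m_.try_lock(); }
    void unlock() { m_.unlock(); }

private:
    std::mutex m_;
    int spinCount_;
};

// Hypothetical usage: several threads doing tiny critical sections,
// i.e. heavily contended, fine-grained locking.
long countWithThreads(int threads, int itersPerThread) {
    SpinThenBlockMutex mtx;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < itersPerThread; ++i) {
                std::lock_guard<SpinThenBlockMutex> guard(mtx);
                ++counter;
            }
        });
    }
    // Join before returning so the captured locals outlive every thread.
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

A production version would normally add a CPU pause hint inside the spin loop and tune the spin count for the workload; the sketch only shows the two-phase structure.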
On 2/26/2011 5:30 PM, Andrej Mitrovic wrote:
Without -release, only the Euclidean benchmark shows a more dramatic
speed difference:
Serial reduce: 6298 milliseconds.
Parallel reduce with 4 cores: 567 milliseconds.
I forgot to mention I'm on XP32. I could test these on a virtualized
Linux install, if that's worth doing.