I have no idea why the euclidean benchmark shows a superlinear speedup without -release, though I can reproduce it on my box. It must have something to do with std.algorithm's use of asserts, or something along those lines.
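To put a number on "superlinear", here is a rough sketch of the arithmetic using the two timings quoted further down. The function names are made up for illustration and are not part of any library.

```d
// Hypothetical helpers, for illustration only.

// Speedup: how many times faster the parallel run is than the serial run.
double speedup(double serialMs, double parallelMs)
{
    return serialMs / parallelMs;
}

// Efficiency: speedup relative to the ideal linear speedup, which
// equals the number of cores. Anything above 1.0 is superlinear.
double efficiency(double serialMs, double parallelMs, uint cores)
{
    return speedup(serialMs, parallelMs) / cores;
}
```

With 6298 ms serial and 567 ms parallel, speedup(6298, 567) comes out at roughly 11.1, and efficiency(6298, 567, 4) at roughly 2.8, i.e. nearly three times what perfect linear scaling on 4 cores would give.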

As far as operating systems go, I'm glad you tested on XP32. One thing that can make a **huge** difference is that, on XP, synchronized blocks immediately hit kernel calls and context switches unless you use the Windows API directly to explicitly override this behavior. On Vista and 7, the default behavior (which D uses) is to spin for a short period of time before context switching when waiting on a lock. This is usually vastly more efficient in the case of heavily contended, fine-grained locking. I tested on Windows 7, and I'm very happy that none of the numbers completely blew up on XP because of this issue.
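For anyone unfamiliar with the spin-then-block idea, here is a minimal sketch of it in portable D. It is not how druntime or Windows implements it: spinThenLock and contendedCount are hypothetical helpers, and the spin count of 4000 is an arbitrary assumption. It only illustrates the principle of retrying a non-blocking tryLock() for a while before falling back to a blocking lock().

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical helper (not part of druntime): try to take the lock a
// bounded number of times without blocking, then fall back to lock().
void spinThenLock(Mutex m, uint maxSpins = 4000) // 4000 is an arbitrary choice
{
    foreach (i; 0 .. maxSpins)
    {
        if (m.tryLock())
            return;      // acquired while spinning, no blocking wait needed
    }
    m.lock();            // still contended: block until the lock is free
}

__gshared int counter;   // only touched while holding the mutex below

// Demo: nThreads threads each increment the shared counter perThread
// times under heavily contended, fine-grained locking.
int contendedCount(uint nThreads, uint perThread)
{
    auto m = new Mutex;
    counter = 0;

    Thread[] threads;
    foreach (t; 0 .. nThreads)
    {
        threads ~= new Thread({
            foreach (i; 0 .. perThread)
            {
                spinThenLock(m);
                scope (exit) m.unlock();
                ++counter;
            }
        });
    }

    foreach (th; threads) th.start();
    foreach (th; threads) th.join();
    return counter;
}
```

Calling contendedCount(4, 10_000) should return 40000 regardless of the spin count; the spin count only changes how often a waiting thread ends up in the blocking lock() call, which is the expensive path when the critical section is as short as a single increment.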

On 2/26/2011 5:30 PM, Andrej Mitrovic wrote:
> Without release, only the euclidean benchmark shows a more dramatic
> speed difference:
> Serial reduce:  6298 milliseconds.
> Parallel reduce with 4 cores:  567 milliseconds.
>
> I forgot to mention I'm on XP32. I could test these on a virtualized
> Linux, if that's worth testing.
