On Monday, 28 September 2015 at 12:18:28 UTC, Russel Winder wrote:
As a single data point:
====================== anonymous_fix.d ==========
500000500000
real 0m0.168s
user 0m0.200s
sys 0m0.380s
====================== colvin_fix.d ==========
500000500000
real 0m0.036s
user 0m0.124s
sys 0m0.000s
====================== norwood_reduce.d ==========
500000500000
real 0m0.009s
user 0m0.020s
sys 0m0.000s
====================== original.d ==========
218329750363
real 0m0.024s
user 0m0.076s
sys 0m0.000s
original.d is the original: not especially slow, but broken :-).
anonymous_fix.d is the anonymous poster's version using the
synchronized keyword: correct, but slow. colvin_fix.d is John
Colvin's version using atomicOp: correct, but only ok-ish on
speed. Jay Norwood first proposed the reduce answer on the list;
I amended it a tiddly bit (norwood_reduce.d), and it is clearly
the resounding speed winner.
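For anyone following along, the two corrected shared-accumulator
variants might look roughly like the sketch below. This is not the
actual anonymous_fix.d or colvin_fix.d. It assumes the benchmark sums
the integers 1 to 1,000,000 (which is consistent with the printed
500000500000), and the names sumWithLock and sumWithAtomic are made
up for illustration.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: every update takes a lock, so the worker
// threads spend most of their time queueing for the mutex.
ulong sumWithLock(ulong n)
{
    ulong sum = 0;
    foreach (i; parallel(iota(1UL, n + 1)))
    {
        synchronized { sum += i; }
    }
    return sum;
}

// Hypothetical helper: no lock, but every update is still an atomic
// read-modify-write on one shared variable.
ulong sumWithAtomic(ulong n)
{
    shared ulong sum = 0;
    foreach (i; parallel(iota(1UL, n + 1)))
    {
        atomicOp!"+="(sum, i);
    }
    return atomicLoad(sum);
}

void main()
{
    writeln(sumWithLock(1_000_000));
    writeln(sumWithAtomic(1_000_000));
}
```

Both give the right answer; the difference is only in how much each
update to the shared accumulator costs.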
Pretty much as expected. Locks are slow, shared accumulators
suck, much better to write to thread local and then merge.
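A minimal sketch of the "thread local, then merge" approach, using
taskPool.reduce from std.parallelism. Again, this is not the actual
norwood_reduce.d. It assumes the task is summing 1 to 1,000,000, and
sumWithReduce is a made-up name.

```d
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: taskPool.reduce splits the range into work
// units, reduces each unit on its own, and then combines the partial
// results. There is no shared accumulator for the threads to fight
// over.
ulong sumWithReduce(ulong n)
{
    return taskPool.reduce!"a + b"(0UL, iota(1UL, n + 1));
}

void main()
{
    writeln(sumWithReduce(1_000_000));
}
```

The explicit 0UL seed keeps the accumulator type as ulong, so the
result does not depend on the element type of the range.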