On 05/10/14 15:27, flamencofantasy via Digitalmars-d-learn wrote:
> Hello,
>
> I am summing up the first 1 billion integers in parallel and in a
> single thread and I'm observing some curious results;

I am fairly certain that your use of "parallel for" introduces quite a
lot of threads other than your "master" one.

> parallel sum : 499999999500000000, elapsed 102833 ms
> single thread sum : 499999999500000000, elapsed 1667 ms
>
> The parallel version is 60+ times slower on my i7-3770K CPU. I
> think that may be due to the CPU constantly flushing and reloading
> the caches in the parallel version but I don't know for sure.

I would bet there are cache problems, but it is far more likely that the
core problem is all the thread activity and in particular all the
synchronization.

> Here is the D code;
>
> shared ulong sum = 0;
> ulong iter = 1_000_000_000UL;
>
> StopWatch sw;
>
> sw.start();
>
> foreach(i; parallel(iota(0, iter))) {
>     atomicOp!"+="(sum, i);
> }

Well that will be the problem then, lots and lots of synchronization
with the billion tasks you have set up. I am highly surprised this is
only 60 times slower than sequential!

> sw.stop();
>
> writefln("parallel sum : %s, elapsed %s ms", sum,
> sw.peek().msecs);
>
> sum = 0;
>
> sw.reset();
>
> sw.start();
>
> for (ulong i = 0; i < iter; ++i) {
>     sum += i;
> }
>
> sw.stop();
>
> writefln("single thread sum : %s, elapsed %s ms", sum,
> sw.peek().msecs);
>
> Out of curiosity I tried the equivalent code in C# and I got this;
>
> parallel sum : 499999999500000000, elapsed 20320 ms
> single thread sum : 499999999500000000, elapsed 1901 ms
>
> The C# parallel is about 3 times faster than the D parallel which
> is strange on the exact same CPU.
>
> And here is the C# code;
>
> long sum = 0;
> long iter = 1000000000L;
>
> var sw = Stopwatch.StartNew();
>
> Parallel.For(0, iter, i => { Interlocked.Add(ref sum, i); });

The useful moral of this story is that C# synchronization in this
(somewhat perverse) context is much more efficient than that of D.
There is almost certainly a useful benchmark test that can come of this
for the std.parallelism implementation (if only I had a few cycles to
get really stuck in to a review and analysis of the module :-( ).

> Console.WriteLine("parallel sum : {0}, elapsed {1} ms", sum,
> sw.ElapsedMilliseconds);
>
> sum = 0;
>
> sw = Stopwatch.StartNew();
>
> for (long i = 0; i < iter; ++i) {
>     sum += i;
> }
>
> Console.WriteLine("single thread sum : {0}, elapsed {1} ms", sum,
> sw.ElapsedMilliseconds);
>
> Thoughts?

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
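
For reference, a minimal sketch of a parallel sum that avoids one atomic
operation per element, assuming std.parallelism's taskPool.reduce is an
acceptable replacement for the parallel foreach quoted above. The helper
name parallelSum is hypothetical and does not come from the original
post, and no timings are claimed for it here.

```d
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helper (not from the original post): sums 0 .. n-1.
ulong parallelSum(ulong n)
{
    // taskPool.reduce splits the range into work units, folds each unit
    // into a private partial sum, then combines the partial sums.
    // The leading 0UL is the seed value for the reduction.
    return taskPool.reduce!"a + b"(0UL, iota(0UL, n));
}

void main()
{
    immutable ulong iter = 1_000_000_000UL;
    writefln("parallel reduce sum : %s", parallelSum(iter));
}
```

Each worker accumulates into its own partial sum and the results are
combined once at the end, so neither the shared counter nor atomicOp is
needed inside the loop.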