Does the laptop really have 4 cores, or is it 2 cores with hyperthreading? My guess is the latter, and that would contribute to the timings you're seeing. Other processes running on the system will also be competing for CPU time. Do larger jobs show a better or worse speedup?
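
One rough way to check from D itself is to compare the logical CPU count with what CPUID reports. This is only a sketch: `looksHyperthreaded` is a hypothetical helper named here for illustration, and the values from `core.cpuid` are known to be unreliable on some processors, so treat the output as a hint and cross-check it against the CPU's spec sheet.

```d
import core.cpuid : coresPerCPU, threadsPerCPU;
import std.parallelism : totalCPUs;
import std.stdio : writefln;

// Hypothetical helper: more hardware threads than cores suggests
// hyperthreading (SMT) is in use.
bool looksHyperthreaded(uint cores, uint threads)
{
    return threads > cores;
}

void main()
{
    // Logical CPUs visible to the OS scheduler.
    writefln("logical CPUs (totalCPUs): %s", totalCPUs);

    // CPUID-derived values; these may be inaccurate on some hardware.
    writefln("cores per CPU:   %s", coresPerCPU);
    writefln("threads per CPU: %s", threadsPerCPU);
    writefln("hyperthreading suspected: %s",
             looksHyperthreaded(coresPerCPU, threadsPerCPU));
}
```

If this reports 2 cores and 4 threads, a ceiling of roughly 2x on CPU-bound work would not be surprising.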

On Feb 28, 2013, at 6:15 AM, Joseph Rushton Wakeling <[email protected]> wrote:

> Hello all,
>
> I'm in need of some guidance regarding std.concurrency. Before writing
> further, I should add that I'm an almost complete novice where concurrency is
> concerned, in general and particularly with D: I've written a few programs
> that made use of std.parallelism but that's about it.
>
> In this case, there's a strong need to use std.concurrency because the
> functions that will be run in parallel involve generating substantial
> quantities of random numbers. AFAICS std.parallelism just isn't safe for
> that, in a statistical sense (no idea how it might slow things down in terms
> of shared access to a common rndGen).
>
> Now, I'm not naive enough to believe that using n threads will simply result
> in the program runtime being divided by n. However, the results I'm getting
> with some simple test code (attached) are curious and I'd like to understand
> better what's going on.
>
> The program is simple enough:
>
>     foreach(i; iota(n))
>         spawn(&randomFunc, m);
>
> ... where randomFunc is a function that generates and sums m different random
> numbers. For speed comparison one can do instead,
>
>     foreach(i; iota(n))
>         randomFunc(m);
>
> With m = 100_000_000 being chosen for my case.
>
> Setting n = 2 on my 4-core laptop, the sequential case runs in about 4 s; the
> concurrent version using spawn() runs in about 2.2 s (the total amount of
> "user" time given for the sequential programs is about 4 s and about 4.3 s
> respectively). So, roughly half speed, as you might expect.
>
> Setting n = 3, the sequential case runs in about 6 s (surprise!), the
> concurrent version in about 3 (with about 8.1 s of "user" time recorded). In
> other words, the program speed is only half that of the sequential version,
> even though there's no shared data and the CPU can well accommodate the 3
> threads at full speed. (In fact 270% CPU usage is recorded, but that should
> still see a faster program.)
>
> Setting n = 4, the sequential case runs in 8 s, the concurrent in about 3.8
> (with 14.8 s of "user" time recorded), with 390% CPU usage.
>
> In other words, it doesn't seem possible to get more than about 2 * speedup
> on my system from using concurrency, even though there should not be any data
> races or other factors that might explain slower performance.
>
> I didn't expect speed / n, but I did expect something a little better than
> this -- so can anyone suggest what might be going on here? (Unfortunately, I
> don't have a system with a greater number of cores on which to test with
> greater numbers of threads.)
>
> The times reported here are for programs compiled with GDC, but using LDC or
> DMD produces similar behaviour.
>
> Can anyone advise?
>
> Thanks & best wishes,
>
> -- Joe
> <concur.d>
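
For readers without the attachment, here is a minimal sketch of the kind of test described above. It is not the original concur.d: `sumRandom` and `worker` are hypothetical names, each call seeds its own local generator instead of using `rndGen`, and the workers send their result back to the owner so the timing covers the whole job. It assumes a reasonably recent Phobos (for `std.datetime.stopwatch` and `uniform01`).

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : Random, uniform01, unpredictableSeed;
import std.stdio : writefln;

// Sums m uniform random numbers in [0, 1) using a generator local to the call.
double sumRandom(size_t m)
{
    auto rng = Random(unpredictableSeed);
    double total = 0.0;
    foreach (i; 0 .. m)
        total += uniform01(rng);
    return total;
}

// Thread entry point: do the work, then report the sum to the owner thread.
void worker(size_t m)
{
    ownerTid.send(sumRandom(m));
}

void main()
{
    enum size_t m = 10_000_000; // smaller than the 100_000_000 in the post

    foreach (n; 1 .. 5)
    {
        // Sequential: n jobs one after another on the main thread.
        auto sw = StopWatch(AutoStart.yes);
        foreach (i; 0 .. n)
            sumRandom(m);
        immutable seqMs = sw.peek.total!"msecs";

        // Concurrent: n spawned threads, then wait for all n results.
        sw.reset();
        foreach (i; 0 .. n)
            spawn(&worker, m);
        foreach (i; 0 .. n)
            receiveOnly!double();
        immutable conMs = sw.peek.total!"msecs";

        writefln("n=%s  sequential=%s ms  concurrent=%s ms", n, seqMs, conMs);
    }
}
```

Varying `m` as well as `n` would show whether larger jobs scale better or worse on a given machine.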
