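Does the laptop really have 4 cores, or is it 2 cores with hyperthreading?  My 
guess is the latter, and that would contribute to the timings you're seeing: two 
hyperthreads share one core's execution units, so 4 logical CPUs do not give 4 
cores' worth of throughput.  Other processes running on the system also compete 
for CPU time.  Do larger jobs show a better or worse speedup?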
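
If it helps, here is a minimal sketch of one way to check from D itself. It 
compares the logical CPU count from std.parallelism with what core.cpuid 
reports. The helper looksHyperthreaded is a name I made up for illustration, and 
the CPUID-derived numbers can be wrong on some processors and inside virtual 
machines, so treat the output as a hint and not a definitive answer.

```d
import core.cpuid : coresPerCPU, threadsPerCPU, hyperThreading;
import std.parallelism : totalCPUs;
import std.stdio : writefln;

// Hypothetical helper: more hardware threads than cores suggests SMT.
bool looksHyperthreaded(uint cores, uint threads)
{
    return threads > cores;
}

void main()
{
    // Logical CPUs as seen by the OS scheduler.
    writefln("logical CPUs (totalCPUs): %s", totalCPUs);

    // Values derived from CPUID; may be inaccurate on some CPUs and VMs.
    writefln("cores per CPU:            %s", coresPerCPU);
    writefln("threads per CPU:          %s", threadsPerCPU);
    writefln("hyperThreading flag:      %s", hyperThreading);
    writefln("looks hyperthreaded:      %s",
             looksHyperthreaded(coresPerCPU, threadsPerCPU));
}
```

If this reports 2 cores and 4 threads, a ceiling of roughly 2x for a CPU-bound 
job would not be surprising.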

On Feb 28, 2013, at 6:15 AM, Joseph Rushton Wakeling wrote:

> Hello all,
> 
> I'm in need of some guidance regarding std.concurrency.  Before writing 
> further, I should add that I'm an almost complete novice where concurrency is 
> concerned, in general and particularly with D: I've written a few programs 
> that made use of std.parallelism but that's about it.
> 
> In this case, there's a strong need to use std.concurrency because the 
> functions that will be run in parallel involve generating substantial 
> quantities of random numbers.  AFAICS std.parallelism just isn't safe for 
> that, in a statistical sense (no idea how it might slow things down in terms 
> of shared access to a common rndGen).
> 
> Now, I'm not naive enough to believe that using n threads will simply result 
> in the program runtime being divided by n.  However, the results I'm getting 
> with some simple test code (attached) are curious and I'd like to understand 
> better what's going on.
> 
> The program is simple enough:
> 
>      foreach(i; iota(n))
>            spawn(&randomFunc, m);
> 
> ... where randomFunc is a function that generates and sums m different random 
> numbers.  For speed comparison one can do instead,
> 
>      foreach(i; iota(n))
>            randomFunc(m);
> 
> With m = 100_000_000 being chosen for my case.
> 
> Setting n = 2 on my 4-core laptop, the sequential case runs in about 4 s; the 
> concurrent version using spawn() runs in about 2.2 s (the total "user" time 
> reported for the sequential and concurrent programs is about 4 s and about 
> 4.3 s respectively).  So, roughly half the runtime, as you might expect.
> 
> Setting n = 3, the sequential case runs in about 6 s (surprise!), the 
> concurrent version in about 3 s (with about 8.1 s of "user" time recorded).  In 
> other words, the concurrent runtime is still only half that of the sequential 
> version, even though there's no shared data and the CPU should be able to 
> accommodate the 3 threads at full speed.  (In fact 270% CPU usage is recorded, 
> but that should still see a faster program.)
> 
> Setting n = 4, the sequential case runs in about 8 s, the concurrent in about 
> 3.8 s (with 14.8 s of "user" time recorded), with 390% CPU usage.
> 
> In other words, it doesn't seem possible to get more than about a 2x speedup 
> on my system from using concurrency, even though there should not be any data 
> races or other factors that might explain slower performance.
> 
> I didn't expect speed / n, but I did expect something a little better than 
> this -- so can anyone suggest what might be going on here?  (Unfortunately, I 
> don't have a system with a greater number of cores on which to test with 
> greater numbers of threads.)
> 
> The times reported here are for programs compiled with GDC, but using LDC or 
> DMD produces similar behaviour.
> 
> Can anyone advise?
> 
> Thanks & best wishes,
> 
>    -- Joe
> <concur.d>
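
To see whether larger jobs scale differently, something along these lines might 
be a starting point. This is only a sketch, not the attached concur.d: sumRandom 
and worker are hypothetical stand-ins for randomFunc, the job sizes are 
arbitrary, and it assumes a compiler recent enough to ship 
std.datetime.stopwatch. Each spawned thread gets its own thread-local rndGen, so 
no generator state is shared.

```d
import std.concurrency : spawn, send, receiveOnly, ownerTid;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.random : uniform;
import std.stdio : writefln;

// Hypothetical stand-in for randomFunc: sums m random numbers in [0, 1).
double sumRandom(size_t m)
{
    double total = 0.0;
    foreach (i; 0 .. m)
        total += uniform(0.0, 1.0); // uses the thread-local rndGen
    return total;
}

// Runs one job in a spawned thread and reports the sum to the owner,
// which lets main know when the job has finished.
void worker(size_t m)
{
    ownerTid.send(sumRandom(m));
}

void main()
{
    enum n = 4;                                   // number of jobs
    size_t[] sizes = [1_000_000, 10_000_000, 100_000_000];

    foreach (m; sizes)
    {
        auto sw = StopWatch(AutoStart.yes);
        foreach (i; 0 .. n)
            sumRandom(m);                         // sequential baseline
        immutable seq = sw.peek;

        sw.reset();                               // keeps running from zero
        foreach (i; 0 .. n)
            spawn(&worker, m);
        foreach (i; 0 .. n)
            receiveOnly!double();                 // wait for every worker
        immutable par = sw.peek;

        writefln("m = %s: sequential %s ms, concurrent %s ms, speedup %.2f",
                 m, seq.total!"msecs", par.total!"msecs",
                 cast(double) seq.total!"usecs" / par.total!"usecs");
    }
}
```

If the speedup stays near 2 regardless of m, that points at the hardware (2 
physical cores) and not at thread startup overhead.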
