Re: David Simcha's std.parallelism

dsimcha Sun, 09 Jan 2011 08:55:15 -0800

On 1/1/2011 6:07 PM, Andrei Alexandrescu wrote:

* parallel is templated on range, but not on operation. Does this affect
speed for brief operations (such as the one given in the example,
squares[i] = i * i)? I wonder if using an alias wouldn't be more
appropriate. Some performance numbers would be very useful in any case.

Ok, I did the benchmarks. Since map is templated on the operation, I used that as a benchmark of the templating-on-operation scenario. Here's the benchmark:


import std.parallelism, std.stdio, std.datetime, std.range, std.conv,
    std.math;

int fun1(int num) {
    return roundTo!int(sqrt(num));
}

int fun2(int num) {
    return num * num;
}

alias fun2 fun;

void main() {
    auto foo = array(iota(10_000_000));
    auto bar = new int[foo.length];

    enum workUnitSize = 1_000_000;

    auto sw = StopWatch(autoStart);
    foreach(i, elem; parallel(foo, workUnitSize)) {
        bar[i] = fun(elem);
    }
    writeln("Parallel Foreach:  ", sw.peek.milliseconds);

    sw = StopWatch(autoStart);
    bar = taskPool.map!fun(foo, workUnitSize, bar);
    writeln("Map:  ", sw.peek.milliseconds);

    sw = StopWatch(autoStart);
    foreach(i, elem; foo) {
        bar[i] = fun(elem);
    }
    writeln("Serial:  ", sw.peek.milliseconds);
}


Results (milliseconds, with fun2 as the loop body):

Parallel Foreach:  69.2988
Map:  29.1973
Serial:  40.2884


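So obviously there's a huge penalty for the delegate-based parallel foreach when the loop body is super cheap: it comes out slower than the serial loop, while the templated map beats both.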

On the other hand, when I make fun1 the loop body instead (and it's still a fairly cheap body), the differences are buried in noise.
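To make the delegate vs. template distinction concrete, here's a minimal serial sketch. It is not the std.parallelism implementation; DelegateLoop and aliasLoop are made-up names for illustration only. The first form receives the foreach body as a delegate through opApply, so every iteration goes through an indirect call. The second takes the operation as an alias template parameter, which the compiler is free to inline.

```d
import std.stdio : writeln;

// Hypothetical wrapper: a foreach over this struct is lowered to a call
// of opApply, and the loop body arrives as a delegate.
struct DelegateLoop {
    int[] data;

    int opApply(scope int delegate(size_t, ref int) dg) {
        foreach (i, ref elem; data) {
            // A nonzero result means the body executed break/return.
            if (auto result = dg(i, elem)) return result;
        }
        return 0;
    }
}

// Hypothetical alias-based counterpart: the operation is a template
// parameter, so each instantiation is specialized for that operation.
void aliasLoop(alias op)(int[] input, int[] output) {
    foreach (i, elem; input) {
        output[i] = op(elem);
    }
}

int square(int num) { return num * num; }

void main() {
    auto input = [1, 2, 3, 4];
    auto viaDelegate = new int[input.length];
    auto viaAlias = new int[input.length];

    // Body is passed to opApply as a delegate; outer variables just work.
    foreach (i, ref elem; DelegateLoop(input)) {
        viaDelegate[i] = square(elem);
    }

    // Operation is passed at compile time.
    aliasLoop!square(input, viaAlias);

    assert(viaDelegate == viaAlias);
    writeln(viaAlias);
}
```

With a body as small as num * num, the per-iteration delegate call is a plausible source of the gap in the numbers above; with fun1 the body itself costs enough to hide it.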

Now that I've given my honest report of the facts, though, I'd like to say that even so, I'm in favor of leaving things as-is, for the following reasons:

1. Super cheap loop bodies are usually not worth parallelizing anyhow. You get nowhere near a linear speedup due to memory bandwidth issues, etc., and if some super cheap loop body is your main bottleneck, it's probably being executed in some outer loop and it may make more sense to parallelize the outer loop. In all my experience with std.parallelism, I've **never** had the need/desire to resort to parallelism fine grained enough that the limitations of delegate-based parallel foreach mattered in practice.

2. If you really want to parallelize super cheap loop bodies, map() isn't going anywhere, and it and/or reduce(), which also uses templates, will usually do what you need. You can even use parallel map in place by simply passing in the same (writeable) range for both the input and the buffer.

3. The foreach syntax makes the following very useful things (as in, I actually use them regularly) possible that wouldn't be possible if we used templates:


foreach(index, elem; parallel(range))
foreach(ref elem; parallel(range))

It also just plain looks nice.

4. A major point of parallel foreach is that variables in the outer scope "just work". When passing blocks of code as aliases instead of delegates, this is still very buggy.

5. I'm hoping I can convince Walter to implement an alias-based version of opApply, which is half-implemented and commented out in the DMD source code. If this were implemented, I'd change std.parallelism to use it and this whole discussion would be moot.
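
As a rough illustration of points 2 and 3, here is a small sketch of the two in-place styles. It assumes the API as it exists in current Phobos, where the eager parallel map is spelled taskPool.amap rather than the map used in the benchmark above; the array size and work unit size are arbitrary values picked for the example.

```d
import std.array : array;
import std.parallelism : parallel, taskPool;
import std.range : iota;

int square(int num) { return num * num; }

void main() {
    auto data = iota(1, 9).array;   // [1, 2, ..., 8]

    // Serial reference result.
    auto expected = new int[data.length];
    foreach (i, elem; data) expected[i] = square(elem);

    // Point 3: ref iteration through parallel foreach updates the
    // elements in place.
    auto viaRef = data.dup;
    foreach (ref elem; parallel(viaRef)) {
        elem = square(elem);
    }

    // Point 2: parallel map in place, passing the same writeable array
    // as both the input range and the output buffer.
    // amap is an assumption here: it is the current Phobos name.
    auto viaMap = data.dup;
    taskPool.amap!square(viaMap, 2, viaMap);

    assert(viaRef == expected);
    assert(viaMap == expected);
}
```

The templated form takes the operation at compile time, while the foreach form keeps the index and ref variants and lets the body touch outer variables directly.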
