Re: std.parallelism curious results

Ali Çehreli via Digitalmars-d-learn Sun, 05 Oct 2014 14:30:41 -0700

On 10/05/2014 07:27 AM, flamencofantasy wrote:

> I am summing up the first 1 billion integers in parallel and in a single
> thread and I'm observing some curious results;
>
> parallel sum : 499999999500000000, elapsed 102833 ms
> single thread sum : 499999999500000000, elapsed 1667 ms
>
> The parallel version is 60+ times slower

Reducing the number of threads is key. However, unlike what others said, parallel() does not use that many threads. By default, TaskPool objects are constructed with 'totalCPUs - 1' worker threads. All of parallel()'s iterations are executed on those few threads.
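To check the pool size on a given machine, something like the following should do. It is only a small sketch; totalCPUs, defaultPoolThreads and taskPool.size all come from std.parallelism, and the numbers printed will of course differ from machine to machine.

```d
import std.parallelism : defaultPoolThreads, taskPool, totalCPUs;
import std.stdio : writeln;

void main()
{
    // Number of logical CPUs that the runtime detected.
    writeln("totalCPUs          : ", totalCPUs);

    // Worker thread count used when the default pool is created;
    // 'totalCPUs - 1' unless the program has changed it.
    writeln("defaultPoolThreads : ", defaultPoolThreads);

    // Worker threads actually owned by the global taskPool.
    writeln("taskPool.size      : ", taskPool.size);
}
```

As far as I understand it, the thread that runs the parallel foreach also takes part in the work, which would explain the 'minus one'.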

The main problem here is the use of atomicOp, which necessarily synchronizes the whole process.
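For illustration, the slow pattern presumably looks something like the following. This is my reconstruction and not the original program: contendedSum is a hypothetical name, and the element count is kept small so that it finishes quickly. Every single iteration performs an atomic update on the same shared variable, so the worker threads spend their time contending for it instead of adding numbers.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: sums [0, n) with one atomic update per element.
ulong contendedSum(ulong n)
{
    shared ulong sum = 0;

    foreach (i; parallel(iota(n)))
    {
        // All worker threads hit the same variable on every iteration.
        atomicOp!"+="(sum, i);
    }

    return atomicLoad(sum);
}

void main()
{
    writeln(contendedSum(1_000_000));
}
```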

Something like the following takes advantage of parallelism and reduces the execution time by half on my machine (4 cores (hyperthreaded 2 actual ones)).


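    import core.atomic : atomicOp;
    import std.parallelism : parallel;
    import std.range : iota;

    // Sums the half-open range [beg, end) in a plain local variable.
    ulong adder(ulong beg, ulong end)
    {
        ulong localSum = 0;

        foreach (i; beg .. end) {
            localSum += i;
        }

        return localSum;
    }

    enum totalTasks = 10;

    // 'iter' and 'sum' are assumed to be the iteration count and the
    // shared total from the original program.
    foreach(i; parallel(iota(0, totalTasks)))
    {
        ulong beg = i * iter / totalTasks;
        ulong end = beg + iter / totalTasks;

        // One atomic update per task instead of one per element.
        atomicOp!"+="(sum, adder(beg, end));
    }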
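For anyone who wants to try it, here is the same idea as a complete program. Treat it as a sketch: parallelSum is a hypothetical wrapper name, the shared total is a local variable here, and the end of each chunk is computed as (i + 1) * iter / totalTasks so that no elements are dropped when iter is not a multiple of totalTasks.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Sums the half-open range [beg, end) in a plain local variable.
ulong adder(ulong beg, ulong end)
{
    ulong localSum = 0;

    foreach (i; beg .. end) {
        localSum += i;
    }

    return localSum;
}

// Hypothetical wrapper: sums [0, iter) by splitting it into
// totalTasks chunks that are processed in parallel.
ulong parallelSum(ulong iter, ulong totalTasks)
{
    shared ulong sum = 0;

    foreach (i; parallel(iota(totalTasks)))
    {
        immutable beg = i * iter / totalTasks;
        immutable end = (i + 1) * iter / totalTasks;

        // One atomic update per chunk, not one per element.
        atomicOp!"+="(sum, adder(beg, end));
    }

    return atomicLoad(sum);
}

void main()
{
    // The first 1 billion integers, as in the original question.
    writeln(parallelSum(1_000_000_000, 10));
}
```

With ten chunks there are only ten atomic updates in total, so the synchronization cost becomes negligible next to the additions themselves.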

Ali
