01-Feb-2013 19:42, Sparsh Mittal wrote:
It got posted before I completed it! Sorry.


I am parallelizing a program which follows this structure:

immutable int numberOfThreads = 2

for iter = 1 to MAX_ITERATION
{
    myLocalBarrier = new Barrier(numberOfThreads+1);
    for i = 1 to numberOfThreads
    {
        spawn(&myFunc, args)
    }
    myLocalBarrier.wait();
}

void myFunc(args)
{
    // do the task

    myLocalBarrier.wait()
}

When I run it and compare this parallel version with its serial
version, I only get a speedup of less than 1.3 for 2 threads. When I write
the same program in Go, the speedup is nearly 2.

Also, when I run "top" on the D version, I see only about 130% CPU usage,
not something close to 180% or 200%. So I was wondering whether I am doing
it properly. Please help me.


It is hard to tell much without the whole source, or at least a compilable standalone piece. The '//do the task' part is critical to understanding the problem, as is the declaration of myLocalBarrier.

Also why not use std.parallelism?
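
For illustration, here is a minimal sketch of how the loop above might look with std.parallelism, assuming each worker's task is independent within one iteration. The body of myFunc, the results array, maxIterations and chunkSize are hypothetical stand-ins, since the real task was not posted. The parallel foreach returns only once all work units are done, so no explicit barrier is needed and no new threads are spawned per iteration.

```d
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

enum numberOfThreads = 2;
enum maxIterations = 3;       // hypothetical value for MAX_ITERATION
enum chunkSize = 1_000_000;   // hypothetical amount of work per task

// Hypothetical stand-in for the "do the task" step.
double myFunc(int worker)
{
    double sum = 0;
    foreach (k; 0 .. chunkSize)
        sum += (worker + 1) * 0.5;
    return sum;
}

void main()
{
    auto results = new double[numberOfThreads];

    foreach (iter; 0 .. maxIterations)
    {
        // Work unit size 1: each index is handed out as a separate unit.
        // The loop completes only after every unit has finished, which
        // takes the place of the per-iteration barrier.
        foreach (i; parallel(iota(numberOfThreads), 1))
            results[i] = myFunc(i);

        writeln("iteration ", iter, ": ", results);
    }
}
```

By default the global task pool uses totalCPUs - 1 worker threads and the calling thread also takes part in the parallel foreach, so on a two-core machine both cores should be kept busy. Whether this closes the gap to the Go version depends on what the real task does.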
--
Dmitry Olshansky
