It got posted before I completed it! Sorry.

I am parallelizing a program which follows this structure:

immutable int numberOfThreads = 2;

for iter = 1 to MAX_ITERATION
{
    myLocalBarrier = new Barrier(numberOfThreads + 1);
    for i = 1 to numberOfThreads
    {
        spawn(&myFunc, args);
    }
    myLocalBarrier.wait();
}

void myFunc(args)
{
    // do the task
    myLocalBarrier.wait();
}

When I run it and compare this parallel version with the serial version, I only get a speedup of less than about 1.3 with 2 threads. When I write the same program in Go, the speedup is nearly 2.
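
For reference, here is a minimal Go sketch of the same spawn-and-wait-per-iteration structure, together with a serial baseline for measuring the speedup. It is not the actual program: `myFunc`'s body, `makeData`, the iteration count, the input size, and the use of `sync.WaitGroup` in place of the barrier are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	numberOfThreads = 2
	maxIteration    = 20 // hypothetical iteration count
)

// makeData builds a hypothetical input; the real program's data is unknown.
func makeData(n int) []int64 {
	data := make([]int64, n)
	for i := range data {
		data[i] = int64(i % 7)
	}
	return data
}

// myFunc stands in for "do the task": it sums the squares of one chunk.
func myFunc(chunk []int64) int64 {
	var sum int64
	for _, v := range chunk {
		sum += v * v
	}
	return sum
}

// runSerial processes the whole input once per iteration on one goroutine.
func runSerial(data []int64) int64 {
	var total int64
	for iter := 0; iter < maxIteration; iter++ {
		total += myFunc(data)
	}
	return total
}

// runParallel starts numberOfThreads goroutines in every iteration and
// waits for all of them before the next iteration begins.
func runParallel(data []int64) int64 {
	var total int64
	chunkSize := len(data) / numberOfThreads
	partial := make([]int64, numberOfThreads)
	for iter := 0; iter < maxIteration; iter++ {
		var wg sync.WaitGroup
		wg.Add(numberOfThreads)
		for i := 0; i < numberOfThreads; i++ {
			go func(i int) {
				defer wg.Done()
				lo := i * chunkSize
				hi := lo + chunkSize
				if i == numberOfThreads-1 {
					hi = len(data) // last worker takes the remainder
				}
				partial[i] = myFunc(data[lo:hi])
			}(i)
		}
		wg.Wait() // plays the role of myLocalBarrier.wait() in the main thread
		for _, p := range partial {
			total += p
		}
	}
	return total
}

func main() {
	data := makeData(4000000)

	start := time.Now()
	s := runSerial(data)
	serial := time.Since(start)

	start = time.Now()
	p := runParallel(data)
	parallel := time.Since(start)

	fmt.Println("results match:", s == p)
	fmt.Printf("speedup: %.2f\n", serial.Seconds()/parallel.Seconds())
}
```

The speedup is computed as serial time divided by parallel time, so a value close to 2 corresponds to ideal scaling on 2 threads.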

Also, when I run "top" on the D version, I see only about 130% CPU usage rather than something close to 180% or 200%. So I am wondering whether I am doing this properly. Please help.
