On 01-Feb-2013 19:42, Sparsh Mittal wrote:
It got posted before I completed it! Sorry.
I am parallelizing a program which follows this structure:
immutable int numberOfThreads = 2;
for iter = 1 to MAX_ITERATION
{
    myLocalBarrier = new Barrier(numberOfThreads + 1);
    for i = 1 to numberOfThreads
    {
        spawn(&myFunc, args);
    }
    myLocalBarrier.wait();
}

void myFunc(args)
{
    //do the task
    myLocalBarrier.wait();
}
When I run it and compare this parallel version with its serial
version, I get a speedup of less than 1.3 with 2 threads. When I write
the same program in Go, the speedup is nearly 2.
Also, in D, "top" shows CPU usage of only about 130%, not the 180% to
200% I would expect. So I am wondering whether I am doing it properly.
Please help me.
I can't tell much without the whole source, or at least a compilable
standalone piece. The '//do the task' part is critical to understanding
this, as is the declaration of myLocalBarrier.
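For reference, a minimal compilable sketch of the structure from the question might look like the following. It is an illustration under assumptions, not the original program: myLocalBarrier is assumed to be a __gshared module-level variable (module-level variables in D are thread-local by default, so the declaration really does matter), MAX_ITERATION is set to a hypothetical 3, and the task is a placeholder that only increments a hypothetical atomic counter named completed.

```d
// Sketch of the spawn + Barrier loop from the question.
// Assumptions (not from the original post): __gshared barrier,
// MAX_ITERATION = 3, and a placeholder task that bumps a counter.
import core.atomic : atomicLoad, atomicOp;
import core.sync.barrier : Barrier;
import std.concurrency : spawn;
import std.stdio : writeln;

enum uint numberOfThreads = 2;
enum MAX_ITERATION = 3; // hypothetical value

// Without __gshared (or shared) each thread would get its own,
// uninitialized copy of this module-level variable.
__gshared Barrier myLocalBarrier;

// Hypothetical stand-in for the real work: counts finished tasks.
shared int completed;

void myFunc(uint id)
{
    atomicOp!"+="(completed, 1); // "do the task" (placeholder)
    myLocalBarrier.wait();       // report that this worker is done
}

void main()
{
    foreach (iter; 0 .. MAX_ITERATION)
    {
        // numberOfThreads workers plus the main thread meet here.
        myLocalBarrier = new Barrier(numberOfThreads + 1);
        foreach (uint i; 0 .. numberOfThreads)
            spawn(&myFunc, i);
        myLocalBarrier.wait(); // wait for this iteration's workers
    }
    writeln("tasks completed: ", atomicLoad(completed));
}
```

Note that this structure creates fresh threads on every iteration, so for short tasks the thread start-up cost is paid MAX_ITERATION times.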
Also, why not use std.parallelism?
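A rough sketch of the same loop with std.parallelism, assuming the per-thread tasks are independent and each one writes only its own result slot. doTask and MAX_ITERATION are hypothetical placeholders for the real work. A parallel foreach runs on the default taskPool and does not return until every element has been processed, which covers what the explicit barrier was doing.

```d
// Sketch: per-iteration work via parallel foreach instead of spawn + Barrier.
// Assumptions: tasks are independent; doTask is a hypothetical workload.
import std.parallelism : parallel;
import std.stdio : writeln;

enum numberOfThreads = 2;
enum MAX_ITERATION = 3; // hypothetical value

// Hypothetical stand-in for "//do the task".
long doTask(size_t id, size_t iter)
{
    long sum = 0;
    foreach (long k; 0 .. 1_000_000)
        sum += k * cast(long) (id + 1) + cast(long) iter;
    return sum;
}

void main()
{
    auto results = new long[numberOfThreads];
    foreach (iter; 0 .. MAX_ITERATION)
    {
        // Each element is handled by a worker of the global taskPool;
        // the loop blocks until all elements are done (implicit barrier).
        foreach (i, ref r; parallel(results))
            r = doTask(i, iter);
    }
    writeln(results);
}
```

Because the global taskPool is created once and reused, this avoids starting new threads on every iteration.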
--
Dmitry Olshansky