Re: Performance observations for vs forall vs reduce

Brad Chamberlain Wed, 04 Dec 2013 14:10:51 -0800


Hi Peter --

I agree with Michael's comments. To understand your results better, I wanted to ask how you are controlling whether a run uses one core vs. two cores.


Thanks,
-Brad


On Wed, 4 Dec 2013, Michael Ferguson wrote:

Hi Peter -

Thanks for sharing your experience and analysis. I would
like to point out two things.

First, the declaration
 var i = 0;
is not necessary in any of your tiny programs since
for/forall will introduce i as a variable within its scope.
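
A minimal sketch of that scoping rule, in case it helps. The
accumulator name total is made up for illustration; the outer i
should still be 0 after the loop, because the loop index is a
separate variable:

```chapel
var i = 0;        // outer i; the loop below never touches it
var total = 0;    // hypothetical accumulator, for illustration only

for i in 1..3 do  // this i is a new variable scoped to the loop
  total += i;

writeln(i, " ", total);
```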

Second, your forall loop does not compute the same number
as your for loop. I'm including it here for completeness:

   // Forall, same as above but with forall
   config const N = 1000000000;
   var i = 0;
   var sum = 0.0;
   forall i in 1..N by -1 do sum += 1.0/(i*i);
   writeln(sqrt(sum*6));

There is a race condition.
The problem is that the forall loop runs the iterations
in some number of tasks, so that it is likely that at some point
the updates to sum happen in two threads at once. The visible effect
would be that you'd "lose updates" since you'd see something
like:

 Thread A           |  Thread B
                    |
 registerA = sum    |
                    | registerB = sum
 registerA += xA    |
                    | registerB += xB
 sum = registerA    |
                    | sum = registerB

so that at the end you have
 sum = sum + xB      (Thread A's update is overwritten)
when you wanted
 sum = sum + xA + xB.


The best way to solve this problem is to use + reduce,
but other options include making sum an atomic or sync variable.
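
As a rough sketch of two of those options (not a tuned
implementation): the first loop accumulates into an atomic real,
the second uses + reduce. The names atomicSum and reducedSum and
the smaller default N are assumptions chosen for illustration:

```chapel
config const N = 1000000;  // smaller than the original N, to keep the run short

// Option 1: atomic accumulator. Each add() is indivisible, so no
// update is lost, although the tasks contend on a single variable.
var atomicSum: atomic real;
forall i in 1..N do
  atomicSum.add(1.0/(i*i));
writeln(sqrt(atomicSum.read()*6));

// Option 2: + reduce combines the per-iteration values without a
// shared variable being updated from inside the loop body.
const reducedSum = + reduce [i in 1..N] 1.0/(i*i);
writeln(sqrt(reducedSum*6));
```

Both versions should print a value close to pi. The atomic version
is race-free, but I would expect it to be slower than the reduce
because every iteration updates the same memory location.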

Cheers,

-michael


On 12/04/2013 08:41 AM, Peter Kjellström wrote:

Hello,

I'm new to this list (and kind of new to Chapel too).

I read an article comparing a bunch of different languages when doing a
trivial pi approximation loop at http://scalability.org/?p=6559 and decided to
see how Chapel did.

So I wrote the following three programs (from now on referred to as for, forall
and reduce):

  // For, using a simple for loop
  config const N = 1000000000;
  var i = 0;
  var sum = 0.0;
  for i in 1..N by -1 do sum += 1.0/(i*i);
  writeln(sqrt(sum*6));

  // Forall, same as above but with forall
  config const N = 1000000000;
  var i = 0;
  var sum = 0.0;
  forall i in 1..N by -1 do sum += 1.0/(i*i);
  writeln(sqrt(sum*6));

  // Reduce, instead using Chapel's reduce directly
  config const N = 1000000000;
  var i = 0;
  var sum = + reduce [ i in 1..N by -1 ] 1.0/(i*i);
  writeln(sqrt(sum*6));

On my dual core laptop (GCC-4.7.2) I first ran the C-program from the URL
above as reference (ASCII art / org-mode tables):

|---+---------+------|
|   | Default | -O3  |
|---+---------+------|
| C | 12s     | 7.8s |
|---+---------+------|

Then I compiled and ran my Chapel programs (1.8.0 built with the above GCC),
both unoptimized and with --fast, and using both cores vs. one core:

|---------------------+---------+--------|
|                     | Default | --fast |
|---------------------+---------+--------|
| For                 | 15s     | 7.8s   |
|---------------------+---------+--------|
| Forall              | ~50s    | 17s    |
|---------------------+---------+--------|
| Forall(single core) | ~80s    | 10s    |
|---------------------+---------+--------|
| Reduce              | ~50s    | 5.7s   |
|---------------------+---------+--------|
| Reduce(single core) | ~80s    | 10s    |
|---------------------+---------+--------|

Observations:

* Wow, normal serial for loop with optimization is as fast as the C version (I
   did not expect that)
* Parallel variants are really slow without --fast
* Best is optimized parallel reduce, which beats the C version (yes yes, unfair,
   serial vs parallel I know...)
* Forall is just slow (not even the optimized version beats the unoptimized
   for loop)

I don't really have a question unless you count "Is this behaviour expected?"
:-)

Cheers,
  Peter



_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users
