Performance observations for vs forall vs reduce

Peter Kjellström Wed, 04 Dec 2013 06:09:19 -0800

Hello,

I'm new to this list (and kind of new to Chapel too).


I read an article comparing a bunch of different languages on a trivial pi 
approximation loop at http://scalability.org/?p=6559 and decided to see how 
Chapel did.

So I wrote the following three programs (from now on referred to as for, 
forall and reduce):

 // For, using a simple for loop
 config const N = 1000000000;
 var i = 0;
 var sum = 0.0;
 for i in 1..N by -1 do sum += 1.0/(i*i);
 writeln(sqrt(sum*6));

 // Forall, same as above but with forall
 config const N = 1000000000;
 var i = 0;
 var sum = 0.0;
 forall i in 1..N by -1 do sum += 1.0/(i*i);
 writeln(sqrt(sum*6));

 // Reduce, instead using Chapel's reduce directly
 config const N = 1000000000;
 var i = 0;
 var sum = + reduce [ i in 1..N by -1 ] 1.0/(i*i);
 writeln(sqrt(sum*6));

On my dual-core laptop (GCC-4.7.2) I first ran the C program from the URL 
above as a reference (ASCII art / org-mode tables):

|---+---------+------|
|   | Default | -O3  |
|---+---------+------|
| C | 12s     | 7.8s |
|---+---------+------|

Then I compiled and ran my Chapel programs (1.8.0, built with the above GCC), 
both unoptimized and with --fast, and using both cores vs using one core:

|---------------------+---------+--------|
|                     | Default | --fast |
|---------------------+---------+--------|
| For                 | 15s     | 7.8s   |
|---------------------+---------+--------|
| Forall              | ~50s    | 17s    |
|---------------------+---------+--------|
| Forall(single core) | ~80s    | 10s    |
|---------------------+---------+--------|
| Reduce              | ~50s    | 5.7s   |
|---------------------+---------+--------|
| Reduce(single core) | ~80s    | 10s    |
|---------------------+---------+--------|

Observations:

* Wow, the normal serial for loop with optimization is as fast as the C 
  version (I did not expect that)
* Parallel variants are really slow without --fast
* Best is the optimized parallel reduce, which beats the C version (yes yes, 
  unfair, serial vs parallel I know...)
* Forall is just slow (not even the optimized version beats the unoptimized 
  for loop)
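
A side note on the forall variant: as written, all tasks update the one shared 
sum, so besides being slow the loop may also be racing on that variable. A 
minimal sketch of how the same loop could be written with a per-task 
accumulator follows. It assumes a Chapel release that supports reduce intents 
on forall loops, which is newer than the 1.8.0 used for the timings above, and 
it has not been timed here:

```chapel
// Sketch only: assumes forall reduce intents are available
// (not the case in the 1.8.0 release used for the tables above).
config const N = 1000000000;
var sum = 0.0;
// Each task accumulates into its own private copy of sum;
// the copies are combined with + when the loop finishes.
forall i in 1..N with (+ reduce sum) do
  sum += 1.0/(i*i);
writeln(sqrt(sum*6));
```

This should behave much like the reduce program, since both leave the 
combining of partial sums to the compiler and runtime rather than to a shared 
variable.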

I don't really have a question unless you count "Is this behaviour expected?" 
:-)

Cheers,
 Peter

-- 
-= Peter Kjellström
-= National Supercomputer Centre

_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users
