Hi Michael,

I do believe it is useful to look for a more succinct solution.  This can
provide both clarity and maintainability.  I propose the following:

use Time;

var time: Timer;
const intervals = 1000000000;
const delta = 1.0/intervals;
const n = here.numCores;

var inter: [ 1..n ] real;  // intermediate results

time.start();

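// Iteration i sums the strided indices i, i+n, i+2n, ..., so no two
// iterations ever write the same element of inter.
forall i in 1..n
{
   for j in i..intervals by n
   {
     const x = (j-0.5)*delta;     // midpoint of the j-th subinterval
     inter[i] += 4.0/(1.0+x**2);  // accumulate this iteration's partial sum
   }
}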

const pi = + reduce inter;
// essentially:  for i in 1..n { pi += inter[i]; }

time.stop();

writeln("Pi: ",pi*delta);
writeln("Time elapsed: ",time.elapsed()," seconds");


A few comments:

I'm not debating how pi should be calculated (just answering the question
that was asked, and happy to do so).

I prefer to use 'const' when there are invariants, as this helps with
clarity and may even help the compiler generate better code.

The intermediate array uses more space (not significantly in this case,
since it holds only one real per core, but it is always a trade-off that
needs examination).

This approach eliminates the atomic but introduces a reduction after
the forall (an exchange that is not appropriate in many cases).
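
For comparison, here is a minimal sketch of two ways the reduction could be
folded into the loop itself, so that no intermediate array is needed.  It
assumes a Chapel version that supports reduce intents on forall loops, and it
uses a smaller, configurable 'intervals' than the program above so that it
runs quickly.  The names 'sum' and 'sum2' are only illustrative, and I have
not compared the performance of these variants with the version above.

```chapel
// Sketch only: 'intervals' is reduced and made configurable for quick runs.
config const intervals = 1000000;
const delta = 1.0/intervals;

// Variant 1: a reduce intent gives each task its own private 'sum' and
// combines the private copies when the forall finishes.
var sum = 0.0;
forall j in 1..intervals with (+ reduce sum)
{
  const x = (j-0.5)*delta;
  sum += 4.0/(1.0+x**2);
}
writeln("Pi (reduce intent):     ", sum*delta);

// Variant 2: a reduction applied directly to a parallel loop expression.
const sum2 = + reduce [j in 1..intervals] 4.0/(1.0+((j-0.5)*delta)**2);
writeln("Pi (reduce expression): ", sum2*delta);
```

Both variants leave the partitioning of the iterations to the compiler and
runtime, whereas the version above chooses the strided partition explicitly.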

So, I merely present this example as a way to think about simplification,
clarity, and, in this case, comparable performance.

Tom MacDonald

On Mon, 20 Jul 2015, Michael Dietrich wrote:

> Hi Chapel team,
>
> I just remembered an old homework about parallelizing a program in
> OpenMP and decided to re-implement it in Chapel.
> It is a simple iterative algorithm for calculating the number Pi [1].
> In serial it needed around 18.7 seconds on my machine.
> Obviously my first thought was about simply parallelizing the
> for-loop. In every iteration the final result is updated, so its
> variable and the update are atomic now [2]. However, this solution
> did not seem to be useful since it didn't even terminate after some
> minutes. Maybe all these uses of the atomic operations need too much
> time.
> So I changed the program in a way that the atomic operation needed to
> be used only once in the whole program [3] - and it worked: On a PC
> with four cores it needed only 3.6 seconds. However, this
> implementation looks quite long-winded to me. Can you suggest a
> shorter solution in Chapel, or is this considered appropriate in
> cases like this?
>
> bye
> Michael
>
> [1] https://www-user.tu-chemnitz.de/~michd/chpl_pi/eng/pi_serial.chpl
> [2] https://www-user.tu-chemnitz.de/~michd/chpl_pi/eng/pi_local1.chpl
> [3] https://www-user.tu-chemnitz.de/~michd/chpl_pi/eng/pi_local2.chpl
>
>
> _______________________________________________
> Chapel-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/chapel-users
>
