Hi Praveen --

I think the MPI code also has a lot of overhead since it has to transfer data between processes, which the Chapel code does not have to do.

Yes, sorry, I didn't mean to imply that all the disadvantages were Chapel's, but merely to point out some differences between the two codes that would cause them not to be equivalent in their approaches.

I also have the same halo cells in the MPI code as in the Chapel code. In the MPI code, each process copies data from a global vector to a local vector and then does the actual computations, which the Chapel code doesn't do. Hence I expected the MPI code to do worse.

The trouble is that StencilDist adds overhead to the shared-memory case because it is written in a way that assumes it's going to be run in distributed-memory mode (for example, when randomly accessing an array, it does a check to see "is this element remote or local?" that would not be necessary in a single-locale environment or an MPI program). With additional effort, the StencilDist distribution could be optimized to reduce or eliminate this overhead in single-locale runs, but that isn't an effort we've made since it's not a common case. Hence approaches like the one I pointed to in lulesh, which optimize it out manually for single-locale runs.

I was wondering if my Chapel code is not well written. E.g., there are loops like this:

forall (i,j) in Dx {
  // do some computation to obtain flux
  res[i-1,j] += flux * dy;
  res[i,j]   -= flux * dy;
}

Do I have to worry about different threads writing into the same location of the "res" variable?

Yes, you do (assuming that Dx contains adjacent elements in dimension 1). Specifically, the use of the forall loop says that the distinct loop iterations are safe to run in parallel with one another, but if one iteration were doing the += line on a given element while an adjacent iteration were doing the -= line for the same element, that could lead to a race condition.
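
If that is the case, one simple (though not necessarily the fastest) way to make such updates race-free is to give 'res' atomic elements so that concurrent += / -= updates to the same cell are serialized. Here is a minimal, self-contained sketch; the domain shapes, 'dy', and the placeholder flux value are made up for illustration:

config const n = 100;

const Dx   = {1..n, 1..n};      // face domain, as in your snippet (shape assumed)
const DRes = {0..n, 1..n};      // result domain, one extra row to hold the i-1 updates
const dy   = 1.0 / n;

var res: [DRes] atomic real;

forall (i,j) in Dx {
  const flux = 1.0;             // placeholder for the real flux computation
  res[i-1,j].add(flux * dy);    // atomic read-modify-write, no data race
  res[i,j].sub(flux * dy);
}

Another common fix is to restructure the loop so that each iteration owns all of the writes to a given res[i,j] (gathering the contributions from both of its faces), which avoids the cost of atomics entirely.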

How can I check how much time is spent in different parts of the Chapel code ?

Take a look at this primer for a way to do it by inserting timers into the code:


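In case it's useful, here's a minimal sketch of that approach using the Timer from Chapel's Time module (the timed region is just a placeholder):

use Time;

var t: Timer;
t.start();
// ... the region you want to time, e.g., the flux loop ...
t.stop();
writeln("flux loop took ", t.elapsed(), " seconds");
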
Another option would be to use chplvis:




On 10-Oct-2016, at 10:49 PM, Brad Chamberlain <br...@cray.com> wrote:

Hi Praveen --

In addition to Jeff's good advice on timing the computation you care about, I wanted to point out a difference between the MPI and the Chapel code:

As you know, MPI is designed around a distributed-memory execution model, so to take advantage of the four cores on your Mac, you run with mpirun -np 4.

Chapel supports both shared- and distributed-memory parallelism, so the way you're running on this 4-core Mac is reasonable, yet different from the MPI approach. Specifically, we will create a single process that uses multiple threads (typically 4 here) to implement your forall loops. So there will be no inter-process communication in the Chapel implementation as there is in the MPI version, and comparing against an OpenMP implementation would be a fairer comparison.

Related: the use of the 'StencilDist' domain map has no positive impact for a shared-memory execution like this, and will likely add overhead. It is designed for distributed-memory executions that do stencil-based computations, in order to enable caching of values owned by neighboring processes. But when you've only got one process, as here, there's no remote data to cache. So for a shared-memory execution like this, it'd be interesting to see how much faster the code would be if the 'dmapped StencilDist' clause were commented out (in practice, we often write codes that can be compiled with or without distributed data using a 'param' conditional -- for example, see the declarations of 'Elems' and 'Nodes' in examples/benchmarks/lulesh.chpl).
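
For concreteness, here's a rough sketch of that 'param'-conditional idiom (the names and the Stencil constructor arguments below are illustrative, not the actual lulesh declarations; compiling with -suseStencilDist=true turns the distribution on):

use StencilDist;

config param useStencilDist = false;

config const n = 100, m = 100;
const Space = {1..n, 1..m};          // hypothetical problem domain

const D = if useStencilDist
            then Space dmapped Stencil(Space, fluff=(1,1))
            else Space;

var res: [D] real;

Because 'useStencilDist' is a param, the unused branch is folded away at compile time, so the non-distributed build pays none of the StencilDist overhead.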

Running on a distributed-memory system and comparing the 'StencilDist' version against MPI (or better, against an MPI + OpenMP code) would also be more of an apples-to-apples comparison, though I suspect you'll see Chapel fall further behind in terms of performance at that point...

