Re: How to compare Chapel versus MPI

Brad Chamberlain Wed, 12 Oct 2016 16:27:29 -0700


Hi Praveen --

I think the MPI code also has a lot of overhead, since it has to transfer data between processes, which the Chapel code does not have to do.

Yes, sorry, I didn't mean to imply that all the disadvantages were Chapel's, but merely to point out some differences between the two codes that would cause them not to be equivalent in their approaches.

I also have the same halo cells in the MPI code as in the Chapel code. In the MPI code, each process copies data from a global vector to a local vector, then does the actual computations, which the Chapel code doesn't do. Hence I expected the MPI code to do worse.

The trouble is that StencilDist adds overhead in the shared-memory case because it is written on the assumption that it will run in distributed-memory mode. For example, when randomly accessing an array, it checks "is this element remote or local?", which would not be necessary in a single-locale environment or an MPI program. With additional effort, the StencilDist distribution could be optimized to reduce or eliminate this overhead in single-locale runs, but that isn't an effort we've made, since it's not a common case. That's why approaches like the one I pointed to in lulesh exist: they optimize it out manually for single-locale runs.

I was wondering if my Chapel code is not well written. E.g., there are loops like this:
forall (i,j) in Dx
{
// do some computation
res[i-1,j] += flux * dy;
res[i,j] -= flux * dy;
}
Do I have to worry about different threads writing into the same location of the "res" variable?

Yes, you do (assuming that Dx contains adjacent elements in dimension 1). Specifically, the use of the forall loop says that the distinct loop iterations are safe to run in parallel with one another, but if one iteration were doing the += line on a given element while an adjacent iteration were doing the -= line for the same element, that could lead to a race condition.
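One race-free way to restructure such a loop is to switch from "scatter" to "gather": first compute each face flux into its own array, then let each cell collect the contributions of its two faces, so every iteration writes only to locations it owns. This is a sketch under assumptions, not Praveen's actual code: the sizes, the arrays `u` and `fluxX`, and the placeholder flux formula are hypothetical. It also runs the original scatter loop serially so the two results can be compared.

```chapel
// Hypothetical problem setup; names and the flux formula are placeholders.
config const n = 8, m = 8;
const dy = 1.0;

const D  = {1..n, 1..m};   // cells
const Dx = {2..n, 1..m};   // interior x-faces: face (i,j) lies between cells (i-1,j) and (i,j)

var u: [D] real;
forall (i,j) in D do u[i,j] = i * 0.5 + j;

// Pass 1: each iteration writes only its own face entry, so no conflicts.
var fluxX: [Dx] real;
forall (i,j) in Dx do
  fluxX[i,j] = 0.5 * (u[i-1,j] + u[i,j]);   // placeholder flux

// Pass 2: each iteration owns exactly one cell and gathers from its two faces.
var res: [D] real;
forall (i,j) in D {
  if i+1 <= n then res[i,j] += fluxX[i+1,j] * dy;  // right face: the original's res[i-1,j] += ...
  if i >= 2   then res[i,j] -= fluxX[i,j]   * dy;  // left face:  the original's res[i,j]   -= ...
}

// Serial reference using the original scatter-style loop body.
var resSerial: [D] real;
for (i,j) in Dx {
  const flux = 0.5 * (u[i-1,j] + u[i,j]);
  resSerial[i-1,j] += flux * dy;
  resSerial[i,j]   -= flux * dy;
}

writeln("max difference vs. serial: ", max reduce abs(res - resSerial));
```

The cost of this approach is one extra array of face values; in exchange, neither forall loop has two iterations writing the same element.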

How can I check how much time is spent in different parts of the Chapel code?

Take a look at this primer for a way to do it by inserting timers into the code:


        http://chapel.cray.com/docs/latest/primers/primers/timers.html
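A minimal sketch of the timer approach, timing two phases of a program separately. It assumes a recent Chapel release, where the Time module's timer type is called `stopwatch` (older releases used a differently named type), and the two loops are hypothetical stand-ins for the phases you care about:

```chapel
use Time;

config const n = 1000000;
var A, B: [1..n] real;

var sw: stopwatch;   // assumption: recent Chapel; older releases named this type differently

// Phase 1: initialization (hypothetical stand-in).
sw.start();
forall i in 1..n do A[i] = i: real;
sw.stop();
const tInit = sw.elapsed();      // elapsed time in seconds

// Phase 2: a simple neighbor-difference update (hypothetical stand-in).
sw.clear();
sw.start();
forall i in 2..n do B[i] = A[i] - A[i-1];
sw.stop();
const tUpdate = sw.elapsed();

writeln("init:   ", tInit, " s");
writeln("update: ", tUpdate, " s");
```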

Another option would be to use chplvis:

        http://chapel.cray.com/docs/latest/tools/chplvis/chplvis.html

Best,
-Brad

Best
praveen
On 10-Oct-2016, at 10:49 PM, Brad Chamberlain <[email protected]> wrote:


Hi Praveen --
In addition to Jeff's good advice on timing the computation you care about, I wanted to point out a difference between the MPI and the Chapel code:
As you know, MPI is designed for a distributed-memory execution model, so to take advantage of the four cores on your Mac, you use mpirun -np 4.
Chapel supports both shared- and distributed-memory parallelism, so the way you're running on this 4-core Mac is reasonable, yet different from the MPI version. Specifically, we will create a single process that uses multiple threads (typically 4) to implement your forall loops. So there will be no inter-process communication in the Chapel implementation as there is in the MPI version, and comparing against an OpenMP implementation would be a fairer comparison.
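To see this single-process, multi-threaded layout for yourself, a small sketch like the following reports how many locales (processes) the program is running on, how much task parallelism is available within the current one, and whether every iteration of a forall ran on locale 0. It only uses the standard `numLocales`, `here.maxTaskPar`, and `here.id`; the loop bounds are arbitrary.

```chapel
// How the run is laid out: processes (locales) and per-locale parallelism.
writeln("locales (processes):        ", numLocales);
writeln("max task parallelism here:  ", here.maxTaskPar);

// In a single-locale run, every forall iteration executes on locale 0,
// i.e., on threads within the one process rather than in separate processes.
const allOnLocale0 = && reduce ([i in 1..1000] (here.id == 0));
writeln("all iterations on locale 0: ", allOnLocale0);
```

If you want to pin the number of tasks used by forall loops (e.g., to match mpirun -np 4), the data-parallel configuration constant `dataParTasksPerLocale` can be set on the command line, as in `./a.out --dataParTasksPerLocale=4`.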
Related: The use of the 'StencilDist' domain map has no positive impact for a shared-memory execution like this, and will likely add overhead. It is designed for distributed-memory executions that do stencil-based computations, in order to enable caching of values owned by neighboring processes. But when you've only got one process like this, there's no remote data to cache. So for a shared-memory execution like this, it'd be interesting to see how much faster the code would be if the 'dmapped StencilDist' clause were commented out. (In practice, we often write codes that can be compiled with or without distributed data using a 'param' conditional; for example, see the declarations of 'Elems' and 'Nodes' in examples/benchmarks/lulesh.chpl.)
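A sketch of that 'param' conditional pattern applied to a stencil domain (not the lulesh code itself): a compile-time flag selects either the StencilDist-distributed domain or a plain shared-memory one, and the untaken branch is folded away by the compiler. The flag name `useStencil`, the sizes, and the fluff width are hypothetical. The spelling `stencilDist` and the `dmapped new ...` form assume a recent Chapel release; the syntax for distributions has changed over time, so older releases may need adjusting.

```chapel
use StencilDist;

// Hypothetical compile-time switch: chpl -suseStencil=true prog.chpl
config param useStencil = false;
config const n = 100, m = 100;

const Space = {1..n, 1..m};

// With useStencil=false, only the plain (shared-memory) domain is built.
const D = if useStencil
            then Space dmapped new stencilDist(Space, fluff=(1,1))
            else Space;

var A: [D] real;
forall (i,j) in D do A[i,j] = i + j;
writeln("sum = ", + reduce A);
```

The rest of the program is written against D and A, so switching between the shared-memory and distributed versions touches only the declaration.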
Running on a distributed-memory system using the 'StencilDist' distribution against MPI (or better, vs. an MPI + OpenMP code) would also be more of an apples-to-apples comparison, though I suspect you'll see Chapel fall further behind in terms of performance at that point...
-Brad


_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users
