Hi Herve --
> * We've been working with Chapel 1.8.0 since the latest version was not
> available yet when we started. Except when we tried to profile the C code, we
> always used the --fast flag; initially we did not, and we saw the difference
> in terms of performance.
Thoroughly understandable. Switching to 1.9.0, I expect that you will see
some performance improvements, though there is still much room for
improvement. I'm glad you were always using the --fast flag -- the lack
of it in the sample command lines in the email is what made me worry.
> I'll send the complete code tomorrow so that you can have a look when you
> have time. We are using two 2D arrays to store the cell data, one for the n-1
> iteration and one for the n iteration.
You know, in retrospect, I think I misinterpreted something. I was
thinking that dsiAccess3 implied a 3D array access, but given that you're
using 2D arrays, I think it must mean that it's the third overload of a
function of that name. Sorry for the confusion on my part -- ZPL embedded
the rank of the arrays into such function names, and I think I flashed
back to that when looking at that stack.
> * After I sent the email the first time, I tried to suppress the domain
> mapping, since for single-locale execution I can use:
> const physicalDomain: domain(2) = {1..pb.nb_cell_x, 1..pb.nb_cell_y};
> instead of
> const physicalDomain: domain(2) dmapped Block({1..pb.nb_cell_x,
> 1..pb.nb_cell_y}) = {1..pb.nb_cell_x, 1..pb.nb_cell_y};
> and the performance improved greatly.
> However, I'm also doing multi-locale execution on an SGI Altix ICE 8200 server,
> up to 16 nodes of 8 processors; that's why I initially used the Block
> distribution by default in the code.
Makes sense, and I agree that ultimately, you will want to use the Block
distribution. That said, as you saw, the Block distribution incurs
overheads that have not been optimized away yet, and due to the way we do
unoptimized communication at present (very fine-grained, very
demand-driven), stencil patterns are a particularly bad case in Chapel
compared to a hand-coded MPI kernel. This is what the miniMD/stencil9 work
that I mentioned last summer was aimed at improving -- how to coarsen the
communication, use ghost cells, etc. We can provide more information on
that work if you are interested. As mentioned previously, it's something
we plan/hope to spend more time on this year.
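To make the fine-grained pattern concrete, here is a minimal sketch (names
and sizes are made up, using the 1.x Block distribution syntax) of the kind
of stencil loop I have in mind; with today's demand-driven communication,
every off-locale neighbor read in the loop body becomes its own remote get
rather than one bulk ghost-cell exchange per neighbor:

    use BlockDist;

    config const n = 8;

    // Hypothetical names, for illustration only.
    const Space = {1..n, 1..n};
    const D: domain(2) dmapped Block(boundingBox=Space) = Space;
    var A, B: [D] real;

    // Naive 5-point stencil: each A[i-1,j], A[i+1,j], A[i,j-1], A[i,j+1]
    // that lives on another locale is currently fetched individually,
    // which is what the ghost-cell/aggregation work aims to avoid.
    forall (i,j) in D.expand(-1) do
      B[i,j] = 0.25 * (A[i-1,j] + A[i+1,j] + A[i,j-1] + A[i,j+1]);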
> * About the distribution used on x_domain and y_domain, I supposed that when
> the domains are too small it could/would decrease performance, but I did
> it initially because we planned to use large domains (32768x32768 cells) and I
> wanted the arrays containing the boundary data to be treated in parallel.
> I also tried not to use the Block distribution for x_domain and y_domain
> before, but on this type of grid I did not see improvements.
To be clear, I do think you want the boundaries distributed, but I was
just hypothesizing that you would do better to distribute them relative
to the physical space rather than independently. I.e., by creating the
boundaries relative to the physical space, they will still be distributed,
but to the same locales that own the corresponding cells in the physical
space (so, on a p x p locale grid, they'd be distributed over the p
locales on one edge rather than all p x p locales). Again, my thinking is
that it's better to use a subset of the resources and align the data with
the physical space that it correlates with to avoid communication than to
use all the resources and require more (and more arbitrary) communication
between the physical space and the boundaries. And again, this
is predicated on an assumption that there's asymptotically less work going
on at the boundaries and therefore you can tolerate using only a subset of
the locales (esp. since other boundaries could be computed simultaneously,
allowing you to use something like 4p locales in parallel rather than
serializing the boundary computations).
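In code, what I'm suggesting would look roughly like the sketch below
(names made up; assuming the boundary strips are the edge rows/columns of
the grid). Declaring them as subdomains of physicalDomain keeps them
distributed, but only across the locales that own the corresponding edge of
the physical space:

    use BlockDist;

    config const nx = 8, ny = 8;

    const Space = {1..nx, 1..ny};
    const physicalDomain: domain(2) dmapped Block(boundingBox=Space) = Space;

    // Boundary strips declared relative to the physical space: each
    // subdomain inherits physicalDomain's distribution, so a given strip
    // is spread over the p locales along that edge of a p x p locale
    // grid rather than over all p x p locales.
    const westBoundary:  subdomain(physicalDomain) = {1..nx, 1..1};
    const eastBoundary:  subdomain(physicalDomain) = {1..nx, ny..ny};
    const southBoundary: subdomain(physicalDomain) = {1..1,  1..ny};
    const northBoundary: subdomain(physicalDomain) = {nx..nx, 1..ny};

    var westBC: [westBoundary] real;  // lives with column 1 of the cells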
The one other thing motivating this suggestion is that Chapel has been
designed such that if a number of domains share the same domain map (as
these would), it gives the compiler more semantic information about the
relative alignment (and therefore, lack of need for communication) than if
every domain has a different domain map. That said, this is a
forward-looking characterization in that we haven't implemented it yet in
Chapel (or, you can consider it backward-looking, as it is what we
implemented in ZPL).
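As a rough sketch of what that looks like (again with made-up shapes, and
using the 1.x syntax for naming a domain map explicitly), the idea is to
declare the related domains against one shared domain map value:

    use BlockDist;

    config const n = 8;
    const Space = {1..n, 1..n};

    // One named Block domain map, shared by all of the related domains.
    const gridDist = new dmap(new Block(boundingBox=Space));

    // Because these declarations reference the same domain map, the
    // compiler can (eventually) reason about their relative alignment
    // and avoid communication between aligned indices.
    const physicalDomain: domain(2) dmapped gridDist = Space;
    const x_domain:       domain(2) dmapped gridDist = {1..n, 1..1};
    const y_domain:       domain(2) dmapped gridDist = {1..1, 1..n};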
-Brad