Hi Herve --
> * We've been working with Chapel 1.8.0 since the latest version was not
> available yet when we started. Except when we tried to profile the C code, we
> always used the --fast flag; initially we did not, and we saw the difference
> in terms of performance.
Thoroughly understandable. Switching to 1.9.0, I expect that you will see
some performance improvements, though there is still much room for
improvement. I'm glad you were always using the --fast flag -- the lack
of it in the sample command lines in the email is what made me worry.
> I'll send the complete code tomorrow so that you can have a look when you
> have time. We are using two 2D arrays to store the cell data, one for the n-1
> iteration and one for the n iteration.
You know, in retrospect, I think I misinterpreted something. I was
thinking that dsiAccess3 implied a 3D array access, but given that you're
using 2D arrays, I think it must mean that it's the third overload of a
function of that name. Sorry for the confusion on my part -- ZPL embedded
the rank of the arrays into such function names, and I think I flashed
back to that when looking at that stack.
> * After I sent the email the first time, I tried to suppress the domain
> mapping, since for single-locale execution I can use:
> const physicalDomain: domain(2) = {1..pb.nb_cell_x, 1..pb.nb_cell_y};
> instead of
> const physicalDomain: domain(2) dmapped Block({1..pb.nb_cell_x,
> 1..pb.nb_cell_y}) = {1..pb.nb_cell_x, 1..pb.nb_cell_y};
> and the performance improved greatly.
> However, I'm also doing multi-locale execution on an SGI Altix ICE 8200 server,
> up to 16 nodes of 8 processors; that's why I initially used the Block
> distribution by default in the code.
Makes sense, and I agree that ultimately, you will want to use the Block
distribution. That said, as you saw, the Block distribution incurs
overheads that have not been optimized away yet, and due to the way we do
unoptimized communication at present (very fine-grained, very
demand-driven), stencil patterns are a particularly bad case in Chapel
compared to a hand-coded MPI kernel. This is what the miniMD/stencil9 work
that I mentioned last summer was aimed at improving -- how to coarsen the
communication, use ghost cells, etc. We can provide more information on
that work if you are interested. As mentioned previously, it's something
we plan/hope to spend more time on this year.
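To make the fine-grained pattern concrete, here is a minimal sketch (names
and sizes are made up, using the 1.x Block distribution syntax) of the kind
of stencil loop I have in mind; with today's demand-driven communication,
every off-locale neighbor read in the loop body becomes its own remote get
rather than one bulk ghost-cell exchange per neighbor:

    use BlockDist;

    config const n = 8;

    // Hypothetical names, for illustration only.
    const Space = {1..n, 1..n};
    const D: domain(2) dmapped Block(boundingBox=Space) = Space;
    var A, B: [D] real;

    // Naive 5-point stencil: each A[i-1,j], A[i+1,j], A[i,j-1], A[i,j+1]
    // that lives on another locale is currently fetched individually,
    // which is what the ghost-cell/aggregation work aims to avoid.
    forall (i,j) in D.expand(-1) do
      B[i,j] = 0.25 * (A[i-1,j] + A[i+1,j] + A[i,j-1] + A[i,j+1]);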
> * About the distribution used on x_domain and y_domain, I supposed that when
> the domains are too small it could/would decrease performance, but I did
> it initially because we planned to use large domains (32768x32768 cells) and I
> wanted the arrays containing the boundary data to be treated in parallel.
> I also tried not to use the Block distribution for x_domain and y_domain
> before, but on this type of grid I did not see improvements.
To be clear, I do think you want the boundaries distributed, but I was
just hypothesizing that you would do better to distribute them relative
to the physical space rather than independently. I.e., by creating the
boundaries relative to the physical space, they will still be distributed,
but to the same locales that own the corresponding cells in the physical
space (so, on a p x p locale grid, they'd be distributed over the p
locales on one edge rather than all p x p locales). Again, my thinking is
that it's better to use a subset of the resources and align the data with
the physical space that it correlates with to avoid communication than to
use all the resources and require more (and more arbitrary) communication
between the physical space and the boundaries. And again, this
is predicated on an assumption that there's asymptotically less work going
on at the boundaries and therefore you can tolerate using only a subset of
the locales (esp. since other boundaries could be computed simultaneously,
allowing you to use something like 4p locales in parallel rather than
serializing the boundary computations).
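In code, what I'm suggesting would look roughly like the sketch below
(names made up; assuming the boundary strips are the edge rows/columns of
the grid). Declaring them as subdomains of physicalDomain keeps them
distributed, but only across the locales that own the corresponding edge of
the physical space:

    use BlockDist;

    config const nx = 8, ny = 8;

    const Space = {1..nx, 1..ny};
    const physicalDomain: domain(2) dmapped Block(boundingBox=Space) = Space;

    // Boundary strips declared relative to the physical space: each
    // subdomain inherits physicalDomain's distribution, so a given strip
    // is spread over the p locales along that edge of a p x p locale
    // grid rather than over all p x p locales.
    const westBoundary:  subdomain(physicalDomain) = {1..nx, 1..1};
    const eastBoundary:  subdomain(physicalDomain) = {1..nx, ny..ny};
    const southBoundary: subdomain(physicalDomain) = {1..1,  1..ny};
    const northBoundary: subdomain(physicalDomain) = {nx..nx, 1..ny};

    var westBC: [westBoundary] real;  // lives with column 1 of the cells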
The one other thing motivating this suggestion is that Chapel has been
designed such that if a number of domains share the same domain map (as
these would), it gives the compiler more semantic information about the
relative alignment (and therefore, lack of need for communication) than if
every domain has a different domain map. That said, this is a
forward-looking characterization in that we haven't implemented it yet in
Chapel (or, you can consider it backward-looking, as it is what we
implemented in ZPL).
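As a rough sketch of what that looks like (again with made-up shapes, and
using the 1.x syntax for naming a domain map explicitly), the idea is to
declare the related domains against one shared domain map value:

    use BlockDist;

    config const n = 8;
    const Space = {1..n, 1..n};

    // One named Block domain map, shared by all of the related domains.
    const gridDist = new dmap(new Block(boundingBox=Space));

    // Because these declarations reference the same domain map, the
    // compiler can (eventually) reason about their relative alignment
    // and avoid communication between aligned indices.
    const physicalDomain: domain(2) dmapped gridDist = Space;
    const x_domain:       domain(2) dmapped gridDist = {1..n, 1..1};
    const y_domain:       domain(2) dmapped gridDist = {1..1, 1..n};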
-Brad