Hi John --

> Just to make sure: the language spec says that iterator and loop body 
> are executed in interleaved manner, so does that mean the these() method 
> (leader iterator) in BlockDom returns only after the loop bodies are 
> actually executed? Ie. timing body of coforall loop in these() will 
> measure how long forall loop takes to execute on each locale.

The language spec has long been behind w.r.t. the implementation of forall 
loops, so don't hesitate to send questions like this to the mailing list 
here.

In the current copy of master, forall loops get translated into either:

a) standalone parallel iterators (a new feature in 1.11), or
b) leader-follower iterators

Case (a) is used when a standalone parallel iterator is available and the 
forall loop is not a zippered iteration.  Case (b) is used for zippered 
forall loops, or in the case that a standalone parallel iterator is not 
available.  We're currently in the process of implementing standalone 
iterators for most of our domain maps (that's how new they are).
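As a rough sketch (the iterable name and round-robin chunking here are 
just illustrative, not from any Chapel module), a standalone parallel 
overload for case (a) looks something like:

        // serial overload, used by ordinary for loops
        iter simple(n: int) {
          for i in 1..n do yield i;
        }

        // standalone parallel overload: invoked by a non-zippered
        // `forall i in simple(n)` when available (case (a) above)
        iter simple(param tag: iterKind, n: int)
          where tag == iterKind.standalone {
          const numTasks = here.maxTaskPar;
          coforall tid in 0..#numTasks do
            // round-robin chunking, for brevity
            for i in 1+tid..n by numTasks do
              yield i;
        }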

In either case, for typical loop structures, you should think of the 
result of the forall as being rewritten into something like:

        coforall loc in targetLocales {         // create per-locale task
          on loc do {                           // move task to locale
            coforall tid in ... {               // create local tasks
              for i in ... {                    // do task-local work
                ...body of forall loop...
              }
            }
          }
        }

where the coforalls and ons are typically in the leader/standalone 
iterator, the inner for is in the follower/standalone iterator, and the 
body is the body.

The semantics of a coforall guarantee that the task which entered it 
won't complete until all of its iterations have completed; thus, it's 
correct that the leader/standalone iterator will not return until after 
all the loop bodies have executed (and this property is required in order 
to implement forall semantics properly).
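So, to your timing question: wrapping a timer around a forall (or around 
the coforall in a leader) will capture all of the loop body executions.  
A minimal sketch, with a stand-in doWork() for the real body:

        use Time;

        proc doWork(i: int) { /* stand-in for the real loop body */ }

        var t: Timer;
        t.start();
        forall i in 1..1000 do   // the forall doesn't complete until all
          doWork(i);             // iterations (and tasks) have finished
        t.stop();
        writeln("forall took ", t.elapsed(), " seconds");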


> In one of the previous post Brad described a cut distribution that 
> existed in ZPL. Do you know of any papers about that, or papers about 
> any other efficient ways of writing efficient, variable sized 
> distributions? My implementation isn't quite as efficient as I'd like, 
> though I'll see what kind of improvement some caching will bring...

I don't know that there was anything particularly efficient about the cut 
distribution in ZPL that you're missing here.  Chapel, in its current 
form, is known to result in suboptimal performance in many cases, 
particularly for distributed-memory runs (see $CHPL_HOME/PERFORMANCE).  
For what you're undertaking, I think the big question would be whether 
your distribution is significantly underperforming Block when you use it 
to distribute things evenly.  If so, that suggests that there's more that 
could be done to optimize it; if not, it suggests you're running into the 
Chapel status quo.
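One way to run that experiment: time a simple forall over a 
Block-distributed domain as a baseline, then dmapped your distribution in 
its place and compare.  Something like (the loop body here is arbitrary):

        use BlockDist, Time;

        config const n = 1000000;
        const Space = {1..n};
        // baseline: swap in your distribution here for the comparison run
        const D = Space dmapped Block(boundingBox=Space);
        var A: [D] real;

        var t: Timer;
        t.start();
        forall i in D do A[i] = i * 2.0;
        t.stop();
        writeln("forall over distributed domain: ", t.elapsed(), " sec");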

In particular, ZPL was much more competitive with MPI for many interesting 
benchmarks (like the NPB) than Chapel has ever been; but that 
competitiveness came with a lot of restrictions -- no user-defined 
distributions, no task-parallelism or nested parallelism, no OOP, few 
modern language conveniences.  You can think of Chapel as being on a path 
to reproduce the performance successes of ZPL in a much more 
general/extensible language so that it can be practically adopted rather 
than remain an academic curiosity.  Each of the last several releases has 
contained significant performance improvements, and we expect that trend 
to continue.

The main paper about cut distributions in ZPL that I'm aware of was Steve 
Deitz's thesis, available here:

http://research.cs.washington.edu/zpl/papers/data/Deitz05Thesis.pdf

though from memory, I'm skeptical that it will contain any silver bullets 
that would help with optimizing a similar distribution in Chapel.

-Brad


_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users