Hi John --

Sorry for the delay on this -- busy week.


> If I decided to go with Chapel, how would you recommend me to do load 
> balancing with existing tools when using rectangular grids?

I'm afraid I'm not familiar enough with your application area to know 
precisely what to suggest.  One simple model that we supported in ZPL but 
have not yet taken to Chapel (that I recall) is the notion of a "cut"
distribution, which is almost identical to the block distribution except
that, rather than evenly dividing a bounding box across the locales, it
takes per-dimension hyperplanes and makes the cuts at those positions.
In a 1D example, distributing 1..12 across 2 locales, you'd get
1..6 and 7..12 naively.  With a cut distribution, you might say "my first
locale is 2x as capable as my second, so let's make the cut at 8" (or 7.5?
:), which would result in an allocation of 1..8 and 9..12, respectively.
If this type of statically skewed load balancing was sufficient for your 
work, I think the approach would probably be to take the Block 
distribution, modify its constructor to take the cut hyperplanes (rather 
than a bounding box), change the logic used by the locales to determine 
what they own, change the logic used to determine who owns index 'i', and 
it very well may be that the rest would fall out.  If you're interested in 
pursuing this, let me know and I can provide a few more pointers in the 
code specifically.  (though I should also warn that with SC14 coming up, 
I'm not likely to be the best at responding this month).
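
To make the 1D example above concrete, here is a minimal sketch of the
ownership arithmetic only, written in Python rather than Chapel.  It is
not taken from the Block distribution's code; the names cut_ranges and
cut_owner are hypothetical, and a "cut" is assumed to mean the last
index owned by a locale.

```python
from bisect import bisect_left

def cut_ranges(low, high, cuts):
    """Return the inclusive (lo, hi) range owned by each locale.

    Assumption: cuts[k] is the last index owned by locale k, so
    len(cuts) + 1 locales share the indices low..high.
    """
    bounds = [low - 1] + list(cuts) + [high]
    return [(bounds[k] + 1, bounds[k + 1]) for k in range(len(bounds) - 1)]

def cut_owner(i, cuts):
    """Return the id of the locale that owns index i.

    Only the sorted cut positions are searched, so storage is
    proportional to the number of locales, not the number of indices.
    """
    return bisect_left(cuts, i)

# Even split of 1..12 over 2 locales versus a skewed cut at 8.
even = cut_ranges(1, 12, [6])
skewed = cut_ranges(1, 12, [8])
```

These two functions correspond to the two pieces of logic mentioned
above: what each locale owns, and who owns index 'i'.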

> In earlier 
> post you said that could be done implicitly by distributions of arrays, 
> which would require custom domain map, or explicitly via on clauses. 
> Wouldn't doing that with on clauses cause a lot of unnecessary data 
> transfers between machines, or is there some clever mechanism that 
> avoids that?

On-clauses do require active messages to be created between locales, so 
you do want to try to minimize their use when possible.  But when creating 
something that load balances between locales, it usually isn't possible to 
do it without communicating with those other locales somehow...

> Also, would spawning multiple locales onto those more powerful machines 
> be a viable option?

There are two approaches one could take here.  One would be to put 
multiple locales onto a node as you suggest (which would be done with the 
gasnet communication layer by listing a single node multiple times, I 
believe).  The other way would be to take the approach that Michael 
suggested in this thread where you still run #numNodes locales, but then 
you pass in a targetLocales array to the block distribution which contains 
the same locale values multiple times.
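
As a rough illustration of why repeating a locale skews the split, here
is a small Python sketch.  It assumes a simplified even block split over
the entries of the target array; it is not the actual Block
implementation, and block_owner and owned_indices are hypothetical
names.

```python
def block_owner(i, low, high, target_locales):
    """Map index i in low..high to a locale id.

    Assumption: the index range is divided evenly into one block per
    entry ("slot") of target_locales, and a locale id may appear in
    more than one slot.
    """
    n = high - low + 1
    slot = (i - low) * len(target_locales) // n
    return target_locales[slot]

def owned_indices(low, high, target_locales):
    """Collect the indices that end up on each locale id."""
    owned = {}
    for i in range(low, high + 1):
        owner = block_owner(i, low, high, target_locales)
        owned.setdefault(owner, []).append(i)
    return owned

# Locale 0 is listed twice, so it receives two of the three blocks.
shares = owned_indices(1, 12, [0, 0, 1])
```

With locale 0 listed twice, it ends up with 1..8 and locale 1 with
9..12, but as two separate blocks on locale 0 rather than one larger
one.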

I'm not a huge fan of either of these approaches, personally.  The first 
has the downside that it's treating two processes running on a single node 
as if they were running further away, so you're introducing overheads and 
over-decomposing.  The second approach has a similar effect, just in the 
Chapel layer rather than the OS/runtime layer (I also have a vague memory 
of someone running into a concern in the distributions if a single locale 
appears multiple times -- perhaps a bug, perhaps just a performance issue 
-- but I can't recall the details just now.  Perhaps someone else will).

On the other hand, if these techniques work for you, it's easier than 
writing new code.  It's just not how I want Chapel to be used eventually 
once we've beefed up our suite of distributions.


> I don't have enough knowledge to say anything about the efficiency 
> issue. However, as an end-user I think that implementing custom domain 
> map seems way too difficult. I think that either writing custom domain 
> maps should be easier,

That's a fine opinion to hold, but personally I have no idea how to make 
it easier without requiring O(n) storage (where n is the number of indices 
in your domain) and/or making performance terrible (and we already get 
enough complaints about that).  The current approach is our answer to what 
is arguably an open research problem.  The papers might be interesting 
reading if you haven't seen them (see the "Advanced Chapel Features" 
section at http://chapel.cray.com/papers.html).  There obviously may be 
other better solutions, but I have no idea what they would be after 
working on this problem for the past 20 years.  I may just have worked 
myself too far into a specific mental rut, though.


> or there should be much more and more flexible 
> already implemented maps included, however I understand well that this 
> language is still work in progress.

Having a much larger suite of domain maps is definitely the plan.  I think 
the lack of a richer suite primarily reflects that we haven't had a chance 
to work on problems that require dynamic load balancing to date (nor on 
systems with heterogeneous node types).


> Would you expand this a bit, what do you mean by those key data 
> structures? What would those be in block dist, for example? I looked at 
> README.dsi on docs/technical, is document that up to date?

I think that document is mostly up-to-date, though many have cited it as 
being pretty difficult to read (I haven't read it myself).  I'd suggest 
starting with the papers I pointed to above (or the slides that accompany 
them) as I think they provide the right framework to start thinking from.


> Would copying Block dist and making such changes be very difficult? 
> Basically, the kind of distribution I was thinking of would be very 
> similar to Block dist. Differences would be that the data would be 
> partitioned only along one axis, the longest axis. That would make it 
> easier to assign different sized blocks to locales. The constructor 
> would take a map as an argument which describes (relative) number of 
> elements assigned to each locale. Later I'd also like to add 
> functionality to change distribution during execution by passing a 
> similar map. That way I could do load balancing at some other place, 
> assuming I have a way to time the executions on each locale there.

I've been responding to this as I read, so I apologize for not having the
complete picture in my notes above.  Yes, I think that this is essentially
a similar concept to the cut distribution I mentioned above, and I think
that modifying Block to do this ought to be fairly tractable.  At least, I
can't think of any major problems and believe it should be a modest amount
of work; it's simply in a big chunk of code that's not very user-friendly.
But I'd be happy to help guide you through the work and point to the main
places I can think of that would require changes.
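
For what it's worth, turning a map of relative weights into cut
positions along the longest axis is a small calculation.  The Python
sketch below shows only that arithmetic, under the assumptions that the
weights are positive integers and that a cut is the last index owned by
a locale; longest_axis and weights_to_cuts are hypothetical names, not
anything in the Chapel modules.

```python
def longest_axis(shape):
    """Return the dimension with the most elements (first one on ties)."""
    return max(range(len(shape)), key=lambda d: shape[d])

def weights_to_cuts(low, high, weights):
    """Convert relative per-locale weights into cut positions.

    Assumption: weights are positive integers, one per locale.  The
    result has len(weights) - 1 entries; entry k is the last index in
    low..high owned by locale k.
    """
    n = high - low + 1
    total = sum(weights)
    cuts = []
    running = 0
    for w in weights[:-1]:
        running += w
        cuts.append(low - 1 + (n * running) // total)
    return cuts

# Partition only the longest axis of a 100 x 400 x 50 grid.
axis = longest_axis((100, 400, 50))

# The first locale is twice as capable as the second: cut 1..12 at 8.
cuts = weights_to_cuts(1, 12, [2, 1])
```

Redistributing during execution would then amount to recomputing the
cuts from a new weight map (for example, one derived from per-locale
timings) and moving the data accordingly, which is the harder part.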

-Brad


------------------------------------------------------------------------------
_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users
