Hi Brad and John,

>> Machines will have very different kinds of processors and
>> GPUs, so an ability to balance loads will be paramount.
>
> Chapel does not have any built-in load balancing capabilities across
> locales (nodes).  Tasks execute where they are placed, either explicitly
> via on-clauses or implicitly via parallel operations over distributed
> domains (index sets) or arrays.  To do dynamic load balancing, one would
> either need to explicitly write it by creating something like task queues
> and moving work around themselves using on-clauses, or by creating a
> domain map (distribution for a domain and its arrays) that managed the
> load balancing.  (Probably the domain map would be doing the same kind of
> task queueing management under the covers, so would just be abstracting it
> away from user code for things like distributed iterations or array
> operations.)

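For reference, the explicit placement Brad describes would look
roughly like this; just a minimal sketch of an on-clause, nothing
more:

  // run one task per locale, placed explicitly with an on-clause
  coforall loc in Locales do
    on loc do
      writeln("running a task on locale ", loc.id);
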
I would like to contribute something to this discussion:

The built-in "Locales" data structure, which holds the locales (nodes)
in use, is an array (or domain/range, I still don't get the difference
;-) ). A domain can be mapped onto a distribution that specifies how
its indices, and so the workload, are spread over several nodes.
Chapel already provides some predefined distributions, but it is also
possible to create your own.
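
For instance, here is a minimal sketch using the predefined Block
distribution (the size n = 100 is made up); the array and the loop
over it are then spread across all locales automatically:

  use BlockDist;

  config const n = 100;

  // dmapped attaches the distribution to the domain, so the array's
  // elements (and parallel iterations over it) live on the locales
  // that the Block distribution assigns them to
  const D = {1..n} dmapped Block(boundingBox={1..n});
  var A: [D] real;

  forall i in D do
    A[i] = i;  // computed on the locale that owns index i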

That is easiest if you want to distribute the workload equally across
your nodes. But you will surely want to put more work onto the nodes
that compute faster or are better suited to your program. One way to
do this is to create a locale array of arbitrary size and distribute
over it instead: if this array is larger than the actual number of
nodes in use, its entries simply wrap around the available nodes, as
the sketch below shows.
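
A small sketch of what I mean (the size 8 is arbitrary):

  // build an oversized locale array whose entries wrap around the
  // locales that actually exist
  const W = {0..7};
  const targetLocs: [W] locale = [i in W] Locales[i % numLocales];

  writeln(targetLocs.id);  // with two nodes prints: 0 1 0 1 0 1 0 1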

Let me give you an example: suppose there are two nodes, 0 and 1, so
the "Locales" array is [0 1]. If you map your locales onto a larger
array (the mechanism is described somewhere in the language spec), you
may get a result like [0 1 0 1 0 1 0 1]. Since there are of course
still only two physical nodes, the entries are folded back onto them,
dividing the workload evenly between the two.
My idea is that instead of distributing them evenly, you create such
an extended array with more entries for the faster nodes. If node 1 is
faster than node 0, it could be e.g. [0 1 1 0 1 1 0 1 1]: for every
chunk of workload placed on node 0, two chunks go to node 1. A sketch
follows below.
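
Here is a hedged sketch of the idea, using the predefined Cyclic
distribution with such a weighted target-locale array. I have not
verified that the predefined distributions accept repeated locales in
targetLocales, so treat that part as an assumption:

  use CyclicDist;

  // weighted pattern [0 1 1 0 1 1 0 1 1]: two slots for node 1 per
  // slot for node 0 (the modulo keeps it valid on a single locale)
  const pattern = [0, 1, 1, 0, 1, 1, 0, 1, 1];
  const weighted: [pattern.domain] locale =
    [p in pattern] Locales[p % numLocales];

  // Cyclic deals indices round-robin over targetLocales, so node 1
  // should end up owning twice as many indices as node 0
  const D = {1..90} dmapped Cyclic(startIdx=1, targetLocales=weighted);
  var A: [D] real;

  forall i in D do
    A[i] = 2.0 * i;  // each iteration runs where its index lives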

As I said, it's just an idea. If all the computed data depend on each
other (as they do in my Chapel project), it may become a bit more
complicated. But maybe it helps a little.

By the way, it generally takes some time to become familiar with
languages like Chapel, depending on your experience with APGAS
languages. I'm writing my bachelor's thesis on this concept and had
never heard of it before, so at the beginning I had to start
implementing my program in Chapel rather cluelessly, hoping it would
somehow work; by now only the distribution part is still missing. I
have found the Chapel team very reliable in answering any questions
about this, so I'm sure that you will be content too. :-)

bye
Michael

