Hi Brad and John,

>> Machines will have very different kinds of processors and
>> GPUs, so an ability to balance loads will be paramount.
>
> Chapel does not have any built-in load balancing capabilities across
> locales (nodes). Tasks execute where they are placed, either explicitly
> via on-clauses or implicitly via parallel operations over distributed
> domains (index sets) or arrays. To do dynamic load balancing, one would
> either need to write it explicitly, creating something like task queues
> and moving work around oneself using on-clauses, or create a domain map
> (distribution for a domain and its arrays) that managed the load
> balancing. (Probably the domain map would be doing the same kind of task
> queueing management under the covers, so it would just be abstracting it
> away from user code for things like distributed iterations or array
> operations.)
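To make the on-clause placement described above concrete, here is a toy sketch of static round-robin placement (the chunk count is made up, and real dynamic balancing would pull chunks from task queues instead of fixing the mapping up front):

  config const n = 8;

  // Toy static placement: chunk i runs on locale i % numLocales.
  // Dynamic load balancing would instead have idle locales pull
  // chunks from shared task queues.
  coforall i in 0..#n do
    on Locales[i % numLocales] do
      writeln("chunk ", i, " handled by locale ", here.id);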
I would like to try to contribute something to your issue: the data structure "Locales", which organizes the locales (nodes) in use, is an array (or domain/range, I still don't get the difference ;-) ) that can be mapped onto a specified distribution of workload across several nodes. Chapel already provides some pre-defined distributions, but it is also possible to create your own. The pre-defined ones may be easier if you want to distribute the workload equally across your nodes, but you surely want to put more workload onto the nodes that compute faster or are better suited to your program.

To do this, it is possible to create a locale array of arbitrary size and distribute it accordingly. If this array is larger than the actual number of nodes in use, the array entries wrap around onto the available nodes. Let me give you an example: say there are nodes 0 and 1, so the "Locales" array is [0 1]. If you map your locales onto a larger array (it's described somewhere in the language spec), you may get a result like [0 1 0 1 0 1 0 1]. Since you of course still have only two physical nodes, the array entries will be placed onto their respective nodes, dividing the workload accordingly.

My idea is that instead of distributing the entries evenly like this, you create such an extended array giving more entries to the faster nodes. If node 1 is faster than node 0, it could be e.g. [0 1 1 0 1 1 0 1 1]: for every chunk of workload placed on node 0, two chunks go to node 1 (see the sketch in the P.S. below). As I said, it's just an idea; if all computed data depend on each other (as they do in my Chapel project), it may become a bit more complicated. But maybe it's a bit helpful.

By the way, it generally takes some time to become familiar with languages like Chapel, depending on your experience with APGAS languages. I'm writing my bachelor's thesis about this concept and had never heard of it before, so in the beginning I had to start cluelessly implementing my program in Chapel, hoping that it would somehow work; by now only the distribution part is still missing. I find the Chapel team very trustworthy in answering any questions about this, so I'm sure that you too will be content. :-)

bye
Michael
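P.S. Here is a minimal sketch of the weighted wrapping idea, in case it helps. I haven't run this exact snippet; the 2:1 pattern and the chunk count are only illustrative, and the % numLocales guard just lets it also run on a single locale:

  config const numChunks = 9;

  // Weighted placement pattern [0 1 1]: node 1 appears twice for every
  // appearance of node 0, so it receives roughly twice the chunks.
  var pattern: [0..2] locale =
    [Locales[0], Locales[1 % numLocales], Locales[1 % numLocales]];

  coforall i in 0..#numChunks do
    on pattern[i % 3] do
      writeln("chunk ", i, " placed on locale ", here.id);

The same kind of pattern array could presumably also be handed to one of the pre-defined distributions as its targetLocales argument (e.g. Block), though I haven't checked whether duplicated locales are accepted there.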
