Hi, >> If I decided to go with Chapel, how would you recommend me to do load >> balancing with existing tools when using rectangular grids? > I'm afraid I'm not familiar enough with your application area to know > precisely what to suggest. One simple model that we supported in ZPL but > have not yet taken to Chapel (that I recall) is the notion of a "cut" > distribution which is almost identical to the block distribution except > that rather than evenly dividing a bounding box across the locales, it > takes per-dimension hyperplanes and makes the cuts there instead of > evenly. In a 1D example, distributing 1..12 across 2 locales, you'd get > 1..6 and 7..12 naively. With a cut distribution, you might say "my first > locale is 2x as capable as my second, so let's make the cut at 8 (or 7.5? > :) which would result in an allocation of 1..8 and 9..12, respectively. > If this type of statically skewed load balancing was sufficient for your > work, I think the approach would probably be to take the Block > distribution, modify its constructor to take the cut hyperplanes (rather > than a bounding box), change the logic used by the locales to determine > what they own, change the logic used to determine who owns index 'i', and > it very well may be that the rest would fall out. If you're interested in > pursuing this, let me know and I can provide a few more pointers in the > code specifically. (though I should also warn that with SC14 coming up, > I'm not likely to be the best at responding this month). >> Would copying Block dist and making such changes be very difficult? >> Basically, the kind of distribution I was thinking of would be very >> similar to Block dist. Differences would be that the data would be >> partitioned only along one axis, the longest axis. That would make it >> easier to assign different sized blocks to locales. The constructor >> would take a map as an argument which describes (relative) number of >> elements assigned to each locale. Later I'd also like to add >> functionality to change distribution during execution by passing a >> similar map. That way I could do load balancing at some other place, >> assuming I have a way to time the executions on each locale there. > I've been responding to this as I read, so apologize for not having the > complete picture in my notes above. Yes, I think that this is essentially > a similar concept to the cut distribution I mentioned above and I think > that modifying block to do this ought to be fairly tractable (at least, I > can't think of any major problems and believe it should be a modest amount > of work; simply in a big chunk of code that's not very user-friendly. > But I'd be happy to help guide you through the work and point to the main > places I can think of that would require changes).
Well, actually looking at the code of block dist almost managed to scare me away from this task... But if you would be willing to help me with getting started, then sure I could give it a try :) .I didn't quite understand how the cut distribution works, do you have any papers that would describe this? Especially the hyperplanes confused me; my understanding of geometry is pretty much limited to three dimensions... >> In earlier >> post you said that could be done implicitly by distributions of arrays, >> which would require custom domain map, or explicitly via on clauses. >> Wouldn't doing that with on clauses cause a lot of unnecessary data >> transfers between machines, or is there some clever mechanism that >> avoids that? > On-clauses do require active messages to be created between locales, so > you do want to try to minimize their use when possible. But when creating > something that load balances between locales, it usually isn't possible to > do it without communicating with those other locales somehow... I wasn't very clear about what I meant. Let's say that we have two locales, A and B, and a distributed array. Machine A is significantly faster than B. If we did load balancing using on-clauses, would every time A used data distributed to B happen this: data is transferred from B to A - A makes computations - data is transferred back to B. So is there some mechanism that A can cache the data owned by B, and only send it back if B actually needs it? > On the other hand, if these techniques work for you, it's easier than > writing new code. It's just not how I want Chapel to be used eventually > once we've beefed up our suite of distributions. >> I don't have enough knowledge to say anything about the efficiency >> issue. However, as an end-user I think that implementing custom domain >> map seems way too difficult. I think that either writing custom domain >> maps should be easier, > That's a fine opinion to hold, but personally I have no idea how to make > it easier without requiring O(n) storage (where n is the number of indices > in your domain) and/or making performance terrible (and we already get > enough complaints about that). The current approach is our answer to what > is arguably an open research problem. The papers might be interesting > reading if you haven't seen them (see the "Advanced Chapel Features" > section at http://chapel.cray.com/papers.html). There obviously may be > other better solutions, but I have no idea what they would be after > working on this problem for the past 20 years. I may just have worked > myself too far into a specific mental rut, though. One thing I'd find easy to use would be that constructor of distribution would take an object with at least following methods as an argument: - One that takes a locale as an argument and returns size of array (zero-based, densely indexed) that should be allocated to that locale - One that takes an index as an argument and returns which locale it belongs to. - One that takes an index as an argument and returns its position in the local array. Also ability to constrain implementation to a specific dimension would make things easier. Another similar solution would be to write an abstract class describing distributions of rectangular, densely-indexed grids, and having those aforementioned methods defined as abstract. Apart from those, generic default implementations could be provided for methods, or in some cases compile-time error could be generated. I don't know if these would be viable options in Chapel, though. But however, at least when it comes to user-defined distributions I don't think that the one size fits all -approach would work too well. Instead, providing specialized tools for most common situations would be better in my opinion, like ones I suggested above for rectangular, densely indexed grids. As far as I can see, this solution would provide a lot of flexibility, while still being rather easy to implement and not being too inefficient. ------------------------------------------------------------------------------ _______________________________________________ Chapel-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/chapel-users
