Hi,

Thanks for your reply. I think I'll try Approach 2 as you described in
your post. I had a look at the sources of the Block and Cyclic
distributions and investigated the routines you mentioned.
First, about LocBlock.myChunk: apparently the only place where this
variable is needed, apart from some printing functions, is in
Block.getChunk. The thing that caught my attention was the following
comment at that function in the Cyclic distribution:

  WANT: var distWhole = locDist(locid).myChunk.getIndices();
        return inds((...distWhole));

However, as far as I can see, the approach used in BlockDist is
essentially the same as the above. So, what are the limitations of the
above way? The reason I'd like to know the limitations is that I
haven't decided yet how generic a distribution I'm trying to write. If
I end up writing a distribution that could support arbitrary
distributions of rectangular, densely indexed domains, then those
limitations will be relevant. However, in a more limited version where
each locale is assigned one rectangular block, I think that won't be a
problem...

Then, about the targetLocales array: what's the rationale for having it
have the same rank as the distribution? While I can see the sense of
that in the case of the Block dist, I'm not sure how good the idea
would be in my case. Is there any reason why the targetLocales array
(and locDist and targetLocDom) couldn't always have a rank of 1, or an
arbitrary rank decided by the policy object?

John

23.01.2015, 02:22, "Brad Chamberlain" <[email protected]>:
> Hi John --
>
> Thanks for the response. I'm going to try and feed you just enough
> information to get you started, so that if you don't get back to this
> project, I haven't wasted too much time. But if you do get to it, you
> should try to ask follow-up questions as necessary (and at that
> point, _may_ want to slosh this thread over to chapel-developers --
> it's somewhere on the cusp between a user and developer topic...).
>
> There haven't been significant changes to the distribution interface
> since we last talked. We've started the discussion I alluded to
> previously regarding different interpretations of slicing and
> reindexing, but I think that's orthogonal to this discussion for the
> most part (though it may ultimately simplify the interface required
> to implement a domain map).
>
> There are two different approaches we've discussed, both of which I
> would personally approach by making modifications to a copy of the
> Block distribution in modules/dists/BlockDist.chpl:
>
> -----
>
> Approach 1: Specialize the Block distribution to support blocking in
> one dimension only, but with varying block sizes across locales (as
> computed heuristically, or by passing in "cut" values, or ...).
>
> Approach 2: Create a variation of the Block distribution that gives a
> user the ability to pass in a "policy" object that implements the
> simplified interface you proposed:
> - "how big should this locale's local block be?"
> - "which locale owns this index?"
> - "please access the local array using this index."
>
> -----
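
To make sure I understand Approach 2: the policy object would be a
class along the lines of the sketch below? This is only my guess at
the shape of it; the class and method names are invented, and nothing
like this exists in the Chapel modules yet:

  // Hypothetical policy interface for Approach 2, restricted to 1D.
  // A framework distribution would call these three methods and
  // handle everything else.
  class DistPolicy1D {
    // "how big should this locale's local block be?"
    // (returns the block of global indices owned by locale 'locid')
    proc localChunk(locid: int, whole: domain(1)): domain(1) {
      halt("a concrete policy must override localChunk");
      return {1..0};  // never reached
    }

    // "which locale owns this index?"
    proc indexToLocaleId(idx: int): int {
      halt("a concrete policy must override indexToLocaleId");
      return 0;  // never reached
    }

    // "please access the local array using this index."
    // (maps a global index to a position in the owner's local array)
    proc globalToLocal(idx: int): int {
      halt("a concrete policy must override globalToLocal");
      return 0;  // never reached
    }
  }
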
> Either one of these approaches will involve similar components of
> BlockDist, so let me try and highlight the salient bits. I apologize
> in advance for the lack of commenting and the ample opportunity for
> cleanup in BlockDist.chpl. It was written during the hectic HPCS
> years of Chapel and hasn't had an opportunity to be freshened up
> since then. Just putting these notes together, I came across a number
> of things that look like they could be trivially cleaned
> up/simplified...
>
> * I can't recall whether you've read the relevant papers or slides on
>   Chapel's distribution framework, but there are essentially six
>   classes involved that you can think of as being in a 3 x 2 grid:
>   the 3-way dimension is "domain map", "domain", and "array"; the
>   other dimension is global classes (those that represent the concept
>   as a whole) and local classes (those that represent a single
>   locale's view of the concept). These classes are named:
>
>     Block     BlockDom     BlockArr
>     LocBlock  LocBlockDom  LocBlockArr
>
>   respectively. If you need more background on their roles and
>   organization, please refer to the second and third papers (and/or
>   their slides) in the "Advanced Chapel Features" section of:
>
>     http://chapel.cray.com/papers.html
>
>     "Authoring User-Defined Domain Maps in Chapel"
>     "User-Defined Distributions and Layouts in Chapel: Philosophy
>      and Framework"
>
> * The LocBlock class is the crucial bit, I believe. This is the class
>   that specifies what indices the current locale owns for the Block
>   distribution. In the class, 'myChunk' is the crucial field, as it
>   describes the indices owned by the current locale (itself stored as
>   a non-distributed domain). This field is computed in the class's
>   constructor, LocBlock.LocBlock() (on line 497 of my current repo,
>   but your mileage may vary), using a utility function that we use to
>   compute blockings, _computeBlock(), defined in DSIUtil.chpl. My
>   suspicion is that if you redefine the logic that defines 'myChunk'
>   in this constructor, much of the rest of the implementation will
>   just fall out. Specifically, when declaring domains and arrays over
>   a domain map, the determination of what each locale owns is
>   typically computed by looking at LocBlock.myChunk.
>
> * The other crucial bit is the Block.dsiIndexToLocale() routine,
>   which maps an index in rank-space to a locale in the domain map's
>   'targetLocales' array (indicating which locales the array should be
>   distributed between). As you'll see, this is implemented for Block
>   via a helper routine, Block.targetLocsIdx(), which, if we've done
>   our math correctly, should be the inverse of _computeBlock() above
>   (raising the question of why these are in different files, but see
>   the "hectic years" note above...).
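
If I follow these two bullets, then for a 1D cut-style blocking
(Approach 1) the replacement 'myChunk' logic could be as small as the
sketch below; 'cuts' and 'computeMyChunk' are names I made up, and
nothing like them is in BlockDist:

  // Sketch: with numLocales+1 sorted cut values, locale 'locid'
  // owns cuts(locid)..cuts(locid+1)-1.  E.g., over 1..12 with a
  // 0-based array cuts = (1, 9, 13), locale 0 owns 1..8 and
  // locale 1 owns 9..12.  The LocBlock constructor would assign
  // this to 'myChunk' in place of the _computeBlock() result.
  proc computeMyChunk(locid: int, cuts: [] int): domain(1) {
    return {cuts(locid)..cuts(locid+1)-1};
  }
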
> My hypothesis is that changing the above two things may be sufficient
> to get simple programs that declare, access, and loop over
> distributed arrays working, though it's also very possible that I'm
> missing some other key pieces. Specifically...
>
> * When we declare a domain over the Block distribution, the
>   BlockDom.setup() routine computes the subdomain owned by each
>   locale using the Block.getChunk() helper routine, which returns the
>   locale's subdomain by intersecting the whole domain with the domain
>   stored by the corresponding LocBlock class. This is then stored in
>   the LocBlockDom class's 'myBlock' field.
>
> * From there, the LocBlockArr class stores the local array of
>   elements in the 'myElems' array, which in turn is declared over the
>   locDom.myBlock domain (a reference to that same locale's
>   LocBlockDom object from the previous bullet).
>
> Life being what it is, I'm probably missing some other key part...
> But I'm pretty sure I'm not missing a dozen key parts, so I'll stop
> there for now and hope that this is enough to get you going.
>
> What I'd suggest doing is writing a simple program that declares a 1D
> block distributed domain and array and initializes the array using
> the owning locale's ID, similar to what's done in
> [test/release/]examples/primers/distributions.chpl. Then I'd try
> replacing the computation of 'myChunk' and 'dsiIndexToLocale()' as
> above, and re-compiling the program to see if it works. If it does,
> I'd gradually move to more complicated programs to gain further
> confidence that I didn't miss anything. If it doesn't, I'd follow up
> with more questions.
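
That simple test program is where I'll start; based on
examples/primers/distributions.chpl, I imagine something like this
(untested):

  use BlockDist;  // later: the modified copy of it

  config const n = 16;

  const Space = {1..n};
  const D = Space dmapped Block(boundingBox=Space);
  var A: [D] int;

  // each element records the ID of the locale that owns it
  forall a in A do
    a = here.id;

  writeln(A);
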
> I'll be very curious what you're able to accomplish. And if you
> create a generalization of the Block() distribution that would be
> more broadly of interest to people and can contribute it back, that'd
> be great!
>
> Hope this is helpful,
> -Brad
>
> On Thu, 22 Jan 2015, John MacFrenz wrote:
>> Hi,
>>
>> I have been doing a lot of other projects lately, so I didn't
>> remember to answer this...
>>
>> However, I am still interested in this issue. At the moment I don't
>> have excessive amounts of time to dedicate to this, so my progress
>> would be rather slow, though.
>>
>> So, I'd appreciate it if you could give me some hints on where to
>> start. Also, have there been any significant changes since we last
>> discussed this?
>>
>> John
>>
>> 21.12.2014, 01:54, "Brad Chamberlain" <[email protected]>:
>>> Hi John --
>>>
>>> I'm trying to clean my inbox to wrap up the year, and am realizing
>>> that I never replied to this last message on the thread, for which
>>> I apologize (you ran into the 1-2 punch that is SC and Thanksgiving
>>> back-to-back, and then I never got back to older mails in
>>> December).
>>>
>>> Just to put in a quick note, I think that we could support the
>>> approach that you're suggesting for a simpler distribution
>>> specification in a way that doesn't throw away the generality that
>>> the current scheme has (and that I think it must have).
>>> Specifically, the approach would be to write a distribution that
>>> itself could be parameterized by a class providing the simpler
>>> interface that you want to specify. This would put all the
>>> complexity into a framework distribution, and you could just plug
>>> in the details in the characterizing class. I'm 99% certain that
>>> this could be done with no changes to the compiler/runtime/etc. --
>>> it would just require someone writing that framework code.
>>>
>>> Though not directly related to your question, I'll mention that
>>> we've also started a conversation recently about a way to remove
>>> some of the more complex routines from the distribution interface
>>> -- particularly the ones related to slicing and reindexing. We'll
>>> see if that comes about in 2015 or not.
>>>
>>> Anyway, I invite you to kick this thread back to life in early 2015
>>> if it's still of interest to you; and if not, I do apologize again
>>> for letting it lapse this long.
>>>
>>> -Brad
>>>
>>> On Sat, 8 Nov 2014, John MacFrenz wrote:
>>>> Hi,
>>>>
>>>>>> If I decided to go with Chapel, how would you recommend I do
>>>>>> load balancing with existing tools when using rectangular grids?
>>>>>
>>>>> I'm afraid I'm not familiar enough with your application area to
>>>>> know precisely what to suggest. One simple model that we
>>>>> supported in ZPL but have not yet taken to Chapel (that I recall)
>>>>> is the notion of a "cut" distribution, which is almost identical
>>>>> to the block distribution except that rather than evenly dividing
>>>>> a bounding box across the locales, it takes per-dimension
>>>>> hyperplanes and makes the cuts there instead. In a 1D example,
>>>>> distributing 1..12 across 2 locales, you'd get 1..6 and 7..12
>>>>> naively. With a cut distribution, you might say "my first locale
>>>>> is 2x as capable as my second, so let's make the cut at 8 (or
>>>>> 7.5? :)", which would result in an allocation of 1..8 and 9..12,
>>>>> respectively. If this type of statically skewed load balancing
>>>>> was sufficient for your work, I think the approach would probably
>>>>> be to take the Block distribution, modify its constructor to take
>>>>> the cut hyperplanes (rather than a bounding box), change the
>>>>> logic used by the locales to determine what they own, change the
>>>>> logic used to determine who owns index 'i', and it very well may
>>>>> be that the rest would fall out. If you're interested in pursuing
>>>>> this, let me know and I can provide a few more pointers in the
>>>>> code specifically. (Though I should also warn that with SC14
>>>>> coming up, I'm not likely to be the best at responding this
>>>>> month.)
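
Coming back to this cut distribution now: if I understand it, the "who
owns index 'i'" logic would just be a search over the cut values. My
naive sketch (invented names, and a linear scan where a real
implementation might binary-search):

  // Return the ID of the locale owning global index 'idx', assuming
  // locale i owns cuts(i)..cuts(i+1)-1 as in the earlier sketch.
  proc cutIdxToLocaleId(idx: int, cuts: [?cutDom] int): int {
    for i in cutDom.low..cutDom.high-1 do
      if idx < cuts(i+1) then return i;
    halt("index ", idx, " is past the last cut");
    return -1;  // never reached
  }
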
>>>>>> Would copying Block dist and making such changes be very
>>>>>> difficult? Basically, the kind of distribution I was thinking
>>>>>> of would be very similar to Block dist. The differences would
>>>>>> be that the data would be partitioned only along one axis, the
>>>>>> longest axis. That would make it easier to assign
>>>>>> different-sized blocks to locales. The constructor would take a
>>>>>> map as an argument which describes the (relative) number of
>>>>>> elements assigned to each locale. Later I'd also like to add
>>>>>> functionality to change the distribution during execution by
>>>>>> passing a similar map. That way I could do load balancing at
>>>>>> some other place, assuming I have a way to time the executions
>>>>>> on each locale there.
>>>>>
>>>>> I've been responding to this as I read, so I apologize for not
>>>>> having the complete picture in my notes above. Yes, I think that
>>>>> this is essentially a similar concept to the cut distribution I
>>>>> mentioned above, and I think that modifying Block to do this
>>>>> ought to be fairly tractable (at least, I can't think of any
>>>>> major problems and believe it should be a modest amount of work,
>>>>> simply in a big chunk of code that's not very user-friendly. But
>>>>> I'd be happy to help guide you through the work and point to the
>>>>> main places I can think of that would require changes).
>>>>
>>>> Well, actually, looking at the code of the Block dist almost
>>>> managed to scare me away from this task... But if you would be
>>>> willing to help me with getting started, then sure, I could give
>>>> it a try :). I didn't quite understand how the cut distribution
>>>> works; do you have any papers that would describe it? Especially
>>>> the hyperplanes confused me; my understanding of geometry is
>>>> pretty much limited to three dimensions...
>>>>
>>>>>> In an earlier post you said that could be done implicitly by
>>>>>> distributions of arrays, which would require a custom domain
>>>>>> map, or explicitly via on-clauses. Wouldn't doing that with
>>>>>> on-clauses cause a lot of unnecessary data transfers between
>>>>>> machines, or is there some clever mechanism that avoids that?
>>>>>
>>>>> On-clauses do require active messages to be created between
>>>>> locales, so you do want to try to minimize their use when
>>>>> possible. But when creating something that load balances between
>>>>> locales, it usually isn't possible to do it without communicating
>>>>> with those other locales somehow...
>>>>
>>>> I wasn't very clear about what I meant. Let's say that we have two
>>>> locales, A and B, and a distributed array, and machine A is
>>>> significantly faster than B. If we did load balancing using
>>>> on-clauses, would the following happen every time A used data
>>>> distributed to B: data is transferred from B to A, A does its
>>>> computations, and the data is transferred back to B? Or is there
>>>> some mechanism by which A can cache the data owned by B and only
>>>> send it back if B actually needs it?
>>>>
>>>>> On the other hand, if these techniques work for you, it's easier
>>>>> than writing new code. It's just not how I want Chapel to be used
>>>>> eventually once we've beefed up our suite of distributions.
>>>>
>>>>>> I don't have enough knowledge to say anything about the
>>>>>> efficiency issue. However, as an end-user I think that
>>>>>> implementing a custom domain map seems way too difficult. I
>>>>>> think that either writing custom domain maps should be easier,
>>>>>
>>>>> That's a fine opinion to hold, but personally I have no idea how
>>>>> to make it easier without requiring O(n) storage (where n is the
>>>>> number of indices in your domain) and/or making performance
>>>>> terrible (and we already get enough complaints about that). The
>>>>> current approach is our answer to what is arguably an open
>>>>> research problem. The papers might be interesting reading if you
>>>>> haven't seen them (see the "Advanced Chapel Features" section at
>>>>> http://chapel.cray.com/papers.html). There obviously may be other
>>>>> better solutions, but I have no idea what they would be after
>>>>> working on this problem for the past 20 years. I may just have
>>>>> worked myself too far into a specific mental rut, though.
>>>>
>>>> One thing I'd find easy to use would be for the constructor of the
>>>> distribution to take an object with at least the following methods
>>>> as an argument:
>>>> - One that takes a locale as an argument and returns the size of
>>>>   the array (zero-based, densely indexed) that should be allocated
>>>>   to that locale.
>>>> - One that takes an index as an argument and returns which locale
>>>>   it belongs to.
>>>> - One that takes an index as an argument and returns its position
>>>>   in the local array.
>>>>
>>>> The ability to constrain an implementation to a specific dimension
>>>> would also make things easier. Another, similar solution would be
>>>> to write an abstract class describing distributions of
>>>> rectangular, densely indexed grids, with those aforementioned
>>>> methods defined as abstract. Apart from those, generic default
>>>> implementations could be provided for the methods, or in some
>>>> cases a compile-time error could be generated. I don't know if
>>>> these would be viable options in Chapel, though. However, at least
>>>> when it comes to user-defined distributions, I don't think that
>>>> the one-size-fits-all approach would work too well.
>>>> Instead, providing specialized tools for the most common
>>>> situations would be better in my opinion, like the ones I
>>>> suggested above for rectangular, densely indexed grids. As far as
>>>> I can see, this solution would provide a lot of flexibility while
>>>> still being rather easy to implement and not being too
>>>> inefficient.
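
For what it's worth, going back to the interface I proposed in that
last message: with the hypothetical DistPolicy1D and cutIdxToLocaleId
sketches from earlier in this mail, the weighted 1D distribution I
originally asked about would reduce to a small concrete policy,
something like this (untested, and again all names are invented):

  // Sketch: split 'whole' into contiguous chunks whose sizes are
  // proportional to per-locale weights, by deriving cut values.
  class WeightedPolicy1D : DistPolicy1D {
    var whole: domain(1);           // the global index set
    var cuts: [0..numLocales] int;  // cut values derived from weights

    proc WeightedPolicy1D(whole: domain(1),
                          weights: [0..#numLocales] real) {
      this.whole = whole;
      const total = + reduce weights;
      var sum = 0.0;
      cuts(0) = whole.low;
      for i in 0..#numLocales {
        sum += weights(i);
        cuts(i+1) = whole.low + (whole.numIndices * sum / total): int;
      }
      cuts(numLocales) = whole.high + 1;  // guard against rounding
    }

    proc localChunk(locid: int, whole: domain(1)): domain(1) {
      return {cuts(locid)..cuts(locid+1)-1};
    }

    proc indexToLocaleId(idx: int): int {
      return cutIdxToLocaleId(idx, cuts);  // from the earlier sketch
    }

    proc globalToLocal(idx: int): int {
      return idx - cuts(indexToLocaleId(idx));  // zero-based offset
    }
  }

The intended use, once a framework variant of Block accepts a policy,
would then be something like this (PolicyDist being the imaginary
framework distribution):

  // var weights: [0..#numLocales] real = ...;  // e.g., from timings
  // const D = {1..n}
  //   dmapped PolicyDist(new WeightedPolicy1D({1..n}, weights));
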
