Re: [Chapel-developers] Zippered Loop Communication Optimization

Brad Chamberlain Thu, 18 Dec 2014 17:45:06 -0800


Hi Aroon --

Sorry for the slow response. Part of the reason that the answer to these questions wasn't obvious enough to me to respond immediately (and still isn't) is that I haven't had the chance to review the paper or code in enough detail to know what my opinion is. To that end, my intention was to have someone on the Chapel team who is familiar with the code read the paper and review the code and give a proposal -- though with the holidays coming up, I suspect that this won't happen until the new year. Is there any timeline on which you are looking for a response or action on these questions?


Here are some other random reactions:

* I think having your work available on a GitHub branch (whether on
  your fork/repository or one of ours) would be great and that, as you
  ask, the big question is whether or not to merge some or all of it.

* Part of what I'd hope to understand is how your work would relate to
  work that has already been done by the University of Malaga in bulk
  transfers (I believe it's pretty different) or our own group's work
  on stencil domain maps (there seems more similarity here) or our
  current work on standalone parallel iterators and/or our plans for
  next-generation leader-follower iterators.  This is whay I'm hoping
  to find a volunteer on the team to investigate and make a proposal
  for.

* Block Cyclic is, in general, not in the greatest shape and needs a lot
  more attention, so it's difficult to say whether it's better off with or
  without your changes.  My inclination would be

* In general, we are happy to have more tests and benchmarks, especially
  from standard suites, written in Chapel and in the repository.  Have
  you set these up to run within the Chapel testing system (in which case
  it seems like a no-brainer to review them and check them in), or are
  you running them manually or via Makefiles or homegrown testing scripts
  (in which case more work would be required either on your part or ours;
  doc/developer/bestPractices/TestSystem.txt talks more about how to
  create tests in our system, if you haven't already found that)?

* Ultimately, for any code being contributed back, we will require a
  Chapel contributor agreement (see the links in the first paragraph of
  the "Developer Resources" page at chapel.cray.com).  And I believe that
  for work done by academic contributors under funding, we (unfortunately)
  need both an ICLA (individual) and CCLA ("corporate" -- in this case,
  the department or university, as appropriate) agreement for our lawyers
  to feel confident that the code is OK to contribute back.

* As far as testing goes, for things that change multi-locale execution
  (like this), it's usually sufficient to spot-check test/release/examples
  and test/multilocale and test/distributions.  In your case, since you're
  modifying cyclic and block-cyclic, you'd probably want to run
  test/distributions/robust for each of those configurations.  I believe
  the README in that directory should describe how to do this.

Hope this is helpful. If you wanted to minimize effort, I think focusing on the contributor agreement first (to make sure there are no surprises) followed by the test suite (because it seems like a no-brainer) makes sense and that will give us more time to review the details of the work and weigh in on whether we want to incorporate it or not. If you have a pointer to a branch or pull request that contains the code modifications involved, that could be helpful for that process as well.


Thanks,
-Brad



On Mon, 15 Dec 2014, Aroon Sharma wrote:

Hi everyone,

I've been meaning to submit for review my code that modifies the Cyclic and Block Cyclic distributions to perform a communication optimization for certain zippered loops. It has been a while since I presented this work at CHIUW '14 (http://chapel.cray.com/CHIUW/2014/Sharma_talk.pdf) and PGAS '14 (http://nic.uoregon.edu/pgas14/papers/pgas14_submission_22.pdf), and I wanted to know if this sort of thing is still wanted by the community (to officially go into the Chapel language).

Here is a brief description of the optimization to refresh everyone's memory, but feel free to reference the paper and talk I gave linked above. In zippered for loops that access array slices for arrays distributed via Cyclic or Block Cyclic, the language currently communicates remote data elements one at a time, which can result in a lot of communication. However, in some cases, remote data elements from an array slice are all separated by a fixed distance in memory (a side effect of being cyclically distributed) and can be communicated to the locale where they are needed in one message before the loop, thereby lowering communication and saving runtime.
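
The fixed-distance property can be illustrated with a small model of the index arithmetic. This is only a sketch in Python, not the Chapel implementation: it assumes a 1-D cyclic layout in which global index i lives on locale i mod numLocales at local offset i div numLocales, and all function names are hypothetical.

```python
from math import gcd

def owner(i, num_locales):
    # Assumed cyclic layout: global index i is owned by locale i mod num_locales.
    return i % num_locales

def local_offset(i, num_locales):
    # Assumed position of global index i inside its owner's local storage.
    return i // num_locales

def slice_offsets_on(locale, lo, hi, step, num_locales):
    # Local offsets, on `locale`, of the slice elements lo..hi by step.
    return [local_offset(i, num_locales)
            for i in range(lo, hi + 1, step)
            if owner(i, num_locales) == locale]

def fixed_stride(offsets):
    # Common gap between consecutive offsets, or None if the gaps differ.
    gaps = {b - a for a, b in zip(offsets, offsets[1:])}
    return gaps.pop() if len(gaps) == 1 else None

def expected_stride(step, num_locales):
    # Owned slice indices recur every lcm(step, num_locales) global indices,
    # which is step // gcd(step, num_locales) local slots.
    return step // gcd(step, num_locales)

offsets = slice_offsets_on(locale=1, lo=0, hi=47, step=3, num_locales=4)
print(offsets, fixed_stride(offsets), expected_stride(3, 4))
```

Under these assumptions, the elements that one locale contributes to a strided slice sit at a constant stride in that locale's memory, so they could in principle be fetched as a single strided transfer instead of one message per element.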

To implement the optimization for Cyclic, I've modified the CyclicArr follower iterator (~100 lines of code), and for Block Cyclic I've modified the BlockCyclicDom leader iterator and BlockCyclicArr follower iterator (~100 lines of code).

I'm confident that the Cyclic implementation is complete and ready to be reviewed, but I have some reservations about the Block Cyclic implementation. The optimization only applies to one dimensional arrays distributed with Block Cyclic. I'm not even sure if it is possible to do array slicing in Block Cyclic with multi-dimensional arrays (I currently get compiler errors when I try to do so in a program). This limitation causes the implementation for Block Cyclic to be kind of "hacky", which may not be a good thing, since it may stomp on other inner workings of the Block Cyclic distribution.

I've already forked the most recent version of the chapel repo from github and added my changes to the Cyclic distribution to perform the optimization. I'd like to know from the community:

1. Do we want this sort of optimization to be a part of the language?

2. If so, do we want it for both Cyclic and Block Cyclic (which is complete but a bit of a hack)?

3. If we want this, what sort of testing should I run on my fork before I submit this for review? I started to run 'start_test' only to realize that it takes too long on my machine, and I probably don't need to run the whole suite.

4. I also have a whole suite of Chapel benchmarks (Polybench C benchmarks translated to Chapel by hand) that I used to test my optimization that may be of use to the community. Should I include those somewhere in my submission?

These are only a few of the issues/concerns that I came up with about these changes to the Cyclic and Block Cyclic distributions. I'm sure there will be a lot more. Please don't hesitate to discuss them with me.

Thanks

Aroon Sharma
University of Maryland, Class of 2014
M.S. Computer Engineering
