Hi Aroon --

Sorry for the slow response. Part of the reason I couldn't answer these questions immediately (and still can't) is that I haven't had the chance to review the paper or code in enough detail to form an opinion. My intention was to have someone on the Chapel team who is familiar with the code read the paper, review the code, and make a proposal, though with the holidays coming up, I suspect that this won't happen until the new year. Is there a timeline on which you are looking for a response or action on these questions?

Here are some other random reactions:

* I think having your work available on a GitHub branch (whether on
  your fork/repository or one of ours) would be great and that, as you
  ask, the big question is whether or not to merge some or all of it.

* Part of what I'd hope to understand is how your work would relate to
  work that has already been done by the University of Malaga in bulk
  transfers (I believe it's pretty different) or our own group's work
  on stencil domain maps (there seems more similarity here) or our
  current work on standalone parallel iterators and/or our plans for
  next-generation leader-follower iterators.  This is why I'm hoping
  to find a volunteer on the team to investigate and make a proposal
  for.

* Block Cyclic is, in general, not in the greatest shape and needs a lot
  more attention, so it's difficult to say whether it's better off with or
  without your changes.  My inclination would be

* In general, we are happy to have more tests and benchmarks, especially
  from standard suites, written in Chapel and in the repository.  Have
  you set these up to run within the Chapel testing system (in which case
  it seems like a no-brainer to review them and check them in), or are
  you running them manually or via Makefiles or homegrown testing scripts
  (in which case more work would be required either on your part or ours;
  doc/developer/bestPractices/TestSystem.txt talks more about how to
  create tests in our system, if you haven't already found that)?

* Ultimately, for any code being contributed back, we will require a
  Chapel contributor agreement (see the links in the first paragraph of
  the "Developer Resources" page at chapel.cray.com).  And I believe that
  for work done by academic contributors under funding, we (unfortunately)
  need both an ICLA (individual) and CCLA ("corporate" -- in this case,
  the department or university, as appropriate) agreement for our lawyers
  to feel confident that the code is OK to contribute back.

* As far as testing goes, for things that change multi-locale execution
  (like this), it's usually sufficient to spot-check test/release/examples
  and test/multilocale and test/distributions.  In your case, since you're
  modifying cyclic and block-cyclic, you'd probably want to run
  test/distributions/robust for each of those configurations.  I believe
  the README in that directory should describe how to do this.

Hope this is helpful. If you want to minimize effort, I think it makes sense to focus on the contributor agreement first (to make sure there are no surprises), followed by the test suite (because it seems like a no-brainer). That will give us more time to review the details of the work and weigh in on whether we want to incorporate it. If you have a pointer to a branch or pull request that contains the code modifications involved, that would help with that process as well.

Thanks,
-Brad



On Mon, 15 Dec 2014, Aroon Sharma wrote:

Hi everyone, 

I've been meaning to submit for review my code that modifies the Cyclic and Block Cyclic distributions to perform a communication optimization for certain zippered loops. It has been a while since I presented this work at CHIUW '14 (http://chapel.cray.com/CHIUW/2014/Sharma_talk.pdf) and PGAS '14 (http://nic.uoregon.edu/pgas14/papers/pgas14_submission_22.pdf), and I wanted to know if this sort of thing is still wanted by the community (to officially go into the Chapel language). 

Here is a brief description of the optimization to refresh everyone's memory, but feel free to reference the paper and talk I gave linked above. In zippered for loops that access array slices for arrays distributed via Cyclic or Block Cyclic, the language currently communicates remote data elements one at a time, which can result in a lot of communication. However, in some cases, remote data elements from an array slice are all separated by a fixed distance in memory (a side effect of being cyclically distributed) and can be communicated to the locale where they are needed in one message before the loop, thereby lowering communication and saving runtime.
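
The fixed-distance property described above can be illustrated with a small toy model in Python. This is only a sketch under stated assumptions, not the Chapel implementation: it assumes a one-dimensional cyclic layout in which global index i lives on locale i % num_locales at local offset i // num_locales, and the names owner, local_offset, strided_requests, and expand are hypothetical helpers invented for the illustration. For a strided slice, it computes one (start, stride, count) request per owning locale, which is the kind of single strided transfer that could replace per-element messages.

```python
from math import gcd

# Toy model (assumption): global index i lives on locale i % P
# at local offset i // P. Not Chapel's actual data layout code.
def owner(i, num_locales):
    return i % num_locales

def local_offset(i, num_locales):
    return i // num_locales

def strided_requests(lo, hi, step, num_locales):
    """For the slice lo..hi (inclusive) by step, return one
    (start_offset, stride, count) request per owning locale."""
    # Indices owned by the same locale repeat every lcm(step, P).
    period = step * num_locales // gcd(step, num_locales)
    # period is a multiple of P, so the local stride is constant.
    local_stride = period // num_locales
    requests = {}
    for first in range(lo, min(lo + period, hi + 1), step):
        count = (hi - first) // period + 1
        requests[owner(first, num_locales)] = (
            local_offset(first, num_locales), local_stride, count)
    return requests

def expand(requests):
    """Expand the requests back into (locale, local offset) pairs."""
    return {(loc, start + k * stride)
            for loc, (start, stride, count) in requests.items()
            for k in range(count)}

if __name__ == "__main__":
    reqs = strided_requests(0, 15, 2, 4)
    print(len(reqs), "strided requests instead of", len(expand(reqs)), "element gets")
```

In this model the number of requests is bounded by the number of locales, while the element-at-a-time approach issues one message per remote element in the slice.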

To implement the optimization for Cyclic, I've modified the CyclicArr follower iterator (~100 lines of code), and for Block Cyclic I've modified the BlockCyclicDom leader iterator and BlockCyclicArr follower iterator (~100 lines of code). 

I'm confident that the Cyclic implementation is complete and ready to be reviewed, but I have some reservations about the Block Cyclic implementation. The optimization only applies to one-dimensional arrays distributed with Block Cyclic. I'm not even sure if it is possible to do array slicing in Block Cyclic with multi-dimensional arrays (I currently get compiler errors when I try to do so in a program). This limitation causes the implementation for Block Cyclic to be kind of "hacky", which may not be a good thing, since it may stomp on other inner workings of the Block Cyclic distribution.

I've already forked the most recent version of the chapel repo from github and added my changes to the Cyclic distribution to perform the optimization. I'd like to know from the community:

    1. Do we want this sort of optimization to be a part of the language?

    2. If so, do we want it for both Cyclic and Block Cyclic (which is complete but a bit of a hack)?

    3. If we want this, what sort of testing should I run on my fork before I submit this for review? I started to run 'start_test' only to realize that it takes too long on my machine, and I probably don't need to run the whole suite.    

    4. I also have a whole suite of Chapel benchmarks (Polybench C benchmarks translated to Chapel by hand) that I used to test my optimization that may be of use to the community. Should I include those somewhere in my submission?

These are only a few of the issues/concerns that I came up with about these changes to the Cyclic and Block Cyclic distributions. I'm sure there will be a lot more. Please don't hesitate to discuss them with me. Thanks

Aroon Sharma
University of Maryland, Class of 2014
M.S. Computer Engineering
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
