Re: [Chapel-developers] Zippered Loop Communication Optimization

Aroon Sharma Thu, 18 Dec 2014 18:58:11 -0800

Thanks for the detailed response, Brad. I'm not under any big time 
constraints for action on this topic from you guys at Cray, but I would like to 
move forward with this soon. I will be graduating from the University of 
Maryland on Sunday, and this was one of the "loose ends" that I wanted to 
follow up on. I will be more than happy to work with whoever at Cray is 
assigned to review my paper and code in January.

Some feedback on your points:

With respect to the work that was done by the University of Malaga, our work 
applies bulk transfers to more generic zippered loops that zip Cyclic and Block 
Cyclic array slices. From what I remember, their work was restricted to whole 
array assignments between Block and Cyclic arrays (i.e., A = B where B is Block 
and A is Cyclic). Since whole array assignment and zippered iteration are 
fundamentally related, I think there is a lot of overlap between both works. In 
fact, our implementation uses a strided communication primitive that they 
developed.

Our work, for example, can aggregate something like:

    forall (a, b, c) in zip(A[1..100], B[2..101], C[3..102]) {
      a = b + c;
    }

where A, B, and C are all Cyclic. Because different array slices are referenced 
in the zippering, a, b, and c will be from different locales on all iterations 
of the loop. I don't believe that the work by the University of Malaga could be 
applied to situations like this. 
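
To illustrate why the three zipped elements come from different locales, here is a small Python model of a one-dimensional Cyclic distribution. It is only a sketch under stated assumptions: the owner formula `(i - start_idx) % num_locales`, the choice of 4 locales, a start index of 1, and the helper names `owner` and `zipped_owners` are all made up for illustration and are not taken from the Chapel modules.

```python
# Toy model of a 1-D Cyclic distribution (not Chapel's implementation).
NUM_LOCALES = 4   # assumed locale count
START_IDX = 1     # assumed start index of the distribution


def owner(i, start_idx=START_IDX, num_locales=NUM_LOCALES):
    """Locale that owns global index i when indices are dealt round-robin."""
    return (i - start_idx) % num_locales


def zipped_owners(n_iters=100):
    """Owners of (a, b, c) for zip(A[1..100], B[2..101], C[3..102])."""
    return [(owner(1 + k), owner(2 + k), owner(3 + k)) for k in range(n_iters)]


owners = zipped_owners()
# True when every iteration touches three distinct locales.
all_distinct = all(len(set(t)) == 3 for t in owners)
print(owners[:4])
print("all iterations touch 3 distinct locales:", all_distinct)
```

In this model the result depends on the locale count: with only two locales, a and c would share an owner because their indices differ by two.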

For my test programs from Polybench that I translated to Chapel, I have my own 
bash scripts that I have been running to do the testing. I will take a look at 
TestSystem.txt and see what needs to be changed in order for those to be 
ready to submit. I think my test programs are their own contribution that can 
be considered for inclusion into the repo independently of my contributions to 
Cyclic and Block Cyclic.

Thanks for the testing advice. I'll focus on those directories. I'll also 
complete and submit the two license agreements that you mentioned, 
while you guys have more time to review my paper.

I do have a fork of the chapel repo 
(https://github.com/aroonsharma1/chapel.git) with my Cyclic modifications, but 
not my Block Cyclic modifications or my Polybench tests. Those will be up once 
I do the testing that you suggested and add my own test programs to the Chapel 
testing system. Once all that has been done, I will send an official pull 
request, and let you all know. Thanks again!

Aroon Sharma
University of Maryland, Class of 2015
M.S. Computer Engineering
(301) 908-9528

     On Thursday, December 18, 2014 8:44 PM, Brad Chamberlain <[email protected]> 
wrote:

Hi Aroon --

Sorry for the slow response.  Part of the reason that the answer to these 
questions wasn't obvious enough to me to respond immediately (and still 
isn't) is that I haven't had the chance to review the paper or code in 
enough detail to know what my opinion is.  To that end, my intention was 
to have someone on the Chapel team who is familiar with the code read the 
paper and review the code and give a proposal -- though with the holidays 
coming up, I suspect that this won't happen until the new year.  Is there 
any timeline on which you are looking for a response or action on these 
questions?

Here are some other random reactions:

* I think for your work to be available on a GitHub branch (whether on
  your fork/repository or one of ours) would be great and that, as you
  ask, the big question is whether or not to merge some or all of it.

* Part of what I'd hope to understand is how your work would relate to
  work that has already been done by the University of Malaga in bulk
  transfers (I believe it's pretty different) or our own group's work
  on stencil domain maps (there seems more similarity here) or our
  current work on standalone parallel iterators and/or our plans for
  next-generation leader-follower iterators.  This is what I'm hoping
  to find a volunteer on the team to investigate and make a proposal
  for.

* Block Cyclic is, in general, not in the greatest shape and needs a lot
  more attention, so it's difficult to say whether it's better off with or
  without your changes.  My inclination would be

* In general, we are happy to have more tests and benchmarks, especially
  from standard suites, written in Chapel and in the repository.  Have
  you set these up to run within the Chapel testing system (in which case
  it seems like a no-brainer to review them and check them in), or are
  you running them manually or via Makefiles or homegrown testing scripts
  (in which case more work would be required either on your part or ours;
  doc/developer/bestPractices/TestSystem.txt talks more about how to
  create tests in our system, if you haven't already found that).

* Ultimately, for any code being contributed back, we will require a
  Chapel contributor agreement (see the links in the first paragraph of
  the "Developer Resources" page at chapel.cray.com).  And I believe that
  for work done by academic contributors under funding, we (unfortunately)
  need both an ICLA (individual) and CCLA ("corporate" -- in this case,
  the department or university, as appropriate) agreement for our lawyers
  to feel confident that the code is OK to contribute back.

* As far as testing goes, for things that change multi-locale execution
  (like this), it's usually sufficient to spot-check test/release/examples
  and test/multilocale and test/distributions.  In your case, since you're
  modifying cyclic and block-cyclic, you'd probably want to run
  test/distributions/robust for each of those configurations.  I believe
  the README in that directory should describe how to do this.

Hope this is helpful.  If you wanted to minimize effort, I think focusing 
on the contributor agreement first (to make sure there are no surprises) 
followed by the test suite (because it seems like a no-brainer) makes 
sense and that will give us more time to review the details of the work 
and weigh in on whether we want to incorporate it or not.  If you have a 
pointer to a branch or pull request that contains the code modifications 
involved, that could be helpful for that process as well.

Thanks,
-Brad

On Mon, 15 Dec 2014, Aroon Sharma wrote:

> Hi everyone, 

> I've been meaning to submit for review my code that modifies the Cyclic 
> and Block Cyclic distributions to perform a communication optimization 
> for certain zippered loops. It has been a while since I presented this 
> work at CHIUW '14 (http://chapel.cray.com/CHIUW/2014/Sharma_talk.pdf) 
> and PGAS '14 
> (http://nic.uoregon.edu/pgas14/papers/pgas14_submission_22.pdf), and I 
> wanted to know if this sort of thing is still wanted by the community 
> (to officially go into the Chapel language). 

> Here is a brief description of the optimization to refresh everyone's 
> memory, but feel free to reference the paper and talk I gave linked 
> above. In zippered for loops that access array slices for arrays 
> distributed via Cyclic or Block Cyclic, the language currently 
> communicates remote data elements one at a time, which can result in a 
> lot of communication. However, in some cases, remote data elements from 
> an array slice are all separated by a fixed distance in memory (a side 
> effect of being cyclically distributed) and can be communicated to the 
> locale where they are needed in one message before the loop, thereby 
> lowering communication and saving runtime. 
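
The optimization described in the quoted paragraph above can be sketched with a small Python model. Everything here is assumed for illustration: 4 locales, a start index of 1, local storage that packs each locale's elements contiguously in index order, and the helper names. It is not the code from the modified iterators, only a way to see the fixed stride and the drop in message count.

```python
# Toy model: which remote elements one locale needs in a zippered loop,
# and how they are laid out in the owning locale's local storage.
NUM_LOCALES = 4   # assumed locale count
START_IDX = 1     # assumed start index of the Cyclic distribution


def owner(i):
    """Locale that owns global index i (round-robin dealing)."""
    return (i - START_IDX) % NUM_LOCALES


def local_offset(i):
    """Position of global index i inside its owner's local storage."""
    return (i - START_IDX) // NUM_LOCALES


def remote_plan(here, lhs_lo, rhs_lo, n):
    """Group the remote right-hand-side elements needed by `here`.

    Only iterations whose left-hand-side element is local to `here` count.
    Returns {owning locale: [local offsets in that locale's storage]}.
    """
    plan = {}
    for k in range(n):
        if owner(lhs_lo + k) != here:
            continue
        j = rhs_lo + k
        if owner(j) != here:
            plan.setdefault(owner(j), []).append(local_offset(j))
    return plan


# Locale 0's share of: forall (a, b) in zip(A[1..100], B[2..101])
plan = remote_plan(here=0, lhs_lo=1, rhs_lo=2, n=100)
element_msgs = sum(len(offs) for offs in plan.values())  # one get per element
bulk_msgs = len(plan)                                    # one get per owner
strides = {loc: {b - a for a, b in zip(offs, offs[1:])}
           for loc, offs in plan.items()}
print(element_msgs, bulk_msgs, strides)
```

A single stride value per owning locale is what would let the elements travel in one strided transfer instead of one message per element.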

> To implement the optimization for Cyclic, I've modified the CyclicArr 
> follower iterator (~100 lines of code), and for Block Cyclic I've 
> modified the BlockCyclicDom leader iterator and BlockCyclicArr follower 
> iterator (~100 lines of code). 

> I'm confident that the Cyclic implementation is complete and ready to be 
> reviewed, but I have some reservations about the Block Cyclic 
> implementation. The optimization only applies to one dimensional arrays 
> distributed with Block Cyclic. I'm not even sure if it is possible to do 
> array slicing in Block Cyclic with multi-dimensional arrays (I currently 
> get compiler errors when I try to do so in a program). This limitation 
> causes the implementation for Block Cyclic to be kind of "hacky", which 
> may not be a good thing, since it may stomp on other inner workings of 
> the Block Cyclic distribution. 

> I've already forked the most recent version of the chapel repo from 
> github and added my changes to the Cyclic distribution to perform the 
> optimization. I'd like to know from the community:

>     1. Do we want this sort of optimization to be a part of the 
> language?

>     2. If so, do we want it for both Cyclic and Block Cyclic (which is 
> complete but a bit of a hack)?

>     3. If we want this, what sort of testing should I run on my fork 
> before I submit this for review? I started to run 'start_test' only to 
> realize that it takes too long on my machine, and I probably don't need 
> to run the whole suite.    

>     4. I also have a whole suite of Chapel benchmarks (Polybench C 
> benchmarks translated to Chapel by hand) that I used to test my 
> optimization that may be of use to the community. Should I include those 
> somewhere in my submission?

> These are only a few of the issues/concerns that I came up with about 
> these changes to the Cyclic and Block Cyclic distributions. I'm sure 
> there will be a lot more. Please don't hesitate to discuss them with me. 
> Thanks.

> Aroon Sharma
> University of Maryland, Class of 2014
> M.S. Computer Engineering
>

_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
