Hi everyone, 
I've been meaning to submit for review my code that modifies the Cyclic and 
Block Cyclic distributions to perform a communication optimization for certain 
zippered loops. It has been a while since I presented this work at CHIUW '14 
(http://chapel.cray.com/CHIUW/2014/Sharma_talk.pdf) and PGAS '14 
(http://nic.uoregon.edu/pgas14/papers/pgas14_submission_22.pdf), and I wanted 
to know if this sort of thing is still wanted by the community (to officially 
go into the Chapel language). 
Here is a brief description of the optimization to refresh everyone's memory, 
but feel free to reference the paper and talk I gave linked above. In zippered 
for loops that access array slices for arrays distributed via Cyclic or Block 
Cyclic, the language currently communicates remote data elements one at a time, 
which can result in a lot of communication. However, in some cases, remote data 
elements from an array slice are all separated by a fixed distance in memory (a 
side effect of being cyclically distributed) and can be communicated to the 
locale where they are needed in one message before the loop, thereby lowering 
communication and saving runtime. 
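A minimal sketch of the fixed-distance property, written in Python rather than Chapel and not taken from the actual patch. It assumes a simplified one-dimensional Cyclic layout: index i lives on locale i % num_locales, and each locale stores its elements densely at local offset i // num_locales. The helper names (owner, local_offset, remote_offsets, fixed_distance) are hypothetical and exist only for this illustration.

```python
def owner(i, num_locales):
    # Assumed cyclic mapping: indices are dealt out round-robin to locales.
    return i % num_locales

def local_offset(i, num_locales):
    # Assumed dense local storage: the k-th element a locale owns sits at offset k.
    return i // num_locales

def remote_offsets(lo, hi, stride, locale, num_locales):
    """Local offsets, on `locale`, of the elements of the slice lo..hi by stride."""
    return [local_offset(i, num_locales)
            for i in range(lo, hi + 1, stride)
            if owner(i, num_locales) == locale]

def fixed_distance(offsets):
    """Return the common gap between consecutive offsets, or None if gaps differ."""
    gaps = {b - a for a, b in zip(offsets, offsets[1:])}
    return gaps.pop() if len(gaps) == 1 else None

# Slice 1..40 by 3 over 4 locales: which elements does locale 2 own?
offsets = remote_offsets(lo=1, hi=40, stride=3, locale=2, num_locales=4)
print(offsets)                  # [2, 5, 8]
print(fixed_distance(offsets))  # 3
```

Under these assumptions, the elements of the slice owned by a given locale form an arithmetic progression in that locale's memory, so they could in principle be described by a single (start, stride, count) triple and fetched in one strided transfer instead of one message per element.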
To implement the optimization for Cyclic, I've modified the CyclicArr follower 
iterator (~100 lines of code), and for Block Cyclic I've modified the 
BlockCyclicDom leader iterator and BlockCyclicArr follower iterator (~100 lines 
of code). 
I'm confident that the Cyclic implementation is complete and ready to be 
reviewed, but I have some reservations about the Block Cyclic implementation. 
The optimization only applies to one-dimensional arrays distributed with Block 
Cyclic. I'm not even sure if it is possible to do array slicing in Block Cyclic 
with multi-dimensional arrays (I currently get compiler errors when I try to do 
so in a program). This limitation causes the implementation for Block Cyclic to 
be kind of "hacky", which may not be a good thing, since it may stomp on other 
inner workings of the Block Cyclic distribution. 
I've already forked the most recent version of the Chapel repo from GitHub and 
added my changes to the Cyclic distribution to perform the optimization. I'd 
like to know from the community:
    1. Do we want this sort of optimization to be a part of the language?
    2. If so, do we want it for both Cyclic and Block Cyclic (the Block Cyclic 
version is complete but a bit of a hack)?
    3. If we want this, what sort of testing should I run on my fork before I 
submit this for review? I started to run 'start_test' only to realize that it 
takes too long on my machine, and I probably don't need to run the whole suite. 
    4. I also have a whole suite of Chapel benchmarks (Polybench C benchmarks 
translated to Chapel by hand) that I used to test my optimization that may be 
of use to the community. Should I include those somewhere in my submission? 
These are only a few of the issues/concerns that I came up with about these 
changes to the Cyclic and Block Cyclic distributions. I'm sure there will be a 
lot more. Please don't hesitate to discuss them with me.
Thanks,
Aroon Sharma
University of Maryland, Class of 2014
M.S. Computer Engineering
_______________________________________________
Chapel-developers mailing list
https://lists.sourceforge.net/lists/listinfo/chapel-developers
