Hi Aroon --
Sorry for the slow response. Part of the reason I couldn't answer these
questions immediately (and still can't) is that I haven't had the chance
to review the paper or code in enough detail to form an opinion. My
intention was to have someone on the Chapel team who is familiar with the
code read the paper, review the code, and make a proposal, though with
the holidays coming up, I suspect that this won't happen until the new
year. Is there a timeline on which you are looking for a response or
action on these questions?
Here are some other random reactions:
* I think having your work available on a GitHub branch (whether on
your fork/repository or one of ours) would be great and that, as you
ask, the big question is whether or not to merge some or all of it.
* Part of what I'd hope to understand is how your work would relate to
work that has already been done by the University of Malaga in bulk
transfers (I believe it's pretty different) or our own group's work
on stencil domain maps (there seems more similarity here) or our
current work on standalone parallel iterators and/or our plans for
next-generation leader-follower iterators. This is what I'm hoping
to find a volunteer on the team to investigate and make a proposal
for.
* Block Cyclic is, in general, not in the greatest shape and needs a lot
more attention, so it's difficult to say whether it's better off with or
without your changes. My inclination would be
* In general, we are happy to have more tests and benchmarks, especially
from standard suites, written in Chapel and in the repository. Have
you set these up to run within the Chapel testing system (in which case
it seems like a no-brainer to review them and check them in), or are
you running them manually or via Makefiles or homegrown testing scripts
(in which case more work would be required either on your part or ours;
doc/developer/bestPractices/TestSystem.txt talks more about how to
create tests in our system, if you haven't already found that)?
* Ultimately, for any code being contributed back, we will require a
Chapel contributor agreement (see the links in the first paragraph of
the "Developer Resources" page at chapel.cray.com). And I believe that
for work done by academic contributors under funding, we (unfortunately)
need both an ICLA (individual) and CCLA ("corporate" -- in this case,
the department or university, as appropriate) agreement for our lawyers
to feel confident that the code is OK to contribute back.
* As far as testing goes, for things that change multi-locale execution
(like this), it's usually sufficient to spot-check test/release/examples
and test/multilocale and test/distributions. In your case, since you're
modifying cyclic and block-cyclic, you'd probably want to run
test/distributions/robust for each of those configurations. I believe
the README in that directory should describe how to do this.
Hope this is helpful. If you wanted to minimize effort, I think focusing
on the contributor agreement first (to make sure there are no surprises)
followed by the test suite (because it seems like a no-brainer) makes
sense. That will give us more time to review the details of the work
and weigh in on whether we want to incorporate it or not. If you have a
pointer to a branch or pull request that contains the code modifications
involved, that could be helpful for that process as well.
Thanks,
-Brad
On Mon, 15 Dec 2014, Aroon Sharma wrote:
Hi everyone,
I've been meaning to submit for review my code that modifies the Cyclic
and Block Cyclic distributions to perform a communication optimization
for certain zippered loops. It has been a while since I presented this
work at CHIUW '14 (http://chapel.cray.com/CHIUW/2014/Sharma_talk.pdf)
and PGAS '14
(http://nic.uoregon.edu/pgas14/papers/pgas14_submission_22.pdf), and I
wanted to know if this sort of thing is still wanted by the community
(to officially go into the Chapel language).
Here is a brief description of the optimization to refresh everyone's
memory, but feel free to reference the paper and talk I gave linked
above. In zippered for loops that access array slices for arrays
distributed via Cyclic or Block Cyclic, the language currently
communicates remote data elements one at a time, which can result in a
lot of communication. However, in some cases, remote data elements from
an array slice are all separated by a fixed distance in memory (a side
effect of being cyclically distributed) and can be communicated to the
locale where they are needed in one message before the loop, thereby
lowering communication and saving runtime.
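The fixed-distance property described above can be illustrated with a
small sketch. This is not the CyclicArr follower code; it is a simplified
model that assumes a 0-based, one-dimensional array in which element i
lives on locale i mod num_locales at local offset i div num_locales. The
function and parameter names are hypothetical. For a strided slice, it
computes, per owning locale, the first local offset, the constant stride
between the needed elements, and how many there are, which is the
information a single strided transfer would need.

```python
from math import gcd


def owner(i, num_locales):
    """Locale that owns global index i under the assumed cyclic layout."""
    return i % num_locales


def local_offset(i, num_locales):
    """Position of global index i within its owner's local storage."""
    return i // num_locales


def strided_runs(lo, hi, step, num_locales):
    """Describe the slice lo..hi (inclusive) by step as one run per locale.

    Returns {locale: (first_local_offset, local_stride, count)}.
    Assumes 0 <= lo <= hi, step >= 1, num_locales >= 1.
    """
    # Ownership repeats every lcm(step, num_locales) global indices.
    period = step * num_locales // gcd(step, num_locales)
    # Two consecutive slice elements on the same locale are this far
    # apart in that locale's local storage.
    local_stride = period // num_locales
    runs = {}
    # The slice elements inside the first period each land on a
    # different locale, so each one starts that locale's run.
    for i in range(lo, min(hi, lo + period - 1) + 1, step):
        count = (hi - i) // period + 1
        runs[owner(i, num_locales)] = (
            local_offset(i, num_locales), local_stride, count)
    return runs


if __name__ == "__main__":
    # Slice 0..19 by 3 over 4 locales touches 0, 3, 6, 9, 12, 15, 18.
    for loc, run in sorted(strided_runs(0, 19, 3, 4).items()):
        print(loc, run)
```

In this model, every locale's share of the slice is a single arithmetic
progression in local memory, so one strided message per remote locale
can replace one message per element.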
To implement the optimization for Cyclic, I've modified the CyclicArr
follower iterator (~100 lines of code), and for Block Cyclic I've
modified the BlockCyclicDom leader iterator and BlockCyclicArr follower
iterator (~100 lines of code).
I'm confident that the Cyclic implementation is complete and ready to be
reviewed, but I have some reservations about the Block Cyclic
implementation. The optimization only applies to one-dimensional arrays
distributed with Block Cyclic. I'm not even sure if it is possible to do
array slicing in Block Cyclic with multi-dimensional arrays (I currently
get compiler errors when I try to do so in a program). This limitation
causes the implementation for Block Cyclic to be kind of "hacky", which
may not be a good thing, since it may stomp on other inner workings of
the Block Cyclic distribution.
I've already forked the most recent version of the Chapel repo from
GitHub and added my changes to the Cyclic distribution to perform the
optimization. I'd like to know from the community:
1. Do we want this sort of optimization to be a part of the
language?
2. If so, do we want it for both Cyclic and Block Cyclic (which is
complete but a bit of a hack)?
3. If we want this, what sort of testing should I run on my fork
before I submit this for review? I started to run 'start_test' only to
realize that it takes too long on my machine, and I probably don't need
to run the whole suite.
4. I also have a whole suite of Chapel benchmarks (Polybench C
benchmarks translated to Chapel by hand) that I used to test my
optimization that may be of use to the community. Should I include those
somewhere in my submission?
These are only a few of the issues/concerns that I came up with about
these changes to the Cyclic and Block Cyclic distributions. I'm sure
there will be a lot more. Please don't hesitate to discuss them with me.
Thanks,
Aroon Sharma
University of Maryland, Class of 2014
M.S. Computer Engineering