Chapel Developers --
Executive summary: Preview a proposed patch to deprecate the "slice",
"reindex", and "rank change" interfaces in the domain maps (mostly) in
favor of using external "array view" domain maps instead, for the purposes
of simplicity, performance, and to enable scalar optimizations. Given the
approach of the feature freeze, I wanted to preview this change
(conceptually big, but pretty small in terms of amount of new code) before
the patch was ready to go. Vass, David Iten, Elliot, Ben, Kyle, and
others who have worked on domain maps are likely to be most interested.
The work-in-progress can be viewed here:
https://github.com/bradcray/chapel/compare/arrayViewDomMap
----
In more detail:
I wanted to give everyone a preview of some work that I started this past
weekend, which I think is very promising. Originally, I thought that there
was no way to get this done before the release. Then I started thinking
that it might be tractable, so I dove into it over the holiday weekend "for
fun", thinking it might get in just before the release. Now it appears
that there's a good chance that it could go in tomorrow, this weekend, or
very early next week.
The seed of the idea for the work stems from a conversation that we held
in the past month or so at Cray in which we were bumping up against
challenges in the representation and semantics of arrays from multiple
directions. This led to a discussion of a concept that had come up before
but had never been pursued: an "array view", which would act array-like but
be represented as a reference to an array and a reference to a domain.
To give the flavor of the idea, an array view representing an array slice
would store a reference to the class representing the array being sliced
and another to the class representing the domain doing the slicing. Then
various operations like bounds checking or leader iteration would be done
by dispatching to the domain's interface while operations like indexing
the array would go through the array's interface.
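To make the dispatch concrete, here is a minimal sketch written in Python rather than Chapel. It is a conceptual model only, not the code in the patch: the class and method names (Domain, Array, SliceView, contains, access, assign) are hypothetical and chosen purely for illustration. The view holds references to an existing array and a slicing domain, asks the domain about bounds and iteration order, and asks the array for the elements.

```python
class Domain:
    """A 1-D index set lo..hi (inclusive) with a stride."""
    def __init__(self, lo, hi, stride=1):
        self.lo, self.hi, self.stride = lo, hi, stride

    def contains(self, i):
        return self.lo <= i <= self.hi and (i - self.lo) % self.stride == 0

    def indices(self):
        return range(self.lo, self.hi + 1, self.stride)


class Array:
    """An array declared over a domain, with its own element storage."""
    def __init__(self, dom, fill=0):
        self.dom = dom
        self.data = {i: fill for i in dom.indices()}

    def access(self, i):
        return self.data[i]

    def assign(self, i, value):
        self.data[i] = value


class SliceView:
    """Array-like object holding references to an array and a slicing domain."""
    def __init__(self, arr, dom):
        self.arr = arr   # reference to the sliced array, not a copy
        self.dom = dom   # reference to the domain doing the slicing

    def __getitem__(self, i):
        if not self.dom.contains(i):   # bounds check goes to the domain
            raise IndexError(i)
        return self.arr.access(i)      # element access goes to the array

    def __setitem__(self, i, value):
        if not self.dom.contains(i):
            raise IndexError(i)
        self.arr.assign(i, value)

    def __iter__(self):                # iteration order comes from the domain
        return (self.arr.access(i) for i in self.dom.indices())


A = Array(Domain(1, 10))
for i in A.dom.indices():
    A.assign(i, i * i)

V = SliceView(A, Domain(2, 8, stride=2))
V[4] = -1                 # writes through to A's storage
view_values = list(V)     # elements of A at indices 2, 4, 6, 8
```

In this model the view never needs to know how the array stores its elements, which is the property that would let one view type serve any domain map.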
The primary impact of this change is that a domain map author need no
longer provide interfaces to do slicing, reindexing, or rank change. These
have traditionally been the most challenging portions of the domain map
interface to implement, and in most cases they've never been implemented
completely/correctly/efficiently as a result. In part this is because
most domain maps try to use a "closed representation form" in which the
slice of a Cyclic array is also stored as a cyclic array. Yet if the
stride of a slicing domain and the number of locales are relatively prime,
such cases are impossible to represent in a closed form. So the domain
map author is forced to either create a much more general representation
for cyclic distributions, or to have cyclic slicing return another more
general domain map type (requiring authoring two domain maps rather than
one), or to just say "Sorry, I can't support that case" at runtime, which
breaks the plug-and-play nature of domain maps. By removing these concerns
and implementing such operations through array views we make the domain
map implementer's job much simpler, and support for the operations much
more general.
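As a rough illustration of why strided slices are awkward for a closed cyclic representation, the Python sketch below lists which locale owns each element of a slice. It assumes a deliberately simplified 1-D model in which index i lives on locale i mod numLocales; the function names are hypothetical and this is not how the Cyclic domain map is actually written. A slice's elements stay where they already are, so depending on the stride the slice may visit the locales in a permuted order or touch only some of them.

```python
def owner(i, num_locales):
    """Locale owning index i under a simplified 1-D cyclic mapping (assumed)."""
    return i % num_locales


def slice_owners(lo, hi, stride, num_locales):
    """Owners of the indices lo, lo+stride, ... (<= hi), in slice order."""
    return [owner(i, num_locales) for i in range(lo, hi + 1, stride)]


# Four locales throughout.
same_order = slice_owners(0, 7, 1, 4)     # unit stride: locales 0,1,2,3 repeating
permuted = slice_owners(0, 21, 3, 4)      # stride 3: every locale, different order
subset_only = slice_owners(0, 14, 2, 4)   # stride 2: only locales 0 and 2
```

Only the unit-stride case looks like an ordinary cyclic dealing of consecutive elements to consecutive locales; the other two would need extra state to describe.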
But what if I still really want to implement these myself? There is a
hook that permits you to do so. DefaultRectangular uses it, for example,
so all default rectangular slice/reindex/rank change operations are still
represented as a default rectangular array that aliases the original one's
ddata segment. Interestingly, performance was actually better for some
programs when using the array view rather than the closed form, but that's
not something I've had a chance to explore much.
One might expect the use of array views to add overhead to these
operations compared to using in-place operations, but so far I haven't
seen big impacts from the nightly performance suite (which includes some
no-local and block cases), and have actually seen a reduction in
communication (dramatic in some cases -- 30%) for operations such as slicing
in the distribution robustness suite.
A few other nice impacts of the change are:
* It permits us to stop relying on the use of the => operator in creating
classes whose array fields alias existing arrays. I think this is nice
in that (a) this feature has always felt a bit fragile and sketchy to
me from a design standpoint (felt like a crutch when we used it and has
never felt first-class to me); (b) moreover, we've wanted to retire
the => array alias operator in favor of using 'ref' members, so this
paves the way for that by reducing reliance on it; and (c) using this
=> approach has thwarted other optimization opportunities (see next
bullet).
* Experiments by Ben and Elliot in the past 6-12 months have shown that
the conservative, but typically unnecessary, inner-dimension
multiplication that our arrays use in order to support the full
richness of Chapel's array semantics can significantly hurt performance
for codes that are not memory bound (such as matrix multiplication,
the CSU benchmarks, the shootout's nbody, etc.). Elliot and I wrote
a patch that got rid of these multiplies when they weren't necessary,
but doing so required adding some more generic-ness to the
DefaultRectangular array representation which broke the use of the =>
operator when slicing/reindexing/rank changing above. Though I haven't
tried merging these two patches yet, I see no reason that they shouldn't
play completely well together, enabling us to get C-level performance
for many array idioms without any additional compiler analysis or
optimization.
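For readers unfamiliar with the inner-dimension multiply mentioned in the second bullet, the following Python sketch shows the arithmetic. It assumes a row-major layout with a per-dimension multiplier, called blk here; the function names and the layout are illustrative assumptions, not taken from either patch. When the innermost multiplier is 1, the multiply can be dropped without changing the result. One situation where it could differ from 1 is an aliasing view that strides through the inner dimension, which is why a general representation keeps it.

```python
def flat_index_general(idx, lo, blk):
    """Offset = sum over dimensions of (idx[d] - lo[d]) * blk[d]."""
    return sum((i - l) * b for i, l, b in zip(idx, lo, blk))


def flat_index_unit_inner(idx, lo, blk):
    """Same offset without the innermost multiply; valid only if blk[-1] == 1."""
    outer = sum((i - l) * b for i, l, b in zip(idx[:-1], lo[:-1], blk[:-1]))
    return outer + (idx[-1] - lo[-1])


lo = (1, 1)
blk = (5, 1)   # a 4x5 row-major array: inner multiplier is 1

# For the plain array the two computations agree at every index.
offsets_match = all(
    flat_index_general((i, j), lo, blk) == flat_index_unit_inner((i, j), lo, blk)
    for i in range(1, 5)
    for j in range(1, 6)
)

# A hypothetical view over every other column reuses the same storage
# with an inner multiplier of 2, so the shortcut is no longer valid.
view_blk = (5, 2)
general_offset = flat_index_general((1, 2), lo, view_blk)
shortcut_offset = flat_index_unit_inner((1, 2), lo, view_blk)
```

If strided and reindexed cases are handled by a separate view object, the underlying array could always take the cheaper path.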
At present, the status of the patch is:
* passes normal "linux64" correctness testing with 4 exceptions, all
related to the bulk comm optimization tests implemented by our
colleagues at Malaga. The reason is that I haven't implemented the
doiBulkTransfer* routines that they require (and am not sure how
to do so, not being familiar with the interface).
* passes block and cyclic testing of the
distributions/robustness/arithmetic suite with one failure common to both
(the suite's version of hpl core dumps) and one more that is
cyclic-specific.
* passes gasnet testing of the multilocale directory with 3 failures
(just came in this morning so I haven't had a chance to analyze yet).
* hasn't significantly hurt performance and helps in several key cases
(though I need to do more runs still... my last run was a few commits
ago).
Because of the likely impact of this commit on performance, I'm hoping to
commit it on a day when no other performance-oriented changes are going
in, so that its effects can be isolated. That, and the schedule that I
seem to be on, suggest maybe getting it in this weekend.
I'm interested in any feedback people have on the concept or the code
(though it still needs a fair amount of clean-up), and in lining up
reviewer(s) for when the patch is ready (particularly those who might be
available over the weekend).
Thanks,
-Brad
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers