Chapel Developers --

Executive summary: Preview a proposed patch to deprecate the "slice", 
"reindex", and "rank change" interfaces in the domain maps (mostly) in 
favor of using external "array view" domain maps instead, for the purposes 
of simplicity, performance, and to enable scalar optimizations.  Given the 
approach of the feature freeze, I wanted to preview this change 
(conceptually big, but pretty small in terms of amount of new code) before 
the patch was ready to go.  Vass, David Iten, Elliot, Ben, Kyle, and 
others who have worked on domain maps are likely to be most interested.
The work-in-progress can be viewed here:

        https://github.com/bradcray/chapel/compare/arrayViewDomMap

----

In more detail:

I wanted to give everyone a preview of some work that I started this past 
weekend which I think is very promising.  Originally, I thought that there 
was no way to get this done before the release.  Then I started thinking 
that maybe it was tractable, so I dove into it over the holiday weekend 
"for fun", thinking it might get in just before the release.  Now it appears 
that there's a good chance that it could go in tomorrow, this weekend, or 
very early next week.

The seed of the idea for the work stems from a conversation that we held 
in the past month or so at Cray in which we were bumping up against 
challenges in the representation and semantics of arrays from multiple 
directions.  This led to a discussion of a concept that had come up before 
but was never pursued: an "array view", which would act like an array but 
be represented as a reference to an array and a reference to a domain.

To give the flavor of the idea, an array view representing an array slice 
would store a reference to the class representing the array being sliced 
and another to the class representing the domain doing the slicing.  Then 
various operations like bounds checking or leader iteration would be done 
by dispatching to the domain's interface while operations like indexing 
the array would go through the array's interface.
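For readers who haven't run into the idea, here is a rough Python model of a slice view.  It is not Chapel code and not the patch's implementation; the class names Domain, Array, and SliceView are hypothetical, and the domain is simplified to a 1D strided range.  It only shows the division of labor described above: bounds checks and iteration go to the slicing domain, element access goes to the original array, and nothing is copied.

```python
class Domain:
    """Hypothetical stand-in for a 1D rectangular domain: lo..hi by stride."""
    def __init__(self, lo, hi, stride=1):
        self.lo, self.hi, self.stride = lo, hi, stride

    def contains(self, i):
        return self.lo <= i <= self.hi and (i - self.lo) % self.stride == 0

    def __iter__(self):
        return iter(range(self.lo, self.hi + 1, self.stride))


class Array:
    """Hypothetical stand-in for an array declared over a Domain."""
    def __init__(self, dom, init=0):
        self.dom = dom
        self.data = {i: init for i in dom}

    def __getitem__(self, i):
        return self.data[i]

    def __setitem__(self, i, v):
        self.data[i] = v


class SliceView:
    """An array view: a reference to the sliced array plus a reference to
    the slicing domain.  No elements are copied."""
    def __init__(self, arr, dom):
        self.arr = arr  # element access is forwarded to the array
        self.dom = dom  # bounds checks and iteration use the domain

    def _check(self, i):
        if not self.dom.contains(i):
            raise IndexError(f"index {i} is not in the slicing domain")

    def __getitem__(self, i):
        self._check(i)
        return self.arr[i]

    def __setitem__(self, i, v):
        self._check(i)
        self.arr[i] = v

    def __iter__(self):
        # iterate over the domain's indices, reading through the array
        return (self.arr[i] for i in self.dom)


A = Array(Domain(1, 10))
for i in A.dom:
    A[i] = i * i
V = SliceView(A, Domain(2, 8, 2))  # models the slice A[2..8 by 2]
V[4] = -1                          # writes through to A
```

Reindexing and rank change would follow the same pattern, with the view translating its indices before forwarding them to the underlying array.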

The primary impact of this change is that a domain map author need no 
longer provide interfaces to do slicing, reindexing, or rank change. These 
have traditionally been the most challenging portions of the domain map 
interface to implement, and in most cases they've never been implemented 
completely/correctly/efficiently as a result.  In part this is because 
most domain maps try to use a "closed representation form" in which, for 
example, the slice of a Cyclic array is also stored as a Cyclic array.  Yet if the 
stride of a slicing domain and the number of locales are relatively prime, 
such cases are impossible to represent in a closed form.  So the domain 
map author is forced to either create a much more general representation 
for cyclic distributions, or to have cyclic slicing return another more 
general domain map type (requiring authoring two domain maps rather than 
one), or to just say "Sorry, I can't support that case" at runtime, which 
breaks the plug-and-play nature of domain maps. By removing these concerns 
and implementing such operations through array views we make the domain 
map implementer's job much simpler, and support for the operations much 
more general.
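To make the strided-slicing case a little more concrete, the sketch below computes which locale owns each element of a strided slice of a cyclically distributed 1D index space.  It assumes the simple mapping owner(i) = (i - startIdx) mod numLocales and is only a model, not the Cyclic distribution's actual code; the function names are hypothetical.  With 4 locales and a stride of 3 (relatively prime), successive slice elements land on locales 0, 3, 2, 1 rather than 0, 1, 2, 3; with a stride of 2, only two of the four locales own any elements.

```python
from math import gcd


def cyclic_owner(i, num_locales, start_idx=0):
    """Locale owning index i under the assumed 1D cyclic mapping."""
    return (i - start_idx) % num_locales


def slice_owners(lo, hi, stride, num_locales):
    """Owning locale of each element of the slice lo..hi by stride."""
    return [cyclic_owner(i, num_locales) for i in range(lo, hi + 1, stride)]


def locales_touched(stride, num_locales):
    """Distinct locales a strided slice lands on, given at least this many
    elements: num_locales / gcd(stride, num_locales)."""
    return num_locales // gcd(stride, num_locales)


print(slice_owners(0, 21, 3, 4))  # prints [0, 3, 2, 1, 0, 3, 2, 1]
print(slice_owners(0, 10, 2, 4))  # prints [0, 2, 0, 2, 0, 2]
```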

But what if I still really want to implement these myself?  There is a 
hook that permits you to do so.  DefaultRectangular uses it, for example, 
so all default rectangular slice/reindex/rank change operations are still 
represented as a default rectangular array that aliases the original one's 
ddata segment.  Interestingly, performance was actually better for some 
programs when using the array view rather than the closed form, but that's 
not something I've had a chance to explore much.
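As a rough illustration of the opt-in idea (not the actual hook in the patch, whose name and signature aren't given here), the Python sketch below has a hypothetical slice_in_closed_form method that returns None by default, in which case slicing falls back to a generic view.  A DefaultRectangular-like map overrides it to return a strided descriptor that aliases the original storage, much as the closed-form slice aliases the original ddata segment.  Slice positions are 0-based here for simplicity, unlike Chapel slices, which keep the original indices.

```python
class StridedAlias:
    """Closed-form result: a new descriptor that aliases the original
    storage through an offset and a stride (no copy)."""
    def __init__(self, data, offset, stride, length):
        self.data, self.offset = data, offset
        self.stride, self.length = stride, length

    def __getitem__(self, k):  # k: 0-based position within the slice
        if not 0 <= k < self.length:
            raise IndexError(k)
        return self.data[self.offset + k * self.stride]


class GenericView:
    """Fallback: remember the storage and the slicing indices."""
    def __init__(self, data, indices):
        self.data, self.indices = data, list(indices)

    def __getitem__(self, k):
        return self.data[self.indices[k]]


class DomainMap:
    def slice_in_closed_form(self, data, lo, hi, stride):
        # Hypothetical hook: the default opts out of closed-form slicing.
        return None


class DefaultRectangularLike(DomainMap):
    def slice_in_closed_form(self, data, lo, hi, stride):
        length = len(range(lo, hi + 1, stride))
        return StridedAlias(data, lo, stride, length)


def make_slice(dmap, data, lo, hi, stride=1):
    result = dmap.slice_in_closed_form(data, lo, hi, stride)
    if result is None:  # no closed form offered: use the generic view
        result = GenericView(data, range(lo, hi + 1, stride))
    return result


data = list(range(10))  # stands in for the array's storage
fast = make_slice(DefaultRectangularLike(), data, 2, 8, 3)
generic = make_slice(DomainMap(), data, 2, 8, 3)
data[5] = 50  # both results see the update, since neither copied
```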

One might expect the use of array views to add overhead to these 
operations compared to using in-place operations, but so far I haven't 
seen big impacts in the nightly performance suite (which includes some 
no-local and block cases), and I have actually seen a reduction in 
communication (a dramatic 30% in some cases) for tests such as the 
slicing tests in the distribution robustness suite.

A few other nice impacts of the change are:

* It permits us to stop relying on the use of the => operator in creating
   classes whose array fields alias existing arrays.  I think this is nice
   in that (a) this feature has always felt a bit fragile and sketchy to
   me from a design standpoint (it felt like a crutch when we used it and
   has never felt first-class); (b) we've wanted to retire the => array
   alias operator in favor of 'ref' members, so this paves the way for
   that by reducing reliance on it; and (c) this => approach has thwarted
   other optimization opportunities (see next bullet).

* Experiments by Ben and Elliot in the past 6-12 months have shown that
   the conservative, but typically unnecessary, inner-dimension
   multiplication that our arrays use in order to support the full
   richness of Chapel's array semantics can significantly hurt performance
   for codes that are not memory bound (such as matrix multiplication,
   the CSU benchmarks, the shootout's nbody, etc.).  Elliot and I wrote
   a patch that got rid of these multiplies when they weren't necessary,
   but doing so required making the DefaultRectangular array
   representation more generic, which broke the use of the => operator in
   the slicing/reindexing/rank-change code described above.  Though I
   haven't tried merging these two patches yet, I see no reason they
   shouldn't work well together, enabling us to get C-level performance
   for many array idioms without any additional compiler analysis or
   optimization.
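For context on the "inner-dimension multiplication" mentioned above, here is a small Python model of the per-access offset arithmetic.  The parameter names off (per-dimension lower bounds) and blk (per-dimension multipliers) are assumptions for illustration, not a statement of DefaultRectangular's exact fields.  The general form multiplies in every dimension, including the innermost; when the innermost multiplier is known to be 1, that multiply can be dropped.  In compiled code the benefit comes from knowing this statically (for example through a type-level parameter); the Python only shows that the two formulas agree.

```python
def offset_general(idx, off, blk):
    """Linear offset with a multiply in every dimension, including the
    innermost one (needed if blk[-1] might not be 1, e.g. strided views)."""
    return sum((i - o) * b for i, o, b in zip(idx, off, blk))


def offset_unit_inner(idx, off, blk):
    """Same offset when blk[-1] is known to be 1: the innermost multiply
    is skipped."""
    *outer_i, last_i = idx
    *outer_o, last_o = off
    acc = sum((i - o) * b for i, o, b in zip(outer_i, outer_o, blk[:-1]))
    return acc + (last_i - last_o)


off, blk = (1, 1), (4, 1)  # a 3x4 row-major array indexed from (1, 1)
print(offset_general((2, 3), off, blk), offset_unit_inner((2, 3), off, blk))
# prints: 6 6
```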



At present, the status of the patch is:

* passes normal "linux64" correctness testing with 4 exceptions, all
   related to the bulk comm optimization tests implemented by our
   colleagues at Malaga.  The reason is that I haven't implemented the
   doiBulkTransfer* routines that they require (and am not sure how
   to do so, not being familiar with the interface).

* passes block and cyclic testing of the
   distributions/robustness/arithmetic suite, with one failure common to
   both (its version of hpl core dumps) and one that's cyclic-specific.

* passes gasnet testing of the multilocale directory with 3 failures
   (just came in this morning so I haven't had a chance to analyze yet).

* hasn't significantly hurt performance and helps in several key cases
   (though I need to do more runs still... my last run was a few commits
   ago).


Because of the likely impact of this commit on performance, I'm hoping to 
commit it on a day when no other performance-oriented changes are going 
in, so that its effects can be isolated.  That, and the schedule that I 
seem to be on, suggest maybe getting it in this weekend.


I'm interested in any feedback people have on the concept or the code 
(though it still needs a fair amount of clean-up), and in lining up 
reviewer(s) for when the patch is ready (particularly those who might be 
available over the weekend).

Thanks,
-Brad


_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
