I think it would be incredibly useful to allow a user to specify the layout
ordering of array dimensions, not just column-major (i.e. left-to-right in
N-d) and row-major (i.e. right-to-left in N-d).

For example, quantum chemists do a lot with 4D arrays and use all 4!=24
access patterns at one point or another.  It would be neat if one could
define a 4D array with e.g. 1234 dimension ordering (this is either row or
column major depending on what one means by "first dimension") and another
with e.g. 3412 ordering (which is a column-major matrix of row-major
matrices in one convention).

>From this, one would expect array assignment to implement the out-of-place
array transposition internally, which allows the runtime to use an optimal
implementation.  This matters, because the optimal implementation is >2x
faster than the naive one a user writes (e.g.
https://arxiv.org/abs/1704.04374, https://arxiv.org/abs/1607.01249).

Implementing any ordering of dimensions is pretty easy under the hood, and
doing so is often much more efficient because compilers appear to be better
at indexing multi-dimensional arrays than they are at optimizing integer
expressions used to compute explicit offsets into 1D arrays.  I saw this
based upon my experience with writing tensor code in Fortran and C99 for
more than a decade.

In any case, Chapel should do all that it can to ensure that nobody every
writes code that looks like this:

t3[h2+h2u*(h3+h3u*(h1+h1u*(p4+p4u*(p6+p6u*p5))))] +=
t2[h7+h7u*(p4+p4u*(h1+h1u*h2))] * v2[h7+h7u*(h3+h3u*(p6+p6u*p5))];

(
https://github.com/jeffhammond/nwchem-tce-triples-kernels/blob/master/src/ccsd_t_kernels_1d.c#L696
)

One can also think about not-strictly-linear orderings like Morton/Z-order
and space-filling curves but I'm not going to attempt to do so.

Jeff


On Wed, Jul 12, 2017 at 5:42 AM, Michael Ferguson <[email protected]>
wrote:
>
> Hi -
>
> It may well be the right choice to have DefaultRectangular
> be configurable in row-major vs column-major layout, but
> I have some reservations about this approach.
>
> First, DefaultRectangular has to support arbitrary dimension.
> Is it important to support arbitrary dimension "column-major"
> layouts? What about a 3D layout, where only the middle dimension
> is stored differently from how it works now? As far as I know,
> people interested in "column major" arrays are talking
> about 2D arrays / matrices.
>
> Second, I'm not so sure it's a good precedent to put it in
> to DefaultRectangular. Won't we prefer to make any new layout a
> configurable option in DefaultRectangular for to "minimize software
> layers and complexity"? I think it would be preferably to allow
> other layouts to exist separately and to solve whatever software
> engineering/complexity problems come up. (E.g. giving Block
> the ability to select the layout used).
>
> Couldn't we view a Column-major layout as an array that is
> similar to an array view? That is, it is not really storing
> the array as much as adjusting the mapping from indices
> to elements. Isn't that similar to a rank-change array view,
> in particular? Didn't we just decide that it was better
> to make a rank-change array view a different type, rather
> than folding it in to DefaultRectangular? Why is this different?
>
> Cheers,
>
> -michael
>
> >
> >One belated follow-up that I meant to include in the original mail:
> >
> >An advantage of having 'DefaultRectangular' able to choose between row-
> >and column-major order (or potentially, eventually, other layouts --
> >tiling?  Morton order?) is that most other domain maps are built in terms
> >of DefaultRectangular.  So (I believe) it would probably simplify our
> >ability to have other array implementations change their layout.  E.g.,
> >you could imagine having 'Block' take a similar 'param' and thread it
> >down
> >into the DefaultRectangulars that it uses to represent each local's block
> >of data rather than having it switch to a completely different domain
map.
> >
> >-Brad
> >
> >
> >On Tue, 11 Jul 2017, Brad Chamberlain wrote:
> >
> >>
> >> Hi Apan --
> >>
> >> Thanks for continuing to wrestle with these questions of memory layout
> >>which,
> >> I agree, are very important (and increasingly so).
> >>
> >> A few people have taken stabs at doing column-major domain maps in
> >>Chapel
> >> over time (e.g., test/arrays/marybeth/CMO_array.chpl was an early
> >> proof-of-concept that has arguably long outlived its utility and I
> >>suspect
> >> isn't really worth poking into beyond noting that it exists). But, as
> >>you've
> >> probably observed, none have made it into the standard modules or
> >>release.  I
> >> think it'd be great to change that.
> >>
> >> My thought about how to approach this would be to have the
> >>DefaultRectangular
> >> layout take a 'param' argument indicating whether to use row- or
> >>column-major
> >> order in order to minimize software layers and complexity while keeping
> >>that
> >> decision as much in the core array type as possible. I've long
> >>hypothesized
> >> that the changes to the domain map ought to be as simple as adding the
> >>param
> >> and reversing a few of the loops that iterate over the dimensions in
> >>order to
> >> calculate offsets and such.  But I could be overlooking something.
> >>
> >> Then the remaining challenge would be around how to expose
> >> 'DefaultRectangular' to the user so that they could invoke it directly
> >>to
> >> override the default value of that param, say. This would likely
> >>involve a
> >> conversation about naming (DefaultRectangular has always been intended
> >>as an
> >> internal working name rather than something a user would call, but with
> >>an
> >> appropriate choice of name, I think we could/perhaps should expose it
> >>to the
> >> user.  I think of this as some wrestling with naming (which is always
> >> surprisingly hard) and minor code shuffling.
> >>
> >> (Or maybe we'd just have two instances of DefaultRectangular available
> >>to the
> >> user to use in a 'dmapped' clause if they wanted to be explicit, one
> >>set up
> >> for row-major order and one for column-major?).
> >>
> >> Anyway, those are my immediate thoughts -- others' may vary,
> >> -Brad
> >>
> >>
> >> On Tue, 11 Jul 2017, Qasem, Apan M wrote:
> >>
> >>> Hello,
> >>>
> >>> I am investigating data layout transformations in Chapel that are
> >>>suitable
> >>> for heterogeneous memory hierarchies. Organization of data - layout
> >>>and
> >>> placement - can have a huge impact on the performance of heterogeneous
> >>> applications. For example, an AoS data structure allocated in host
> >>>(CPU)
> >>> memory, should generally be converted to SoA when being mapped to
> >>>device
> >>> (GPU/accelerator) memory to improve memory coalescing behavior. Below
> >>>is a
> >>> simple example of two alternate layouts for an array of records for a
> >>>GPU
> >>> offloaded task (AMD is currently working on providing GPU support in
> >>> Chapel). The discrete array implementation yields almost a factor of
> >>>two
> >>> speedup over the array-of-records implementation on an AMD APU
> >>>(integrated
> >>> GPU) platform.
> >>>
> >>> record img {
> >>>  var r: real;
> >>>  var g: real;
> >>>  var b: real;
> >>>  var x: real;
> >>> }
> >>>
> >>> var dom: domain(1) = 1 .. N;
> >>>
> >>> var src: [dom] img;
> >>> var dst: [dom] img;
> >>>
> >>> on (Locales[0]:LocaleModel).GPU do {
> >>>  forall i in 1 .. N {
> >>>    dst[i].r = src[i].r * v0 – src[i].g * v1;
> >>>    dst[i].x = v1;
> >>>  }
> >>> }
> >>>
> >>>
> >>> var src_r: [dom] real;
> >>> var src_g: [dom] real;
> >>> var src_b: [dom] real;
> >>> var src_x: [dom] real;
> >>>
> >>> var dst_r: [dom] real;
> >>> var dst_g: [dom] real;
> >>> var dst_b: [dom] real;
> >>> var dst_x: [dom] real;
> >>>
> >>> on (Locales[0]:LocaleModel).GPU do {
> >>>  forall i in 1 .. N {
> >>>    dst_r[i] = src_r[i] * v0 - src_g[i] * v1;
> >>>    dst_x[i] = v1;
> >>>  }
> >>> }
> >>>
> >>>
> >>> As part of this work, I implemented a domain map module for
> >>>column-major
> >>> ordering of multi-dimensional arrays. Depending on the data access
> >>> patterns, a column-major ordering may be appropriate for device-mapped
> >>>data
> >>> structures because it can reduce memory divergence in the kernel. It
> >>>should
> >>> be noted, however, that column major ordering can have a major impact
> >>>on
> >>> CPU performance as well. For instance, when the innermost  loop index
> >>> strides through contiguous elements of a column.
> >>>
> >>> Below is an example, of how the column-major layout would be invoked
> >>>in a
> >>> Chapel program
> >>>
> >>>
> >>>
> >>> var domRowMajor : domain(2) = {rows, cols};
> >>> var domColMajor : domain(2) dmapped ColMajor() = {rows, cols};
> >>>
> >>> var A : [domRowMajor] real;
> >>> var B : [domColMajor] real;
> >>>
> >>> for i in rows {
> >>>  for j in cols {
> >>>    A[i,j] = 17.0; // accessed as row-major
> >>>    B[i,j] = 0.0;  // accessed as col-major
> >>>  }
> >>> }
> >>>
> >>>
> >>> Implementation of ColMajor() draws from the implementation of the
> >>> DefaultRectangular() module. Column major ordering is enforced by
> >>> re-mapping the indices in the domain. For a 2D array, this is
> >>>essentially a
> >>> swap of the two dimensions. For higher order arrays, the dimensions
> >>>are
> >>> reversed such that the nth dimension becomes the innermost (i.e.,
> >>> contiguous). The entire code is contained in the file
> >>> modules/layouts/ColMajor.chpl.
> >>>
> >>> I would like to make a pull request to integrate the ColMajor() module
> >>>into
> >>> Chapel. But I wanted to get feedback from the dev community first.
> >>>
> >>> - Apan
>
>
------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Chapel-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/chapel-developers




--
Jeff Hammond
[email protected]
http://jeffhammond.github.io/
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers

Reply via email to