Hi Jeff --
Thanks for the notes. After sending these mails this week, I'd been
musing over whether I believed "CMO and RMO are sufficient for most cases"
vs. "any dimension ordering should be permitted", so it's good to hear
about a context that would benefit from more general orderings.
I don't think it would be terribly difficult to write and use a domain map
that supported arbitrary dimension memory orderings in Chapel -- that's
certainly the type of thing that domain maps were designed to support. I
think the biggest challenge would probably be making it (a)
rank-independent with (b) the dimension ordering known at compile-time. It
seems this would want a combination of varargs and param (statically
known) values that I don't think we support well today. Creating a
specialized-to-4D case would be pretty simple, but would of course be less
satisfying from a generality perspective.
Thinking about my exchange with Michael the other day, I'd be inclined to
make this a new domain map ('e.g., dmapped PermuteDims(3, 4, 1, 2)')
rather than extending the DefaultRectangular domain map to support this
ability (as I've argued we could/should do for CMO). My rationale is that
I think the code would be different enough (and just epsilon-more-complex
enough) from the current DefaultRectangular implementation that the two
wouldn't live within a single implementation as well. But I could be
wrong about that.
The challenge w.r.t. rank-independence and vararg params I mention above
would be simpler if only the most memory-consecutive dimension mattered,
as Apan characterizes in his mail:
we only care about which dimension is the innermost
But it sounds like for your quantum chem context, you care about the
ordering of all dimensions... true? I ask only from a perspective of
wishful thinking because specifying one "tightest" dimension is a much
easier interface to support in a rank-independent manner. :)
Thanks,
-Brad
On Thu, 13 Jul 2017, Jeff Hammond wrote:
I think it would be incredibly useful to allow a user to specify the layout
ordering of array dimensions, not just column-major (i.e. left-to-right in
N-d) and row-major (i.e. right-to-left in N-d).
For example, quantum chemists do a lot with 4D arrays and use all 4!=24
access patterns at one point or another. It would be neat if one could
define a 4D array with e.g. 1234 dimension ordering (this is either row or
column major depending on what one means by "first dimension") and another
with e.g. 3412 ordering (which is a column-major matrix of row-major
matrices in one convention).
From this, one would expect array assignment to implement the out-of-place
array transposition internally, which allows the runtime to use an optimal
implementation. This matters, because the optimal implementation is >2x
faster than the naive one a user writes (e.g.
https://arxiv.org/abs/1704.04374, https://arxiv.org/abs/1607.01249).
Implementing any ordering of dimensions is pretty easy under the hood, and
doing so is often much more efficient because compilers appear to be better
at indexing multi-dimensional arrays than they are at optimizing integer
expressions used to compute explicit offsets into 1D arrays. I saw this
based upon my experience with writing tensor code in Fortran and C99 for
more than a decade.
In any case, Chapel should do all that it can to ensure that nobody every
writes code that looks like this:
t3[h2+h2u*(h3+h3u*(h1+h1u*(p4+p4u*(p6+p6u*p5))))] +=
t2[h7+h7u*(p4+p4u*(h1+h1u*h2))] * v2[h7+h7u*(h3+h3u*(p6+p6u*p5))];
(
https://github.com/jeffhammond/nwchem-tce-triples-kernels/blob/master/src/ccsd_t_kernels_1d.c#L696
)
One can also think about not-strictly-linear orderings like Morton/Z-order
and space-filling curves but I'm not going to attempt to do so.
Jeff
On Wed, Jul 12, 2017 at 5:42 AM, Michael Ferguson <[email protected]>
wrote:
Hi -
It may well be the right choice to have DefaultRectangular
be configurable in row-major vs column-major layout, but
I have some reservations about this approach.
First, DefaultRectangular has to support arbitrary dimension.
Is it important to support arbitrary dimension "column-major"
layouts? What about a 3D layout, where only the middle dimension
is stored differently from how it works now? As far as I know,
people interested in "column major" arrays are talking
about 2D arrays / matrices.
Second, I'm not so sure it's a good precedent to put it in
to DefaultRectangular. Won't we prefer to make any new layout a
configurable option in DefaultRectangular for to "minimize software
layers and complexity"? I think it would be preferably to allow
other layouts to exist separately and to solve whatever software
engineering/complexity problems come up. (E.g. giving Block
the ability to select the layout used).
Couldn't we view a Column-major layout as an array that is
similar to an array view? That is, it is not really storing
the array as much as adjusting the mapping from indices
to elements. Isn't that similar to a rank-change array view,
in particular? Didn't we just decide that it was better
to make a rank-change array view a different type, rather
than folding it in to DefaultRectangular? Why is this different?
Cheers,
-michael
One belated follow-up that I meant to include in the original mail:
An advantage of having 'DefaultRectangular' able to choose between row-
and column-major order (or potentially, eventually, other layouts --
tiling? Morton order?) is that most other domain maps are built in terms
of DefaultRectangular. So (I believe) it would probably simplify our
ability to have other array implementations change their layout. E.g.,
you could imagine having 'Block' take a similar 'param' and thread it
down
into the DefaultRectangulars that it uses to represent each local's block
of data rather than having it switch to a completely different domain
map.
-Brad
On Tue, 11 Jul 2017, Brad Chamberlain wrote:
Hi Apan --
Thanks for continuing to wrestle with these questions of memory layout
which,
I agree, are very important (and increasingly so).
A few people have taken stabs at doing column-major domain maps in
Chapel
over time (e.g., test/arrays/marybeth/CMO_array.chpl was an early
proof-of-concept that has arguably long outlived its utility and I
suspect
isn't really worth poking into beyond noting that it exists). But, as
you've
probably observed, none have made it into the standard modules or
release. I
think it'd be great to change that.
My thought about how to approach this would be to have the
DefaultRectangular
layout take a 'param' argument indicating whether to use row- or
column-major
order in order to minimize software layers and complexity while keeping
that
decision as much in the core array type as possible. I've long
hypothesized
that the changes to the domain map ought to be as simple as adding the
param
and reversing a few of the loops that iterate over the dimensions in
order to
calculate offsets and such. But I could be overlooking something.
Then the remaining challenge would be around how to expose
'DefaultRectangular' to the user so that they could invoke it directly
to
override the default value of that param, say. This would likely
involve a
conversation about naming (DefaultRectangular has always been intended
as an
internal working name rather than something a user would call, but with
an
appropriate choice of name, I think we could/perhaps should expose it
to the
user. I think of this as some wrestling with naming (which is always
surprisingly hard) and minor code shuffling.
(Or maybe we'd just have two instances of DefaultRectangular available
to the
user to use in a 'dmapped' clause if they wanted to be explicit, one
set up
for row-major order and one for column-major?).
Anyway, those are my immediate thoughts -- others' may vary,
-Brad
On Tue, 11 Jul 2017, Qasem, Apan M wrote:
Hello,
I am investigating data layout transformations in Chapel that are
suitable
for heterogeneous memory hierarchies. Organization of data - layout
and
placement - can have a huge impact on the performance of heterogeneous
applications. For example, an AoS data structure allocated in host
(CPU)
memory, should generally be converted to SoA when being mapped to
device
(GPU/accelerator) memory to improve memory coalescing behavior. Below
is a
simple example of two alternate layouts for an array of records for a
GPU
offloaded task (AMD is currently working on providing GPU support in
Chapel). The discrete array implementation yields almost a factor of
two
speedup over the array-of-records implementation on an AMD APU
(integrated
GPU) platform.
record img {
var r: real;
var g: real;
var b: real;
var x: real;
}
var dom: domain(1) = 1 .. N;
var src: [dom] img;
var dst: [dom] img;
on (Locales[0]:LocaleModel).GPU do {
forall i in 1 .. N {
dst[i].r = src[i].r * v0 – src[i].g * v1;
dst[i].x = v1;
}
}
var src_r: [dom] real;
var src_g: [dom] real;
var src_b: [dom] real;
var src_x: [dom] real;
var dst_r: [dom] real;
var dst_g: [dom] real;
var dst_b: [dom] real;
var dst_x: [dom] real;
on (Locales[0]:LocaleModel).GPU do {
forall i in 1 .. N {
dst_r[i] = src_r[i] * v0 - src_g[i] * v1;
dst_x[i] = v1;
}
}
As part of this work, I implemented a domain map module for
column-major
ordering of multi-dimensional arrays. Depending on the data access
patterns, a column-major ordering may be appropriate for device-mapped
data
structures because it can reduce memory divergence in the kernel. It
should
be noted, however, that column major ordering can have a major impact
on
CPU performance as well. For instance, when the innermost loop index
strides through contiguous elements of a column.
Below is an example, of how the column-major layout would be invoked
in a
Chapel program
var domRowMajor : domain(2) = {rows, cols};
var domColMajor : domain(2) dmapped ColMajor() = {rows, cols};
var A : [domRowMajor] real;
var B : [domColMajor] real;
for i in rows {
for j in cols {
A[i,j] = 17.0; // accessed as row-major
B[i,j] = 0.0; // accessed as col-major
}
}
Implementation of ColMajor() draws from the implementation of the
DefaultRectangular() module. Column major ordering is enforced by
re-mapping the indices in the domain. For a 2D array, this is
essentially a
swap of the two dimensions. For higher order arrays, the dimensions
are
reversed such that the nth dimension becomes the innermost (i.e.,
contiguous). The entire code is contained in the file
modules/layouts/ColMajor.chpl.
I would like to make a pull request to integrate the ColMajor() module
into
Chapel. But I wanted to get feedback from the dev community first.
- Apan
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
--
Jeff Hammond
[email protected]
http://jeffhammond.github.io/
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers