One belated follow-up that I meant to include in the original mail:
An advantage of having 'DefaultRectangular' able to choose between row-
and column-major order (or potentially, eventually, other layouts --
tiling? Morton order?) is that most other domain maps are built in terms
of DefaultRectangular. So (I believe) it would probably simplify our
ability to have other array implementations change their layout. E.g.,
you could imagine having 'Block' take a similar 'param' and thread it down
into the DefaultRectangulars that it uses to represent each locale's block
of data rather than having it switch to a completely different domain map.
-Brad
On Tue, 11 Jul 2017, Brad Chamberlain wrote:
Hi Apan --
Thanks for continuing to wrestle with these questions of memory layout which,
I agree, are very important (and increasingly so).
A few people have taken stabs at doing column-major domain maps in Chapel
over time (e.g., test/arrays/marybeth/CMO_array.chpl was an early
proof-of-concept that has arguably long outlived its utility and I suspect
isn't really worth poking into beyond noting that it exists). But, as you've
probably observed, none have made it into the standard modules or release. I
think it'd be great to change that.
My thought about how to approach this would be to have the DefaultRectangular
layout take a 'param' argument indicating whether to use row- or column-major
order. That would minimize software layers and complexity while keeping the
decision as much in the core array type as possible. I've long hypothesized
that the changes to the domain map ought to be as simple as adding the param
and reversing a few of the loops that iterate over the dimensions to
calculate offsets and such. But I could be overlooking something.
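To make that hypothesis concrete, here is a minimal sketch of the kind of
offset calculation involved. It is not DefaultRectangular's actual code:
'linearOffset' and its arguments are hypothetical names, indices are assumed
to be 0-based within each dimension, and it assumes a recent Chapel release
in which tuples are indexed from 0.

```chapel
// Hypothetical helper (not part of DefaultRectangular): returns the linear
// offset of a 0-based index tuple 'ind' in an array with per-dimension
// sizes 'extent'. The 'colMajor' param selects the traversal direction.
proc linearOffset(param colMajor: bool, param rank: int,
                  ind: rank*int, extent: rank*int): int {
  var offset = 0;
  var stride = 1;
  for i in 0..rank-1 {
    // Walk the dimensions from fastest-varying to slowest-varying:
    // first-to-last for column-major, last-to-first for row-major.
    const d = if colMajor then i else rank-1-i;
    offset += ind(d) * stride;
    stride *= extent(d);
  }
  return offset;
}

// Element (1, 2) of a 3 x 4 array:
writeln(linearOffset(false, 2, (1, 2), (3, 4)));  // row-major:    1*4 + 2
writeln(linearOffset(true,  2, (1, 2), (3, 4)));  // column-major: 1 + 2*3
```

The only difference between the two layouts in this sketch is the direction
in which the loop visits the dimensions, which is the "reversing a few of the
loops" idea in its simplest form.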
Then the remaining challenge would be how to expose
'DefaultRectangular' to the user so that they could invoke it directly to
override the default value of that param, say. This would likely involve a
conversation about naming (DefaultRectangular has always been intended as an
internal working name rather than something a user would call, but with an
appropriate choice of name, I think we could, and perhaps should, expose it to
the user). I think of this as some wrestling with naming (which is always
surprisingly hard) and minor code shuffling.
(Or maybe we'd just have two instances of DefaultRectangular available to the
user to use in a 'dmapped' clause if they wanted to be explicit, one set up
for row-major order and one for column-major?).
Anyway, those are my immediate thoughts -- others' may vary,
-Brad
On Tue, 11 Jul 2017, Qasem, Apan M wrote:
Hello,
I am investigating data layout transformations in Chapel that are suitable
for heterogeneous memory hierarchies. Organization of data - layout and
placement - can have a huge impact on the performance of heterogeneous
applications. For example, an array-of-structures (AoS) data structure
allocated in host (CPU) memory should generally be converted to
structure-of-arrays (SoA) form when it is mapped to device (GPU/accelerator)
memory, to improve memory coalescing. Below is a simple example of two
alternate layouts for an array of records in a GPU-offloaded task (AMD is
currently working on providing GPU support in Chapel). The discrete-array
implementation yields almost a factor of two speedup over the
array-of-records implementation on an AMD APU (integrated GPU) platform.
  // Layout 1: array of records (AoS)
  record img {
    var r: real;
    var g: real;
    var b: real;
    var x: real;
  }

  var dom: domain(1) = {1..N};
  var src: [dom] img;
  var dst: [dom] img;

  on (Locales[0]:LocaleModel).GPU do {
    forall i in 1..N {
      dst[i].r = src[i].r * v0 - src[i].g * v1;
      dst[i].x = v1;
    }
  }

  // Layout 2: one discrete array per field (SoA)
  var src_r: [dom] real;
  var src_g: [dom] real;
  var src_b: [dom] real;
  var src_x: [dom] real;
  var dst_r: [dom] real;
  var dst_g: [dom] real;
  var dst_b: [dom] real;
  var dst_x: [dom] real;

  on (Locales[0]:LocaleModel).GPU do {
    forall i in 1..N {
      dst_r[i] = src_r[i] * v0 - src_g[i] * v1;
      dst_x[i] = v1;
    }
  }
As part of this work, I implemented a domain map module for column-major
ordering of multi-dimensional arrays. Depending on the data access
patterns, a column-major ordering may be appropriate for device-mapped data
structures because it can reduce memory divergence in the kernel. It should
be noted, however, that column-major ordering can have a major impact on
CPU performance as well, for instance when the innermost loop index
strides through contiguous elements of a column.
Below is an example of how the column-major layout would be invoked in a
Chapel program:
  var domRowMajor : domain(2) = {rows, cols};
  var domColMajor : domain(2) dmapped ColMajor() = {rows, cols};
  var A : [domRowMajor] real;
  var B : [domColMajor] real;

  for i in rows {
    for j in cols {
      A[i,j] = 17.0;  // accessed as row-major
      B[i,j] = 0.0;   // accessed as col-major
    }
  }
The implementation of ColMajor() draws on the implementation of the
DefaultRectangular() module. Column-major ordering is enforced by
re-mapping the indices in the domain. For a 2D array, this is essentially a
swap of the two dimensions. For higher-dimensional arrays, the dimensions are
reversed such that the nth dimension becomes the innermost (i.e.,
contiguous). The entire code is contained in the file
modules/layouts/ColMajor.chpl.
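As a rough illustration of the re-mapping idea (this is a standalone sketch,
not the code in ColMajor.chpl), reversing both the index tuple and the extents
and then applying an ordinary row-major offset calculation gives a
column-major position. The helper names 'reversed' and 'rowMajorOffset' are
hypothetical, indices are assumed to be 0-based within each dimension, and the
sketch assumes a recent Chapel release in which tuples are indexed from 0.

```chapel
// Hypothetical helper: returns the tuple 't' with its components reversed.
proc reversed(param rank: int, t: rank*int): rank*int {
  var r: rank*int;
  for d in 0..rank-1 do
    r(d) = t(rank-1-d);
  return r;
}

// Hypothetical helper: ordinary row-major offset of a 0-based index tuple,
// where the last dimension varies fastest.
proc rowMajorOffset(param rank: int, ind: rank*int, extent: rank*int): int {
  var offset = 0;
  for d in 0..rank-1 do
    offset = offset * extent(d) + ind(d);
  return offset;
}

// Column-major offset obtained by reversing the dimensions first.
proc colMajorOffset(param rank: int, ind: rank*int, extent: rank*int): int {
  return rowMajorOffset(rank, reversed(rank, ind), reversed(rank, extent));
}

// Element (1, 2) of a 3 x 4 array: 1 + 2*3 in column-major order.
writeln(colMajorOffset(2, (1, 2), (3, 4)));
```

In this sketch, consecutive values of the first index of the original
(unreversed) tuple land in adjacent offsets, which is the usual definition of
column-major order.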
I would like to make a pull request to integrate the ColMajor() module into
Chapel. But I wanted to get feedback from the dev community first.
- Apan
_______________________________________________
Chapel-developers mailing list
https://lists.sourceforge.net/lists/listinfo/chapel-developers