Hi Apan --
Thanks for continuing to wrestle with these questions of memory layout
which, I agree, are very important (and increasingly so).
A few people have taken stabs at doing column-major domain maps in Chapel
over time (e.g., test/arrays/marybeth/CMO_array.chpl was an early
proof-of-concept that has arguably long outlived its utility and I suspect
isn't really worth poking into beyond noting that it exists). But, as
you've probably observed, none have made it into the standard modules or
release. I think it'd be great to change that.
My thought about how to approach this would be to have the
DefaultRectangular layout take a 'param' argument indicating whether to
use row- or column-major order, which would minimize software layers and
complexity while keeping that decision as much in the core array type as
possible. I've long hypothesized that the changes to the domain map ought
to be as simple as adding the param and reversing a few of the loops that
iterate over the dimensions in order to calculate offsets and such. But I
could be overlooking something.
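To make the loop-reversal idea concrete, here is a minimal sketch rather
than actual DefaultRectangular code. The helper name 'linearOffset' and its
arguments are hypothetical, and it assumes a recent Chapel release in which
tuples are 0-indexed. The only thing the 'param' changes is the direction in
which the dimensions are walked while the stride is accumulated:

```chapel
// Hypothetical helper (not part of DefaultRectangular): returns the
// zero-based linear offset of the zero-based index 'ind' within a dense
// block whose per-dimension extents are 'dims'.
proc linearOffset(param colMajor: bool, param rank: int,
                  ind: rank*int, dims: rank*int): int {
  var offset = 0;
  var stride = 1;
  // Walk from the fastest-varying dimension to the slowest one.
  // Column-major starts at dimension 0, row-major at dimension rank-1.
  for d in 0..#rank {
    const k = if colMajor then d else rank - 1 - d;
    offset += ind(k) * stride;
    stride *= dims(k);
  }
  return offset;
}

// Element (1, 2) of a 3 x 4 block:
writeln(linearOffset(false, 2, (1, 2), (3, 4)));  // row-major: 1*4 + 2 = 6
writeln(linearOffset(true,  2, (1, 2), (3, 4)));  // column-major: 1 + 2*3 = 7
```

Because 'colMajor' is a 'param', the choice is resolved at compile time, so
in principle neither order should pay for the other's existence.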
Then the remaining challenge would be around how to expose
'DefaultRectangular' to the user so that they could invoke it directly to
override the default value of that param, say. This would likely involve a
conversation about naming (DefaultRectangular has always been intended as
an internal working name rather than something a user would call, but with
an appropriate choice of name, I think we could, and perhaps should, expose
it to the user). I think of this as some wrestling with naming (which is
always surprisingly hard) plus minor code shuffling.
(Or maybe we'd just have two instances of DefaultRectangular available to
the user to use in a 'dmapped' clause if they wanted to be explicit, one
set up for row-major order and one for column-major?)
Anyway, those are my immediate thoughts -- others' may vary,
-Brad
On Tue, 11 Jul 2017, Qasem, Apan M wrote:
Hello,
I am investigating data layout transformations in Chapel that are suitable for
heterogeneous memory hierarchies. Organization of data - layout and placement -
can have a huge impact on the performance of heterogeneous applications. For
example, an AoS data structure allocated in host (CPU) memory should generally
be converted to SoA when being mapped to device (GPU/accelerator) memory to
improve memory coalescing behavior. Below is a simple example of two alternate
layouts for an array of records for a GPU offloaded task (AMD is currently
working on providing GPU support in Chapel). The discrete array implementation
yields almost a factor of two speedup over the array-of-records implementation
on an AMD APU (integrated GPU) platform.
// Array-of-records (AoS) version
record img {
  var r: real;
  var g: real;
  var b: real;
  var x: real;
}

var dom: domain(1) = {1..N};
var src: [dom] img;
var dst: [dom] img;

on (Locales[0]:LocaleModel).GPU do {
  forall i in 1..N {
    dst[i].r = src[i].r * v0 - src[i].g * v1;
    dst[i].x = v1;
  }
}

// Discrete-array (SoA) version: one array per record field
var src_r: [dom] real;
var src_g: [dom] real;
var src_b: [dom] real;
var src_x: [dom] real;
var dst_r: [dom] real;
var dst_g: [dom] real;
var dst_b: [dom] real;
var dst_x: [dom] real;

on (Locales[0]:LocaleModel).GPU do {
  forall i in 1..N {
    dst_r[i] = src_r[i] * v0 - src_g[i] * v1;
    dst_x[i] = v1;
  }
}

As part of this work, I implemented a domain map module for column-major
ordering of multi-dimensional arrays. Depending on the data access patterns, a
column-major ordering may be appropriate for device-mapped data structures
because it can reduce memory divergence in the kernel. It should be noted,
however, that column-major ordering can have a major impact on CPU performance
as well, for instance when the innermost loop index strides through
contiguous elements of a column.
Below is an example of how the column-major layout would be invoked in a
Chapel program:
var domRowMajor : domain(2) = {rows, cols};
var domColMajor : domain(2) dmapped ColMajor() = {rows, cols};
var A : [domRowMajor] real;
var B : [domColMajor] real;

for i in rows {
  for j in cols {
    A[i,j] = 17.0;  // accessed as row-major
    B[i,j] = 0.0;   // accessed as col-major
  }
}
The implementation of ColMajor() draws from the implementation of the
DefaultRectangular() module. Column-major ordering is enforced by re-mapping
the indices in the domain. For a 2D array, this is essentially a swap of the
two dimensions. For higher-dimensional arrays, the dimensions are reversed
such that the nth dimension becomes the innermost (i.e., contiguous). The
entire code is contained in the file modules/layouts/ColMajor.chpl.
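As a rough illustration of the index re-mapping, and not the code in
ColMajor.chpl itself, the reversal can be sketched as a small helper. The
name 'reverseIndex' is hypothetical, and the sketch assumes a recent Chapel
release in which tuples are 0-indexed:

```chapel
// Hypothetical helper (not taken from ColMajor.chpl): reverses the order
// of the components of an index tuple, e.g. (i, j, k) becomes (k, j, i).
proc reverseIndex(param rank: int, ind: rank*int): rank*int {
  var rev: rank*int;
  for d in 0..#rank do
    rev(d) = ind(rank - 1 - d);
  return rev;
}

writeln(reverseIndex(2, (5, 9)));     // the 2D case: a plain swap
writeln(reverseIndex(3, (1, 2, 3)));  // higher rank: full reversal
```

If the reversed index is then handed to an ordinary row-major store, the
first component of the original index is the one that varies fastest in
memory.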
I would like to make a pull request to integrate the ColMajor() module into
Chapel. But I wanted to get feedback from the dev community first.
- Apan