Hi Apan --

Thanks for continuing to wrestle with these questions of memory layout which, I agree, are very important (and increasingly so).

A few people have taken stabs at doing column-major domain maps in Chapel over time (e.g., test/arrays/marybeth/CMO_array.chpl was an early proof-of-concept that has arguably long outlived its utility and I suspect isn't really worth poking into beyond noting that it exists). But, as you've probably observed, none have made it into the standard modules or release. I think it'd be great to change that.

My thought about how to approach this would be to have the DefaultRectangular layout take a 'param' argument indicating whether to use row- or column-major order. That would minimize software layers and complexity while keeping the decision as much in the core array type as possible. I've long hypothesized that the changes to the domain map ought to be as simple as adding the param and reversing a few of the loops that iterate over the dimensions to calculate offsets and such. But I could be overlooking something.
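
To make the loop-reversal idea concrete, here is a minimal sketch of how a 'param' flag might select the dimension order during offset calculation. This is not taken from DefaultRectangular: the name linearOffset, its signature, and the use of 0-based arrays to hold the indices and extents are all assumptions made purely for illustration.

```chapel
// Hypothetical helper, not part of DefaultRectangular.
// Computes the linear offset of a 0-based index within an array whose
// per-dimension sizes are given by 'extent'. Both arguments are assumed
// to be 1D arrays indexed from 0.
proc linearOffset(param colMajor: bool, ind: [] int, extent: [] int): int {
  const n = ind.size;
  var offset = 0, stride = 1;
  for k in 0..#n {
    // Visit the fastest-varying dimension first: the last dimension
    // for row-major order, the first dimension for column-major order.
    const d = if colMajor then k else n-1-k;
    offset += ind[d] * stride;
    stride *= extent[d];
  }
  return offset;
}

const extent: [0..2] int = [2, 3, 4];  // a 2 x 3 x 4 array
const ind: [0..2] int = [1, 0, 2];     // element (1, 0, 2)

writeln(linearOffset(false, ind, extent));  // row-major: 1*12 + 0*4 + 2 = 14
writeln(linearOffset(true, ind, extent));   // column-major: 1 + 0*2 + 2*6 = 13
```

The only difference between the two orders in this sketch is the direction in which the dimensions are visited, which is the kind of change I have in mind.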

Then the remaining challenge would be how to expose 'DefaultRectangular' to the user so that they could invoke it directly to override the default value of that param, say. This would likely involve a conversation about naming: DefaultRectangular has always been intended as an internal working name rather than something a user would call, but with an appropriate choice of name, I think we could (and perhaps should) expose it to the user. I think of this as some wrestling with naming (which is always surprisingly hard) and minor code shuffling.

(Or maybe we'd just have two instances of DefaultRectangular available to the user to use in a 'dmapped' clause if they wanted to be explicit, one set up for row-major order and one for column-major?).

Anyway, those are my immediate thoughts -- others' may vary,
-Brad


On Tue, 11 Jul 2017, Qasem, Apan M wrote:

Hello,

I am investigating data layout transformations in Chapel that are suitable for 
heterogeneous memory hierarchies. Organization of data - layout and placement - 
can have a huge impact on the performance of heterogeneous applications. For 
example, an AoS data structure allocated in host (CPU) memory should generally
be converted to SoA when being mapped to device (GPU/accelerator) memory to
improve memory coalescing behavior. Below is a simple example of two alternate
layouts for an array of records for a GPU-offloaded task (AMD is currently
working on providing GPU support in Chapel). The discrete array implementation 
yields almost a factor of two speedup over the array-of-records implementation 
on an AMD APU (integrated GPU) platform.

record img {
 var r: real;
 var g: real;
 var b: real;
 var x: real;
}

var dom: domain(1) = 1 .. N;

var src: [dom] img;
var dst: [dom] img;

on (Locales[0]:LocaleModel).GPU do {
 forall i in 1 .. N {
   dst[i].r = src[i].r * v0 - src[i].g * v1;
   dst[i].x = v1;
 }
}


var src_r: [dom] real;
var src_g: [dom] real;
var src_b: [dom] real;
var src_x: [dom] real;

var dst_r: [dom] real;
var dst_g: [dom] real;
var dst_b: [dom] real;
var dst_x: [dom] real;

on (Locales[0]:LocaleModel).GPU do {
 forall i in 1 .. N {
   dst_r[i] = src_r[i] * v0 - src_g[i] * v1;
   dst_x[i] = v1;
 }
}


As part of this work, I implemented a domain map module for column-major 
ordering of multi-dimensional arrays. Depending on the data access patterns, a 
column-major ordering may be appropriate for device-mapped data structures 
because it can reduce memory divergence in the kernel. It should be noted,
however, that column-major ordering can have a major impact on CPU performance
as well, for instance when the innermost loop index strides through
contiguous elements of a column.

Below is an example of how the column-major layout would be invoked in a
Chapel program:



var domRowMajor : domain(2) = {rows, cols};
var domColMajor : domain(2) dmapped ColMajor() = {rows, cols};

var A : [domRowMajor] real;
var B : [domColMajor] real;

for i in rows {
 for j in cols {
   A[i,j] = 17.0; // accessed as row-major
   B[i,j] = 0.0;  // accessed as col-major
 }
}


Implementation of ColMajor() draws from the implementation of the 
DefaultRectangular() module. Column major ordering is enforced by re-mapping 
the indices in the domain. For a 2D array, this is essentially a swap of the 
two dimensions. For higher order arrays, the dimensions are reversed such that 
the nth dimension becomes the innermost (i.e., contiguous). The entire code is 
contained in the file modules/layouts/ColMajor.chpl.
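
For the 2D case, the dimension swap described above can be sketched as follows. This is only an illustration and is not the code in ColMajor.chpl: the helper names rowMajorOffset and colMajorOffset are hypothetical, and indices are assumed to be 0-based.

```chapel
// Hypothetical helpers for illustration; not the code in ColMajor.chpl.
// Indices are assumed to be 0-based.

// Offset of element (i, j) in a row-major nRows x nCols array.
proc rowMajorOffset(i: int, j: int, nRows: int, nCols: int): int {
  return i * nCols + j;
}

// Column-major offset expressed as a row-major lookup with the two
// dimensions (both indices and extents) swapped.
proc colMajorOffset(i: int, j: int, nRows: int, nCols: int): int {
  return rowMajorOffset(j, i, nCols, nRows);
}

// In a 3 x 4 array under column-major order, stepping i (down a column)
// moves by 1 element, while stepping j (along a row) moves by nRows = 3.
writeln(colMajorOffset(0, 0, 3, 4));  // 0
writeln(colMajorOffset(1, 0, 3, 4));  // 1
writeln(colMajorOffset(0, 1, 3, 4));  // 3
```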

I would like to make a pull request to integrate the ColMajor() module into 
Chapel. But I wanted to get feedback from the dev community first.

- Apan
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
