Hello,

I am investigating data layout transformations in Chapel that are suitable for
heterogeneous memory hierarchies. The organization of data (layout and
placement) can have a large impact on the performance of heterogeneous
applications. For example, an array of structures (AoS) allocated in host (CPU)
memory should generally be converted to a structure of arrays (SoA) when it is
mapped to device (GPU/accelerator) memory, to improve memory coalescing. Below
is a simple example of two alternative layouts for an array of records used in
a GPU-offloaded task (AMD is currently working on providing GPU support in
Chapel). The discrete-array implementation yields almost a 2x speedup over the
array-of-records implementation on an AMD APU (integrated GPU) platform.

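// Version 1: array of records (AoS).
// N, v0 and v1 are assumed to be declared elsewhere (e.g. as config consts).
record img {
  var r: real;
  var g: real;
  var b: real;
  var x: real;
}

var dom: domain(1) = {1..N};

var src: [dom] img;
var dst: [dom] img;

// Offload to the GPU sublocale of locale 0 (AMD's in-progress GPU support)
on (Locales[0]:LocaleModel).GPU do {
  forall i in dom {
    dst[i].r = src[i].r * v0 - src[i].g * v1;
    dst[i].x = v1;
  }
}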


// Version 2: one discrete array per field (SoA), over the same domain dom.
var src_r: [dom] real;
var src_g: [dom] real;
var src_b: [dom] real;
var src_x: [dom] real;

var dst_r: [dom] real;
var dst_g: [dom] real;
var dst_b: [dom] real;
var dst_x: [dom] real;

on (Locales[0]:LocaleModel).GPU do {
  forall i in dom {
    dst_r[i] = src_r[i] * v0 - src_g[i] * v1;
    dst_x[i] = v1;
  }
}
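When the host-side data is produced as an array of records, the conversion to the discrete-array layout can be done with one data-parallel copy per field before offloading. The following is a minimal CPU-only sketch, not part of the original experiment: the default value of N, the sample field values, and the name `back` are illustrative assumptions, and no GPU sublocale is involved.

```chapel
// Minimal sketch (CPU only): scatter an array of records (AoS) into one
// array per field (SoA) before offloading, then gather the results back.
// N, the sample values and the array name `back` are illustrative only.
config const N = 8;

record img {
  var r: real;
  var g: real;
  var b: real;
  var x: real;
}

const dom = {1..N};
var src: [dom] img;

// Fill the AoS input with sample values
forall i in dom {
  const v = i: real;
  src[i] = new img(v, 2.0 * v, 3.0 * v, 4.0 * v);
}

// SoA copies: one contiguous array per field
var src_r, src_g, src_b, src_x: [dom] real;

// Scatter: AoS -> SoA
forall i in dom {
  src_r[i] = src[i].r;
  src_g[i] = src[i].g;
  src_b[i] = src[i].b;
  src_x[i] = src[i].x;
}

// Gather: SoA -> AoS (e.g. after a kernel has updated the field arrays)
var back: [dom] img;
forall i in dom do
  back[i] = new img(src_r[i], src_g[i], src_b[i], src_x[i]);

writeln(src_g[2], " ", back[3].x);
```

The copies cost an extra pass over the data on the host, so this only pays off when the kernel's improved coalescing outweighs the conversion time.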


As part of this work, I implemented a domain map module for column-major
ordering of multi-dimensional arrays. Depending on the data access patterns,
column-major ordering may be appropriate for device-mapped data structures
because it can reduce memory divergence in the kernel. Note, however, that
column-major ordering can have a significant impact on CPU performance as well,
for instance when the innermost loop index strides through contiguous elements
of a column.

Below is an example of how the column-major layout would be used in a Chapel
program:

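// rows and cols are assumed to be ranges, e.g. 1..m and 1..n;
// the ColMajor module must be in scope.
var domRowMajor : domain(2) = {rows, cols};
var domColMajor : domain(2) dmapped ColMajor() = {rows, cols};

var A : [domRowMajor] real;
var B : [domColMajor] real;

for i in rows {
  for j in cols {
    A[i,j] = 17.0; // stored in row-major order (default layout)
    B[i,j] = 0.0;  // same indexing syntax, but stored in column-major order
  }
}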
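For the 2D case, the effect of column-major storage on loop order can be illustrated with the default layout alone by storing the array transposed, which is essentially the dimension swap described below. In this minimal sketch the names m, n and Bt are illustrative assumptions, not part of the ColMajor module. Because consecutive values of i are adjacent in memory, the innermost loop runs over i; with B declared over the ColMajor domain, the same i-innermost order would give contiguous accesses.

```chapel
// Minimal sketch: emulate a 2D column-major array with the default
// (row-major) layout by storing it transposed. m, n and Bt are
// illustrative names, not part of the ColMajor module.
config const m = 4, n = 3;
const rows = 1..m, cols = 1..n;

// Bt[j, i] holds logical element B[i, j]; consecutive i values are
// adjacent in memory, as in a column-major layout.
var Bt: [cols, rows] real;

// Column-major-friendly loop order: j outer, i inner, so the inner
// loop walks contiguous memory.
for j in cols {
  for i in rows {
    Bt[j, i] = (10 * i + j): real;
  }
}

// Logical element B[2, 3] is stored at Bt[3, 2]
writeln(Bt[3, 2]);
```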


The implementation of ColMajor() draws on the implementation of the
DefaultRectangular module. Column-major ordering is enforced by remapping the
indices of the domain. For a 2D array, this is essentially a swap of the two
dimensions. For higher-rank arrays, the order of the dimensions is reversed, so
that the first dimension becomes the innermost (i.e., contiguous) one and the
nth dimension the outermost. The entire code is contained in the file
modules/layouts/ColMajor.chpl.
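The effect of reversing the dimension order shows up directly in the offset computation for a column-major layout: the first dimension gets stride 1, and each later stride is the product of the extents of all preceding dimensions. The helper below is a hypothetical sketch, not code taken from ColMajor.chpl, and it assumes a recent Chapel release in which tuples and `dim()` are 0-based.

```chapel
// Hypothetical helper (not part of ColMajor.chpl): linear offset of index
// `idx` within rectangular domain `D` under column-major ordering.
// Assumes a recent Chapel release, where tuples and D.dim() are 0-based.
proc colMajorOffset(D: domain, idx): int {
  var offset = 0, stride = 1;
  // The first dimension varies fastest; each later dimension's stride is
  // the product of the sizes of all dimensions before it.
  for param d in 0..D.rank-1 {
    offset += (idx[d] - D.dim(d).low) * stride;
    stride *= D.dim(d).size;
  }
  return offset;
}

const D = {1..3, 1..4, 1..2};
writeln(colMajorOffset(D, (1, 1, 1)));  // 0: first element
writeln(colMajorOffset(D, (2, 1, 1)));  // 1: next element along dimension 0
writeln(colMajorOffset(D, (1, 2, 1)));  // 3: one full column further
```

A row-major layout would run the same loop from the last dimension down to the first, which is why reversing the dimension order is enough to switch between the two.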

I would like to make a pull request to integrate the ColMajor() module into
Chapel, but I wanted to get feedback from the developer community first.

- Apan
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers