Hi -

It may well be the right choice to have DefaultRectangular
be configurable in row-major vs column-major layout, but
I have some reservations about this approach.

First, DefaultRectangular has to support arbitrary dimension.
Is it important to support arbitrary-dimension "column-major"
layouts? What about a 3D layout where only the middle dimension
is stored differently from how it works now? As far as I know,
people interested in "column-major" arrays are talking
about 2D arrays / matrices.
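
Row-major, column-major and the 3D "only the middle dimension differs" case can all be seen as choices of the order in which dimensions are visited when computing a linear offset. The following is a minimal Chapel sketch. It assumes current Chapel, where tuples are 0-indexed, and uses zero-based indices. `offsetFor` and its `order` argument are hypothetical names and are not part of DefaultRectangular.

```chapel
// Hypothetical helper (not DefaultRectangular's API): linear offset of the
// zero-based index `idx` in a dense block whose per-dimension sizes are
// `sizes`. `order` lists the dimensions from outermost (slowest-varying)
// to innermost (contiguous).
proc offsetFor(idx, sizes, order): int {
  var off = 0;
  for k in 0..idx.size-1 {
    const d = order(k);
    off = off * sizes(d) + idx(d);  // Horner-style accumulation
  }
  return off;
}

const sizes = (2, 3, 4);
const idx = (1, 0, 2);

writeln(offsetFor(idx, sizes, (0, 1, 2)));  // row-major, last dim contiguous: 14
writeln(offsetFor(idx, sizes, (2, 1, 0)));  // column-major, first dim contiguous: 13
writeln(offsetFor(idx, sizes, (0, 2, 1)));  // middle dim made contiguous: 18
```

Seen this way, a binary row/column-major flag covers only 2 of the rank! possible dimension orders.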

Second, I'm not sure it's a good precedent to put this into
DefaultRectangular. Won't we then want to make every new layout a
configurable option in DefaultRectangular in order to "minimize software
layers and complexity"? I think it would be preferable to allow
other layouts to exist separately and to solve whatever software
engineering/complexity problems come up (e.g., by giving Block
the ability to select the layout it uses).

Couldn't we view a column-major layout as something similar to
an array view? That is, it is not so much storing the array
differently as adjusting the mapping from indices to elements.
Isn't that similar to a rank-change array view, in particular?
Didn't we just decide that it was better to make a rank-change
array view a different type, rather than folding it into
DefaultRectangular? Why is this different?
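
The "view" framing can be sketched without any new domain map. Keep an ordinary (row-major) Chapel array with its two dimensions swapped, and remap each access. This is a minimal illustration only. The accessor `B`, the backing array `store` and the sizes are hypothetical. A real array-view type would wrap this remapping in a proper domain/array implementation.

```chapel
config const m = 3, n = 4;

// Backing store: an ordinary row-major array with the dimensions swapped,
// so store[j, i] holds logical element B[i, j] of an m x n matrix.
var store: [1..n, 1..m] real;

// Hypothetical accessor acting as a "column-major view": it only remaps
// the index (i, j) to (j, i); no data is copied or stored differently.
proc B(i: int, j: int) ref {
  return store[j, i];
}

for i in 1..m do
  for j in 1..n do
    B(i, j) = 10 * i + j;

// Walking down column 2 of B visits consecutive elements of `store`.
for i in 1..m do
  write(B(i, 2), " ");
writeln();
```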

Cheers,

-michael

>
>One belated follow-up that I meant to include in the original mail:
>
>An advantage of having 'DefaultRectangular' able to choose between row-
>and column-major order (or potentially, eventually, other layouts --
>tiling?  Morton order?) is that most other domain maps are built in terms
>of DefaultRectangular.  So (I believe) it would probably make it simpler
>for other array implementations to change their layout.  E.g., you could
>imagine having 'Block' take a similar 'param' and thread it down into the
>DefaultRectangulars that it uses to represent each locale's block of data,
>rather than having it switch to a completely different domain map.
>
>-Brad
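
Two toy records can illustrate how a layout 'param' might be threaded from an outer domain map down into its per-locale blocks. Everything here is hypothetical (`LayoutKind`, `LocalChunk`, `ToyBlock`). It is not Chapel's Block distribution or DefaultRectangular. It only shows a param being forwarded, so the outer type needs no layout logic of its own.

```chapel
// Sketch only: all names below are hypothetical.
enum LayoutKind { rowMajor, colMajor }

// Stand-in for a per-locale DefaultRectangular block of m x n elements.
record LocalChunk {
  param layout: LayoutKind;
  const m, n: int;

  // Offset of zero-based (i, j); the param branch is resolved at compile time.
  proc offset(i: int, j: int): int {
    if layout == LayoutKind.rowMajor then
      return i * n + j;   // each row occupies n consecutive slots
    else
      return j * m + i;   // each column occupies m consecutive slots
  }
}

// Stand-in for a distribution that simply forwards its layout param.
record ToyBlock {
  param layout: LayoutKind;

  proc localChunk(m: int, n: int) {
    return new LocalChunk(layout, m, n);
  }
}

const dist = new ToyBlock(LayoutKind.colMajor);
const chunk = dist.localChunk(3, 4);
writeln(chunk.offset(1, 2));  // column-major: 2*3 + 1 = 7
```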
>
>
>On Tue, 11 Jul 2017, Brad Chamberlain wrote:
>
>>
>> Hi Apan --
>>
>> Thanks for continuing to wrestle with these questions of memory layout
>> which, I agree, are very important (and increasingly so).
>>
>> A few people have taken stabs at doing column-major domain maps in
>> Chapel over time (e.g., test/arrays/marybeth/CMO_array.chpl was an early
>> proof-of-concept that has arguably long outlived its utility and I
>> suspect isn't really worth poking into beyond noting that it exists).
>> But, as you've probably observed, none have made it into the standard
>> modules or release.  I think it'd be great to change that.
>>
>> My thought about how to approach this would be to have the
>> DefaultRectangular layout take a 'param' argument indicating whether to
>> use row- or column-major order, in order to minimize software layers and
>> complexity while keeping that decision as much in the core array type as
>> possible.  I've long hypothesized that the changes to the domain map
>> ought to be as simple as adding the param and reversing a few of the
>> loops that iterate over the dimensions to calculate offsets and such.
>> But I could be overlooking something.
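
The "add a param and reverse the dimension loops" hypothesis can be illustrated on its own. This sketch assumes zero-based indices and current Chapel tuple indexing. `linearOffset` and its `colMajor` param are hypothetical. They do not reproduce DefaultRectangular's actual offset code.

```chapel
// Sketch only: `colMajor` stands in for the proposed layout param.
proc linearOffset(param colMajor: bool, idx, sizes): int {
  param rank = idx.size;
  var off = 0;
  if colMajor {
    // same accumulation, dimensions visited last-to-first
    for d in 0..rank-1 by -1 do
      off = off * sizes(d) + idx(d);
  } else {
    // row-major: last dimension is contiguous
    for d in 0..rank-1 do
      off = off * sizes(d) + idx(d);
  }
  return off;
}

const sizes = (3, 4);
writeln(linearOffset(false, (1, 2), sizes));  // row-major: 1*4 + 2 = 6
writeln(linearOffset(true,  (1, 2), sizes));  // column-major: 2*3 + 1 = 7
```

Because `colMajor` is a param, the `if` is folded at compile time. Each instantiation therefore contains only one of the two loops, which should keep the runtime cost of such an option small.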
>>
>> Then the remaining challenge would be how to expose
>> 'DefaultRectangular' to the user so that they could invoke it directly
>> to override the default value of that param, say.  This would likely
>> involve a conversation about naming (DefaultRectangular has always been
>> intended as an internal working name rather than something a user would
>> call), but with an appropriate choice of name, I think we could (perhaps
>> should) expose it to the user.  I think of this as some wrestling with
>> naming (which is always surprisingly hard) and minor code shuffling.
>>
>> (Or maybe we'd just have two instances of DefaultRectangular available
>> to the user to use in a 'dmapped' clause if they wanted to be explicit,
>> one set up for row-major order and one for column-major?)
>>
>> Anyway, those are my immediate thoughts -- others' may vary,
>> -Brad
>>
>>
>> On Tue, 11 Jul 2017, Qasem, Apan M wrote:
>>
>>> Hello,
>>> 
>>> I am investigating data layout transformations in Chapel that are
>>> suitable for heterogeneous memory hierarchies. Organization of data
>>> (layout and placement) can have a huge impact on the performance of
>>> heterogeneous applications. For example, an array-of-structures (AoS)
>>> data structure allocated in host (CPU) memory should generally be
>>> converted to a structure-of-arrays (SoA) when it is mapped to device
>>> (GPU/accelerator) memory, to improve memory coalescing. Below is a
>>> simple example of two alternate layouts for an array of records in a
>>> GPU-offloaded task (AMD is currently working on providing GPU support
>>> in Chapel). The separate-arrays (SoA) implementation yields almost a 2x
>>> speedup over the array-of-records implementation on an AMD APU
>>> (integrated GPU) platform.
>>> 
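>>> // Problem size and coefficients (placeholder values)
>>> config const N = 1024;
>>> config const v0 = 0.5, v1 = 0.25;
>>> 
>>> record img {
>>>  var r: real;
>>>  var g: real;
>>>  var b: real;
>>>  var x: real;
>>> }
>>> 
>>> var dom: domain(1) = {1..N};
>>> 
>>> var src: [dom] img;
>>> var dst: [dom] img;
>>> 
>>> // Assumes a locale model providing a GPU sublocale (AMD's in-progress work)
>>> on (Locales[0]:LocaleModel).GPU do {
>>>  forall i in dom {
>>>    dst[i].r = src[i].r * v0 - src[i].g * v1;
>>>    dst[i].x = v1;
>>>  }
>>> }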
>>> 
>>> 
>>> var src_r: [dom] real;
>>> var src_g: [dom] real;
>>> var src_b: [dom] real;
>>> var src_x: [dom] real;
>>> 
>>> var dst_r: [dom] real;
>>> var dst_g: [dom] real;
>>> var dst_b: [dom] real;
>>> var dst_x: [dom] real;
>>> 
>>> on (Locales[0]:LocaleModel).GPU do {
>>>  forall i in 1 .. N {
>>>    dst_r[i] = src_r[i] * v0 - src_g[i] * v1;
>>>    dst_x[i] = v1;
>>>  }
>>> }
>>> 
>>> 
>>> As part of this work, I implemented a domain map module for
>>> column-major ordering of multi-dimensional arrays. Depending on the
>>> data access patterns, a column-major ordering may be appropriate for
>>> device-mapped data structures because it can reduce memory divergence
>>> in the kernel. It should be noted, however, that column-major ordering
>>> can have a major impact on CPU performance as well: for instance, it
>>> helps when the innermost loop index walks down a column, since
>>> consecutive iterations then touch contiguous elements.
>>> 
>>> Below is an example of how the column-major layout would be invoked in
>>> a Chapel program:
>>> 
>>> config const m = 4, n = 4;  // placeholder sizes
>>> const rows = 1..m, cols = 1..n;
>>> 
>>> var domRowMajor : domain(2) = {rows, cols};
>>> var domColMajor : domain(2) dmapped ColMajor() = {rows, cols};
>>> 
>>> var A : [domRowMajor] real;
>>> var B : [domColMajor] real;
>>> 
>>> for i in rows {
>>>  for j in cols {
>>>    A[i,j] = 17.0; // stored row-major: consecutive j are contiguous
>>>    B[i,j] = 0.0;  // stored column-major: consecutive j are m apart
>>>  }
>>> }
>>> 
>>> 
>>> The implementation of ColMajor() draws from the implementation of the
>>> DefaultRectangular() module. Column-major ordering is enforced by
>>> re-mapping the indices in the domain. For a 2D array, this is
>>> essentially a swap of the two dimensions. For higher-rank arrays, the
>>> dimensions are reversed so that the first dimension becomes the
>>> innermost (i.e., contiguous) one. The entire code is contained in the
>>> file modules/layouts/ColMajor.chpl.
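
The "reverse the dimensions" remapping can be checked in a few lines. A column-major offset equals the row-major offset of the reversed index within the reversed shape. This is a minimal sketch assuming zero-based indices and current Chapel tuples. `reverseDims` and `rowMajorOffset` are hypothetical helpers, not code from ColMajor.chpl.

```chapel
// Hypothetical helper: reverse the order of a homogeneous tuple's elements.
proc reverseDims(t) {
  var r = t;
  for d in 0..t.size-1 do
    r(d) = t(t.size-1-d);
  return r;
}

// Hypothetical helper: row-major (last dimension contiguous) linear offset
// of a zero-based index within a dense block of the given sizes.
proc rowMajorOffset(idx, sizes): int {
  var off = 0;
  for d in 0..idx.size-1 do
    off = off * sizes(d) + idx(d);
  return off;
}

const sizes = (2, 3, 4), idx = (1, 0, 2);

// Column-major placement of idx: the first dimension is contiguous, so
// visit the dimensions last-to-first.
var colOff = 0;
for d in 0..idx.size-1 by -1 do
  colOff = colOff * sizes(d) + idx(d);

// Same result by reversing both the index and the shape.
writeln(colOff, " ", rowMajorOffset(reverseDims(idx), reverseDims(sizes)));  // 13 13
```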
>>> 
>>> I would like to make a pull request to integrate the ColMajor() module
>>> into Chapel, but I wanted to get feedback from the dev community first.
>>> 
>>> - Apan

_______________________________________________
Chapel-developers mailing list
https://lists.sourceforge.net/lists/listinfo/chapel-developers
