Dear Chapel developers,

This is Akihiro Hayashi, a postdoc at Rice University.
I'm writing to ask about an array copy failure in Chapel.

I'm trying to evaluate some Chapel benchmarks across multiple nodes, but I get
a strange error.
Please note that I'm using an old version of the Chapel compiler (r21945) with
qthread-1.10 and GASNet-1.20.2 (infiniband-conduit, mpi-spawner) because the
latest version does not work.
With the latest version of the Chapel compiler (r22568), qthread-1.10, and
GASNet-1.22.0 (infiniband-conduit, mpi-spawner), I get a SEGV when running a
simple program (coforall loc in Locales do on loc { writeln(loc); }) across
multiple nodes with the mpi spawner.
This is a separate problem that I have not investigated yet. I'll work on it
later.

The following problem might be fixed in the latest version, but any comments
and suggestions are appreciated.
Here is part of my code.
The main data structure is a 3-dimensional distributed array, each element of
which refers to a 2-dimensional array.
You can see the array copy statement (liBlock = lkji_tiles(k,k,k+1).tile_array;)
in Line 11. I want to use this copy statement because the Chapel compiler
generates bulk transfer code for it, which accelerates program execution.

// Code
1: const zero: int(32) = 0;
2: var tile_array_indices = {zero..tileSize-1,zero..tileSize-1};
3: class Tile {
4:    var tile_array: [tile_array_indices] real;
5: }
6: var proto_ijk_space = {zero..numTiles_2-1, zero..numTiles_2, zero..numTiles_2};
7: var ijk_space = proto_ijk_space dmapped Block(boundingBox=proto_ijk_space);
8: var lkji_tiles: [ijk_space] Tile;
...
9: begin {
    ...
10:   var liBlock: [tile_array_indices] real;
11:   liBlock = lkji_tiles(k,k,k+1).tile_array;
12:   for (m,n) in tile_array_indices {
13:     if (liBlock(m,n) != lkji_tiles(k,k,k+1).tile_array(m,n)) {
14:       invalid = true;
15:     }
16:   }
17:   if (invalid) { writeln("Copy Failed"); }
18:   ...
19: }
...

In my experiment, when running the program on 2 or more locales, the program
prints "Copy Failed", which means that "liBlock = lkji_tiles(k,k,k+1).tile_array;"
in Line 11 failed.
This happens sometimes (not always), and I confirmed that the copy succeeds
if I replace the array copy in Line 11 with a copy loop.
Additionally, I see the same behavior when I replace the array copy in
Line 11 with liBlock._value.doiBulkTransfer(lkji_tiles(k,k,k+1).tile_array);.
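
A minimal sketch of what such a copy loop looks like, written as a small
single-locale program so it can be compiled on its own. The names src and
the tileSize value are stand-ins chosen for illustration; in the real program
the source is lkji_tiles(k,k,k+1).tile_array on a Block-distributed domain.

```chapel
// Stand-in setup (hypothetical values): one Tile on a single locale.
config const tileSize = 4;
const tile_array_indices = {0..tileSize-1, 0..tileSize-1};

class Tile {
  var tile_array: [tile_array_indices] real;
}

var src = new Tile();
for (m,n) in tile_array_indices do
  src.tile_array(m,n) = m*tileSize + n;

// Element-wise copy loop in place of the whole-array assignment
// "liBlock = src.tile_array;" (Line 11 in the code above).
var liBlock: [tile_array_indices] real;
for (m,n) in tile_array_indices do
  liBlock(m,n) = src.tile_array(m,n);

// Same element-by-element check as in Lines 12-17.
var invalid = false;
for (m,n) in tile_array_indices do
  if liBlock(m,n) != src.tile_array(m,n) then invalid = true;
if invalid then writeln("Copy Failed");
```

The loop moves one element at a time, so it does not go through the bulk
transfer path; that is why it is slower and why I would prefer the whole-array
assignment to work.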

Here is the output log at runtime when I compile the program with -s
debugBulkTransfer (tileSize=200):

-- Log starts here
In DefaultRectangularArr.doiBulkTransfer(): Alo=(0, 0), Blo=(0, 0), len=40000, elemSize=8;
-- End of Log

In both cases, the runtime internally calls the chpl_comm_get API (*) with the
above parameters.
These look correct to me: len=40000 matches 200*200 elements, and elemSize=8
matches a 64-bit real.
(*) Please see the doiBulkTransfer function in
CHPL_HOME/modules/internal/DefaultRectangular.chpl

Any comments and suggestions are appreciated.

Best regards,

Akihiro
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
