Hi Aroon and all,
On 19/12/2014, at 03:57, Aroon Sharma <[email protected]> wrote:
> With respect to the work that was done by the University of Malaga, our work
> applies bulk transfers to more generic zippered loops that zip Cyclic and
> Block Cyclic array slices. From what I remember, their work was restricted to
> whole array assignments between Block and Cyclic arrays (i.e A = B where B is
> Block and A is Cyclic). Since whole array assignment and zippered iteration
> are fundamentally related, I think there is a lot of overlap between both
> works. In fact, our implementation uses a strided communication primitive
> that they developed.
As stated in this paper:
http://www.ac.uma.es/~compilacion/publicaciones/UMA-DAC-12-02.pdf
array assignments do not necessarily need to assign whole arrays to benefit from
the bulk transfer optimization. More precisely, we aggregate data for
assignments of the form:
A[Da] = B[Db] where,
- A is a Block or Cyclic array,
- B is a Block or Cyclic array,
- Da is of the form {xa1..ya1 by za1, xa2..ya2 by za2, …, xan..yan by zan},
  and
- Db is of the form {xb1..yb1 by zb1, xb2..yb2 by zb2, …, xbn..ybn by zbn}.
This way, the optimization covers block-to-block, cyclic-to-cyclic,
block-to-cyclic and cyclic-to-block assignments. It is not enabled by
default, so -s useBulkTransferStride has to be specified to turn it on.
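To make the aggregation idea concrete, here is a minimal sketch in Python (a toy model, not Chapel and not the runtime implementation). It groups the element copies of a 1-D assignment A[Da] = B[Db] by (source locale, destination locale) and checks that each group is an evenly strided run, which is what lets one strided transfer replace many single-element copies. The function names, the 64-element size, the 4 locales and the even-split rule in block_owner are all assumptions made for illustration.

```python
from collections import defaultdict


def block_owner(i, lo, hi, num_locales):
    """Locale owning index i when lo..hi is cut into contiguous blocks.

    Assumption: a simple even split, not necessarily Chapel's exact rule.
    """
    return (i - lo) * num_locales // (hi - lo + 1)


def cyclic_owner(i, lo, num_locales):
    """Locale owning index i when indices are dealt round-robin from lo."""
    return (i - lo) % num_locales


def plan_transfers(da, db, owner_a, owner_b):
    """Group the pairs of A[da] = B[db] by (source locale, destination locale)."""
    if len(da) != len(db):
        raise ValueError("slices must have the same number of elements")
    groups = defaultdict(list)
    for ia, ib in zip(da, db):
        groups[(owner_b(ib), owner_a(ia))].append((ia, ib))
    return dict(groups)


def as_strided(indices):
    """Describe indices as (start, stride, count); fail if the stride varies."""
    if len(indices) == 1:
        return (indices[0], 1, 1)
    stride = indices[1] - indices[0]
    if any(b - a != stride for a, b in zip(indices, indices[1:])):
        raise ValueError("indices are not evenly strided")
    return (indices[0], stride, len(indices))


if __name__ == "__main__":
    n, num_locales = 64, 4
    # A is Cyclic and B is Block, both over 1..64.
    # Model the assignment A[1..63 by 2] = B[2..64 by 2].
    da = range(1, 64, 2)
    db = range(2, 65, 2)
    plan = plan_transfers(
        da, db,
        owner_a=lambda i: cyclic_owner(i, 1, num_locales),
        owner_b=lambda i: block_owner(i, 1, n, num_locales),
    )
    for (src, dst), pairs in sorted(plan.items()):
        dst_run = as_strided([ia for ia, _ in pairs])
        src_run = as_strided([ib for _, ib in pairs])
        print(f"locale {src} -> locale {dst}: B{src_run} into A{dst_run}")
```

In this model the 32 element copies fall into 8 locale pairs, each an evenly strided run of 4 elements on both the source and the destination side.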
> Our work, for example, can aggregate something like:
>
> forall (a, b, c) in zip(A[1..100], B[2..101], C[3..102]) {
>   a = b + c;
> }
>
> where A, B, and C are all Cyclic. Because different array slices are
> referenced in the zippering, a, b, and c will be from different locales on
> all iterations of the loop. I don't believe that the work by the University
> of Malaga could be applied to situations like this.
We have certainly not tackled the problem you describe above. However,
thinking offhand, with our optimization the required slices of B and C (in your
example) could be moved to temporary arrays on the locales owning the
corresponding slice of A, and the computation could then be done locally. If
you implement further optimizations, such as overlapping communication and
computation or minimizing data movement or memory footprint, then our work is
not directly applicable to the situation you describe.
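The temporary-array idea above can be sketched as a small Python model (again a toy, not Chapel code and not an existing implementation). For each locale it picks the zip positions whose element of A is local, gathers the matching elements of B and C into temporaries, and then computes a = b + c locally. It also counts how many bulk gathers such a schedule would issue, assuming one gather per remote source locale per array. The function name, the shared Cyclic layout starting at index 1 and the list-based arrays are assumptions for illustration.

```python
def cyclic_owner(i, lo, num_locales):
    """Locale owning index i when indices are dealt round-robin from lo."""
    return (i - lo) % num_locales


def zip_add_with_temporaries(A, B, C, ra, rb, rc, num_locales, lo=1):
    """Model: forall (a, b, c) in zip(A[ra], B[rb], C[rc]) { a = b + c; }

    A, B, C are Python lists that store index lo at position 0 and are
    assumed to share one Cyclic layout. Returns the number of bulk gathers
    issued, counting one per (array, remote source locale) for each locale.
    """
    if not (len(ra) == len(rb) == len(rc)):
        raise ValueError("zipped slices must have the same length")
    bulk_gathers = 0
    for loc in range(num_locales):
        # Zip positions whose element of A lives on this locale.
        ks = [k for k, ia in enumerate(ra)
              if cyclic_owner(ia, lo, num_locales) == loc]
        if not ks:
            continue
        # Gather the needed elements of B and C into local temporaries.
        tmp_b = [B[rb[k] - lo] for k in ks]
        tmp_c = [C[rc[k] - lo] for k in ks]
        for r in (rb, rc):
            sources = {cyclic_owner(r[k], lo, num_locales) for k in ks}
            bulk_gathers += len(sources - {loc})
        # Purely local computation on the owner of A's elements.
        for k, b, c in zip(ks, tmp_b, tmp_c):
            A[ra[k] - lo] = b + c
    return bulk_gathers


if __name__ == "__main__":
    n, num_locales = 102, 4
    A = [0] * n
    B = list(range(1, n + 1))             # B[i] = i
    C = [2 * i for i in range(1, n + 1)]  # C[i] = 2*i
    gathers = zip_add_with_temporaries(
        A, B, C, range(1, 101), range(2, 102), range(3, 103), num_locales)
    print("bulk gathers:", gathers)
```

With 4 locales, every element of B and C used in this example is remote to the locale that owns the matching element of A. Element-by-element access would therefore need 200 remote reads, while the model issues two bulk gathers per locale.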
Regards,
Rafa.
__
Rafael Asenjo Plaza
Dept. Arquitectura de Computadores
Complejo Tecnologico Campus de Teatinos
E-29071 MALAGA (SPAIN)
Tel: +34 95 213 27 91
Fax: +34 95 213 27 90
http://www.ac.uma.es/~asenjo
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers