[PATCH] D79972: [OpenMP5.0] map item can be non-contiguous for target update

Chi Chun Chen via Phabricator via cfe-commits Tue, 02 Jun 2020 16:30:46 -0700

cchen added a comment.

In D79972#2069435 <https://reviews.llvm.org/D79972#2069435>, @ABataev wrote:

> In D79972#2069366 <https://reviews.llvm.org/D79972#2069366>, @cchen wrote:
>
> > In D79972#2069358 <https://reviews.llvm.org/D79972#2069358>, @ABataev wrote:
> >
> > > In D79972#2069322 <https://reviews.llvm.org/D79972#2069322>, @cchen wrote:
> > >
> > > > In D79972#2068976 <https://reviews.llvm.org/D79972#2068976>, @ABataev 
> > > > wrote:
> > > >
> > > > > Still: Did you think about implementing it in the compiler instead of 
> > > > > the runtime?
> > > >
> > > >
> > > > I'm not sure I understand your question. Which part of the code are 
> > > > you asking about?
> > > >  The main work the compiler needs to do is to send the {offset, count, 
> > > > stride} struct to the runtime.
> > >
> > >
> > > I mean did you think about calling `__tgt_target_data_update` function in 
> > > a loop in the compiler-generated code instead of putting it into the 
> > > runtime?
> >
> >
> > > Oh, I would prefer to call `__tgt_target_data_update` once in the 
> > > compiler, and that is what I'm doing now.
>
>
> I was not quite correct. What I mean is: why not generate the arrays for the 
> array section as VLAs in the compiler and fill them in a compiler-generated 
> loop for non-contiguous sections, rather than doing this in the runtime?
>  Say, we have the code:
>
>   int arr[3][3];
>   ...
>   #pragma omp target update to(arr[1:2][1:2])
>  
>
>
> In this case, we're going to transfer the following elements:
>
>   000
>   0xx
>   0xx
>
>
> In the compiler-generated code we emit something like this:
>
>   void *bptr[<n>];
>   void *ptr[<n>];
>   int64 sizes[<n>];
>   int64 maptypes[<n>];
>   for (int i = 0; i < <n>; ++i) {
>     bptr[i] = &arr[1+i][1];
>     ptr[i] = &arr[1+i][1];
>     sizes[i] = ...;
>     maptypes[i] = ...;
>   }
>   call void @__tgt_target_data_update(i64 -1, i32 <n>, bptr, ptr, sizes, maptypes);
>
>
> With this solution, you won't need to modify the runtime and add a new 
> mapping flag.

We discussed my current implementation in the bi-weekly meeting several weeks 
ago, and there was general consensus that it was an acceptable approach.

The major advantage of sending a descriptor to the runtime is illustrated by 
the following example:

  #define N 10000
  int a[N][2];
  …
  #pragma omp target update to (a[0:N][0:1])

With the compiler-generated loop, this would require passing O(N) entries 
(here 10000) in the `__tgt_target_data_update` call. The current 
implementation only requires a descriptor with 2 entries. I think this could 
be a real concern: splitting out the transfers in compiler-generated code 
results in a list containing one entry per non-contiguous chunk (easily 
hitting scaling issues), while the descriptor approach is bounded by the 
number of dimensions. That seems like a pretty compelling reason to use the 
descriptor: it is much more space efficient.
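
To make the comparison concrete, here is a rough sketch of the entry counts. The `dim_desc` struct and the `per_chunk_entries` helper are hypothetical names used only for illustration; they are modelled on the {offset, count, stride} triple mentioned above and are not the actual layout in the patch. The sketch also assumes the innermost dimension is the only contiguous run, so no adjacent rows get merged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-dimension descriptor (illustration only, not the
   layout used by the patch). Strides are counted in elements. */
typedef struct {
  uint64_t offset; /* first index selected in this dimension */
  uint64_t count;  /* number of indices selected */
  uint64_t stride; /* elements between consecutive indices */
} dim_desc;

/* Number of list entries the per-chunk approach needs, assuming each
   combination of outer indices is one separate contiguous transfer. */
static uint64_t per_chunk_entries(const dim_desc *dims, int ndims) {
  assert(ndims >= 1);
  uint64_t n = 1;
  for (int i = 0; i < ndims - 1; ++i)
    n *= dims[i].count;
  return n;
}
```

For `a[0:N][0:1]` with `int a[N][2]` and N = 10000, the descriptor would be `{{0, 10000, 2}, {0, 1, 1}}`: two entries, versus `per_chunk_entries` returning 10000 for the per-chunk list.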

Also, the descriptor idea is very similar to how Cray supported Fortran dope 
vectors for years (we send in a pointer to a dope vector rather than a pointer 
to the data, and a flag to indicate it’s a dope vector, and the runtime library 
handles it as a dope vector).
I think the runtime library changes will not be very extensive or difficult at 
all, and we’re very willing to implement the runtime support for 
non-contiguous transfers.
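
As a rough idea of what the runtime side could look like, the sketch below walks a per-dimension descriptor and copies one contiguous run per combination of outer indices. Everything here is an assumption for illustration: `section_dim` and `update_section` are hypothetical names, `memcpy` stands in for the real host-to-device transfer, and this is not the libomptarget implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor entry; strides are counted in elements. */
typedef struct {
  uint64_t offset; /* first index selected in this dimension */
  uint64_t count;  /* number of indices selected */
  uint64_t stride; /* elements between consecutive indices */
} section_dim;

/* Copy the described section from src to the same positions in dst.
   memcpy stands in for the device transfer (assumption). */
static void update_section(char *dst, const char *src,
                           const section_dim *dims, int ndims,
                           size_t elem_size) {
  assert(ndims >= 1);
  if (ndims == 1) {
    /* Innermost dimension: assumed contiguous, so one transfer. */
    assert(dims[0].stride == 1);
    size_t start = dims[0].offset * elem_size;
    memcpy(dst + start, src + start, dims[0].count * elem_size);
    return;
  }
  /* Outer dimension: recurse once per selected index. */
  for (uint64_t i = 0; i < dims[0].count; ++i) {
    size_t off = (dims[0].offset + i) * dims[0].stride * elem_size;
    update_section(dst + off, src + off, dims + 1, ndims - 1, elem_size);
  }
}
```

For the `arr[1:2][1:2]` example on `int arr[3][3]`, the descriptor would be `{{1, 2, 3}, {1, 2, 1}}`, and the loop issues two transfers of two elements each. The compiler only has to emit the two-entry descriptor; the per-chunk expansion happens inside the runtime.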

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79972/new/

https://reviews.llvm.org/D79972

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
