Hi Chung-Lin!

On Mon, 13 Feb 2017 18:13:42 +0800, Chung-Lin Tang <chunglin_t...@mentor.com> 
wrote:
> This patch adds:
> 
> // New functions to set/get the current default async queue
> void acc_set_default_async (int);
> int acc_get_default_async (void);
> 
> and _async versions of a few existing API functions:

(Please, separate patches for separate features/changes.)


Reviewing the OpenACC ICV acc-default-async-var changes here.

> --- include/gomp-constants.h  (revision 245382)
> +++ include/gomp-constants.h  (working copy)

>  /* Asynchronous behavior.  Keep in sync with
>     libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
>  
> +#define GOMP_ASYNC_DEFAULT           0
>  #define GOMP_ASYNC_NOVAL             -1
>  #define GOMP_ASYNC_SYNC                      -2

This means that "acc_set_default_async(acc_async_default)" will set
acc-default-async-var to "0", that is, the same as
"acc_set_default_async(0)".  It thus follows that
"async"/"async(acc_async_noval)" is the same as "async(0)".  Is that
intentional?

It is in line with the OpenACC 2.5 specification: "initial value [...] is
implementation defined", but I wonder why map it to "async(0)", and not
to its own, unspecified, but separate queue.  In the latter case,
"acc_async_default" etc. would then map to a negative value to denote
this unspecified, but separate queue (and your changes would need to be
adapted for that).

I have not verified whether we're currently already having (on trunk
and/or openacc-gcc-8-branch) the semantics of the queue of
"async(acc_async_noval)" mapping to the same queue as "async(0)"?

I'm fine to accept your changes as proposed (basically, everthing from
your patch posted that has a "default_async" in it), for that's an
incremental improvement anyway.  But -- unless you tell me I've
misunderstood something -- I'll get the issue I raised clarified with the
OpenACC technical committee, and we will then later improve this further.

No matter what the outcome, the implementation-defined behavior should be
documented.  (Can do that once we get the intentions clarified.)

> --- libgomp/oacc-async.c      (revision 245382)
> +++ libgomp/oacc-async.c      (working copy)

> +int
> +acc_get_default_async (void)
> +{
> +  struct goacc_thread *thr = goacc_thread ();
> +
> +  if (!thr || !thr->dev)
> +    gomp_fatal ("no device active");

I suppose that instead, this might also either just "return
acc_async_sync", or in fact "goacc_lazy_initialize", and then return the
correct value?  As far as I remember now, I have an issue open with the
OpenACC technical committee to clarify which constructs/API calls are
expected to implicitly initialize.  I'll fold this question in.

So, OK to leave 'gomp_fatal ("no device active")', as that's what all
other async routines also seem to be doing at the moment.

> +
> +  return thr->default_async;
> +}
> +

> +void
> +acc_set_default_async (int async)
> +{
> +  if (async < acc_async_sync)
> +    gomp_fatal ("invalid async argument: %d", async);

(This will nowadays use "async_valid_stream_id_p" or some such.)

> +
> +  struct goacc_thread *thr = goacc_thread ();
> +
> +  if (!thr || !thr->dev)
> +    gomp_fatal ("no device active");

As above.

> +  thr->default_async = async;
> +}

> --- libgomp/oacc-plugin.c     (revision 245382)
> +++ libgomp/oacc-plugin.c     (working copy)

> +/* Return the default async number from the TLS data for the current thread. 
>  */
> +
> +int
> +GOMP_PLUGIN_acc_thread_default_async (void)
> +{
> +  struct goacc_thread *thr = goacc_thread ();
> +  return thr ? thr->default_async : acc_async_default;
> +}

As I understand, the need for this function will disappear with your
later "async re-work" changes, so OK as posted, but I wondered in which
cases we would not have a valid "goacc_thread" when coming here?  (Might
again related to the "goacc_lazy_initialize" issue mentioned above.)

> --- libgomp/plugin/plugin-nvptx.c     (revision 245382)
> +++ libgomp/plugin/plugin-nvptx.c     (working copy)
> @@ -414,13 +414,10 @@ select_stream_for_async (int async, pthread_t thre
>    struct ptx_stream *stream = NULL;
>    int orig_async = async;
>  
> -  /* The special value acc_async_noval (-1) maps (for now) to an
> -     implicitly-created stream, which is then handled the same as any other
> -     numbered async stream.  Other options are available, e.g. using the null
> -     stream for anonymous async operations, or choosing an idle stream from 
> an
> -     active set.  But, stick with this for now.  */
> -  if (async > acc_async_sync)
> -    async++;

Is that actually a separate change from the acc-default-async-var
changes?

Is this one relevant in the question raised above, whether
"async(acc_async_noval)" maps to the same queue as "async(0)"?

> +  /* The special value acc_async_noval (-1) maps to the thread-specific
> +     default async stream.  */
> +  if (async == acc_async_noval)
> +    async = GOMP_PLUGIN_acc_thread_default_async ();

> --- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c (revision 0)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c (revision 0)
> @@ -0,0 +1,904 @@
> +[...]
> +    acc_set_default_async (s);
> +[...]

This is the one single test case using this functionality, but it only
verifies "correct results', but doesn't observe the actual queues (for
example, CUDA streams) being used.

Need test cases for "acc_get_default_async", too, and also Fortran ones.


Generally, I envision test cases running a few "acc_get_cuda_stream"
calls with relevant argument values, to see whether the expected
queues/streames are being used.  (Similar for other offload targets.)

But I suppose we might again need to get clarified whether
"acc_get_cuda_stream(acc_async_sync)",
"acc_get_cuda_stream(acc_async_noval)", or
"acc_get_cuda_stream(acc_async_default)" are actually valid calls (given
that these argument values are not valid "async value"s), and these would
then return the respective CUDA stream handles, different from the one
returned for "acc_get_cuda_stream(0)" etc.

That said, we can certainly implement it that way, because that's not
against the specification.

(Once available in trunk, we can also construct test cases using the
OpenACC Profiling Interface for verifying such internal mapping.)


Grüße
 Thomas

Reply via email to