[PATCH, OpenACC 2.7, v2] Implement reductions for arrays and structs

2024-06-06 Thread Chung-Lin Tang
Hi Thomas,
This is v2 of the C/C++/middle-end parts of array/struct
support for OpenACC reductions.

The main changes are much-improved support for sub-arrays
and some new testcases.

Tested on mainline using x86_64 host and nvptx/amdgcn offloading.
Will backport to the upcoming omp/devel/gcc-14 branch once approved for mainline.
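
For readers new to the feature, here is a minimal sketch of the kind of source this series is meant to accept. It is only an illustration, not one of the new testcases: the names `bin_sums`, `accumulate`, `struct stats` and `NBINS` are made up for this example. Built without -fopenacc the pragmas are ignored and the loops run serially; a compiler without array/struct reduction support would presumably reject the clauses when -fopenacc is given.

```c
#include <assert.h>
#include <stddef.h>

#define NBINS 4

/* Hypothetical struct used as a reduction variable.  */
struct stats
{
  int count;
  double total;
};

/* Add each i in [0, n) into bin (i % NBINS) and copy the bins to OUT.  */
void
bin_sums (int n, int out[NBINS])
{
  int sums[NBINS] = { 0, 0, 0, 0 };

  assert (n >= 0);

  /* Whole-array reduction: conceptually every thread works on a private
     copy of 'sums', and the copies are combined element-wise.  */
  #pragma acc parallel loop reduction(+:sums)
  for (int i = 0; i < n; i++)
    sums[i % NBINS] += i;

  for (int b = 0; b < NBINS; b++)
    out[b] = sums[b];
}

/* Struct reduction: '+' is applied to each member of 's'.  */
struct stats
accumulate (const double *v, int n)
{
  struct stats s = { 0, 0.0 };

  assert (n >= 0);

  #pragma acc parallel loop reduction(+:s) copyin(v[0:n])
  for (int i = 0; i < n; i++)
    {
      s.count += 1;
      s.total += v[i];
    }

  return s;
}
```

Calling `bin_sums (16, out)` should leave 24, 28, 32 and 36 in `out`, regardless of whether the loop is offloaded.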

Thanks,
Chung-Lin

2024-06-06  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
(c_oacc_reduction_code_name): Likewise.
(c_finish_omp_clauses): Handle OpenACC cases using new functions.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
(cp_oacc_reduction_code_name): Likewise.
(finish_omp_reduction_clause): Handle OpenACC cases using new
functions.

gcc/ChangeLog:

* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
handling ARRAY_TYPE and RECORD_TYPE reductions.
(gcn_goacc_reduction_setup): Likewise.
(gcn_goacc_reduction_init): Likewise.
(gcn_goacc_reduction_fini): Likewise.
(gcn_goacc_reduction_teardown): Likewise.

* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate
V2SI shuffle using vec_extract op.
(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
RECORD_TYPE reductions.
(nvptx_goacc_reduction_setup): Likewise.
(nvptx_goacc_reduction_init): Likewise.
(nvptx_goacc_reduction_fini): Likewise.
(nvptx_goacc_reduction_teardown): Likewise.

* gimplify.cc (gimplify_scan_omp_clauses): Sanity checking for
supported array reduction cases.
(gimplify_adjust_omp_clauses): Peel away array MEM_REF for decl lookup.

* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
building to use decl type, rather than generic ptr_type_node.
(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
construction.
(lower_rec_input_clauses): Set OMP_CLAUSE_REDUCTION_PRIVATE_EXPR.
(oacc_array_reduction_bias): New function.
(lower_oacc_reductions): Add code to teardown/recover array access
MEM_REF in OMP_CLAUSE_DECL, to accommodate lookup requirements.
Use OMP_CLAUSE_REDUCTION_PRIVATE_EXPR as reduction private copy if set.
Handle array reductions using new oacc_array_reduction_bias function.
Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
instead of machine mode based.

* omp-oacc-neuter-broadcast.cc (worker_single_copy):
Add 'hash_set *array_reduction_base_vars' parameter.
Add xxx.

(neuter_worker_single): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust recursive calls to self and worker_single_copy.
(oacc_do_neutering): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust call to neuter_worker_single.
(execute_omp_oacc_neuter_broadcast): Add local
'hash_set array_reduction_base_vars' declaration. Collect MEM_REF
base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add
'&array_reduction_base_vars' argument to call of oacc_do_neutering.

* omp-offload.cc (default_goacc_reduction): Add unshare_expr.

* tree.cc (omp_clause_num_ops): Increase OMP_CLAUSE_REDUCTION ops to 6.
* tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_EXPR): New macro.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/reduction-9.c: New test.
* c-c++-common/goacc/reduction-10.c: New test.
* c-c++-common/goacc/reduction-11.c: New test.
* c-c++-common/goacc/reduction-12.c: New test.
* c-c++-common/goacc/reduction-13.c: New test.
* c-c++-common/goacc/reduction-14.c: New test.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction.h
(check_reduction_array_xx): New macro.
(operator_apply): Likewise.
(check_reduction_array_op): Likewise.
(check_reduction_arraysec_op): Likewise.
(function_apply): Likewise.
(check_reduction_array_macro): Likewise.
(check_reduction_arraysec_macro): Likewise.
(check_reduction_xxx_xx_all): Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c: New test.
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 2d9e9c0969f..61991a218f8 100644

Re: [PATCH, OpenACC 2.7, v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2024-04-16 Thread Chung-Lin Tang
On 2024/4/12 3:14 PM, Thomas Schwinge wrote:
>> I have re-tested the patch *without* the gomp_increment/decrement_refcount
>> changes, and have these regressions (just to demonstrate what is affected):
>> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
>> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
>> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
>> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
>> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
>> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
>> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
>> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> ... are cases where we 'acc_map_data' something, and then invoke an
> OpenACC compute construct with a data clause for the same memory region...
> 
>> Now, I have also re-tested your version (aka, just break early and return 
>> when k->refcount == REFCOUNT_ACC_MAP_DATA)
>> And for the record, that also works (no regressions).
>>
>> However, I strongly suggest we use my version here where we adjust the 
>> dynamic_refcount
> ..., and it's confusing to me why such an OpenACC compute construct (which
> is to use the structured reference counter) should then use the dynamic
> reference counter, for 'acc_map_data'-mapped data?
> 
>> simply because: *It is the whole point of this project item in OpenACC 2.7*
>>
>> The 2.7 spec articulated how increment/decrement interacts with 
>> acc_map_data/acc_unmap_data and this patch was supposed to make libgomp more 
>> conforming to it implementation-wise.
>> (otherwise, no point in working on this at all, as there wasn't really 
>> anything behaviorally wrong about our implementation before)
> That is, in my understanding, those 'gomp_increment_refcount' changes
> don't affect the 'acc_map_data' reference counting, but instead, they
> change the reference counting for OpenACC constructs that are originally
> using structured reference counter to instead use the dynamic reference
> counter.  This doesn't seem conceptually right to me.  (..., even if not
> observable from the outside.)

Okay, I've committed the attached patch, with the "early return upon
k->refcount == REFCOUNT_ACC_MAP_DATA" in gomp_increment/decrement_refcount.

If we continue to use k->refcount itself to hold the map-type flag, I
guess we will not be able to tell directly whether an adjustment at that
point is structured or dynamic. That would probably need an entirely new
field, but I don't think we need to do that right now.
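
To make the committed behavior concrete, here is a rough, self-contained sketch of the "early return" idea. It is a toy model, not the libgomp code: `struct toy_key`, the `TOY_*` macros and both functions are hypothetical stand-ins, the bit pattern chosen for `TOY_REFCOUNT_SPECIAL` is an assumption, and the real gomp_increment/decrement_refcount also deal with refcount sets and struct-element mappings that are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the special refcount values; the real ones are
   defined in libgomp/libgomp.h.  */
#define TOY_REFCOUNT_SPECIAL      (~(uintptr_t) 0x7)
#define TOY_REFCOUNT_INFINITY     (TOY_REFCOUNT_SPECIAL | 0)
#define TOY_REFCOUNT_ACC_MAP_DATA (TOY_REFCOUNT_SPECIAL | 2)

/* Hypothetical, heavily simplified mapping entry.  */
struct toy_key
{
  uintptr_t refcount;         /* Structured count, or a special marker.  */
  uintptr_t dynamic_refcount; /* Dynamic count.  */
};

/* Nonzero if K carries a marker that must never be adjusted here.  */
static int
toy_is_pinned (const struct toy_key *k)
{
  return (k->refcount == TOY_REFCOUNT_INFINITY
          || k->refcount == TOY_REFCOUNT_ACC_MAP_DATA);
}

void
toy_increment_refcount (struct toy_key *k)
{
  /* Early return: acc_map_data mappings are treated like
     REFCOUNT_INFINITY ones and left untouched.  */
  if (k == NULL || toy_is_pinned (k))
    return;
  k->refcount += 1;
}

void
toy_decrement_refcount (struct toy_key *k)
{
  if (k == NULL || toy_is_pinned (k))
    return;
  if (k->refcount > 0)
    k->refcount -= 1;
}
```

With this shape, a structured data region over acc_map_data-mapped memory changes neither counter, which is why the function cannot tell structured from dynamic adjustments without a separate field.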

Thanks,
Chung-Lin
From a7578a077ed8b64b94282aa55faf7037690abbc5 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 16 Apr 2024 09:03:21 +
Subject: [PATCH] OpenACC 2.7: Adjust acc_map_data/acc_unmap_data interaction
 with reference counters

This patch adjusts the implementation of the acc_map_data/acc_unmap_data API
library routines to better fit the description in the OpenACC 2.7 specification.

Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA
special value to mark acc_map_data-created mappings. Adjustments to
mapping-related code to respect OpenACC semantics are also added.

libgomp/ChangeLog:

* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
initialize dynamic_refcount as 1.
(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
(goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case.
* target.c (gomp_increment_refcount): Return early for
REFCOUNT_ACC_M

[PATCH, OpenACC 2.7, v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2024-04-11 Thread Chung-Lin Tang
Hi Thomas,

On 2024/3/15 7:24 PM, Thomas Schwinge wrote:
> Hi Chung-Lin!
> 
> I realized: please add "PR libgomp/92840" to the Git commit log, as your
> changes are directly a continuation of my earlier changes.

Okay, I'll remember to do that.

...
> -  if (n->refcount != REFCOUNT_INFINITY)
> +  if (n->refcount != REFCOUNT_INFINITY
> +   && n->refcount != REFCOUNT_ACC_MAP_DATA)
>   n->refcount--;
>n->dynamic_refcount--;
>  }
>  
> +  /* Mappings created by 'acc_map_data' may only be deleted by
> + 'acc_unmap_data'.  */
> +  if (n->refcount == REFCOUNT_ACC_MAP_DATA
> +  && n->dynamic_refcount == 0)
> +n->dynamic_refcount = 1;
> +
>if (n->refcount == 0)
>  {
>bool copyout = (kind == GOMP_MAP_FROM
> 
> ..., which really should have the same semantics?  No strong opinion on
> which of the two variants you now chose.

My guess is that breaking off the REFCOUNT_ACC_MAP_DATA case separately will
be lighter on any branch predictors (faster performing overall), so I will
stick with my version here.


>>>
>>> It's not clear to me why you need this handling -- instead of just
>>> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is,
>>> early 'return'?
>>>
>>> Per my understanding, this code is for OpenACC only exercised for
>>> structured data regions, and it seems strange (unnecessary?) to adjust
>>> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data?  Or am I
>>> missing anything?
>>
>> No, that is not true. Almost everything goes through
>> gomp_map_vars_existing/_internal.
>> This is what happens when you acc_create/acc_copyin on a mapping created by
>> acc_map_data.
> 
> But I don't understand what you foresee breaking with the following (on
> top of your v2):
> 
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -476,14 +476,14 @@ gomp_free_device_memory (struct gomp_device_descr *devicep, void *devptr)
>  static inline void
>  gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set)
>  {
> -  if (k == NULL || k->refcount == REFCOUNT_INFINITY)
> +  if (k == NULL
> +  || k->refcount == REFCOUNT_INFINITY
> +  || k->refcount == REFCOUNT_ACC_MAP_DATA)
>  return;
>  
>    uintptr_t *refcount_ptr = &k->refcount;
>  
> -  if (k->refcount == REFCOUNT_ACC_MAP_DATA)
> -    refcount_ptr = &k->dynamic_refcount;
> -  else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
> +  if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>      refcount_ptr = &k->structelem_refcount;
...
> Can you please show a test case?

I have re-tested the patch *without* the gomp_increment/decrement_refcount
changes, and have these regressions (just to demonstrate what is affected):
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test

Now, I have also re-tested your version (i.e., just return early when
k->refcount == REFCOUNT_ACC_MAP_DATA), and for the record, that also works
(no regressions).

However, I strongly suggest we use my version here, where we adjust the
dynamic_refcount, simply because: *It is the whole point of this project item
in OpenACC 2.7*

The 2.7 spec articulated how increment/decrement interacts with
acc_map_data/acc_unmap_data, and this patch was supposed to make libgomp
conform to it more closely implementation-wise.
(Otherwise, there is no point in working on this at all, as there wasn't really
anything behaviorally wrong with our implementation before.)
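
As a rough illustration of the variant argued for here (redirect adjustments to dynamic_refcount, and never let it drop below 1 for an acc_map_data mapping), consider the following toy model. It is only a sketch under assumptions: `struct toy_map`, the `TOY_*` macros and the `v2_*` functions are hypothetical names, the bit pattern of `TOY_REFCOUNT_SPECIAL` is assumed, and the struct-element and refcount-set handling of the real functions is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the special values in libgomp/libgomp.h.  */
#define TOY_REFCOUNT_SPECIAL      (~(uintptr_t) 0x7)
#define TOY_REFCOUNT_INFINITY     (TOY_REFCOUNT_SPECIAL | 0)
#define TOY_REFCOUNT_ACC_MAP_DATA (TOY_REFCOUNT_SPECIAL | 2)

/* Hypothetical, heavily simplified mapping entry.  */
struct toy_map
{
  uintptr_t refcount;
  uintptr_t dynamic_refcount;
};

/* Pick the counter an adjustment should act on.  */
static uintptr_t *
v2_select_counter (struct toy_map *k)
{
  if (k->refcount == TOY_REFCOUNT_ACC_MAP_DATA)
    return &k->dynamic_refcount;
  return &k->refcount;
}

void
v2_increment (struct toy_map *k)
{
  if (k == NULL || k->refcount == TOY_REFCOUNT_INFINITY)
    return;
  *v2_select_counter (k) += 1;
}

void
v2_decrement (struct toy_map *k)
{
  if (k == NULL || k->refcount == TOY_REFCOUNT_INFINITY)
    return;

  uintptr_t *counter = v2_select_counter (k);
  if (*counter > 0)
    *counter -= 1;

  /* Force back to 1: only acc_unmap_data may remove such a mapping.  */
  if (k->refcount == TOY_REFCOUNT_ACC_MAP_DATA && *counter == 0)
    *counter = 1;
}
```

The observable difference from an early return is only in the bookkeeping: the dynamic count rises and falls with each use, but the mapping still cannot reach zero.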

> I see we already have:
> 
> if ((kinds[i] & 0xff) == GOMP_MAP_TO_PSET
> && tgt->list_count == 0)
>   {
> /* 'declare target'.  */
> assert (n->refcount == REFCOUNT_INFINITY);
> 
> I think I wanted you to add:
> 
> --- 

Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2024-04-03 Thread Chung-Lin Tang
Hi Richard, Thomas,

On 2023/10/30 8:46 PM, Richard Biener wrote:
>>
>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
>> flag.
>>
>> The actual optimization then is done in this second patch.  Chung-Lin
>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
>> I don't have much experience with most of the following generic code, so
>> would appreciate a helping hand, whether that conceptually makes sense as
>> well as from the implementation point of view:

First of all, I have removed all of the gimplify-stage scanning and setting of
DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so there are no
changes to gimplify.cc now).

I remember this code was an artifact of earlier attempts to allow struct-member
pointer mappings to also work (e.g. map(readonly:rec.ptr[:N])), but those
attempts failed anyway.
I think the omp_data_* member accesses created when building the
child-function-side receiver_refs are blocking points-to analysis from working
(I didn't try digging deeper).

Also, during gimplify, VAR_DECLs appeared to be reused (at least in some cases)
for building map clause decl references, so hoping that the variables "happen
to be" single-use, and relaying DECL_POINTS_TO_READONLY into
SSA_NAME_POINTS_TO_READONLY_MEMORY on that basis, does appear to be a little
risky.

However, for firstprivate pointers processed during omp-low, the situation
appears to be somewhat different (see the description below).

> No, I don't think you can use that flag on non-default-defs, nor
> preserve it on copying.  So it also doesn't nicely extend to DECLs as
> done by the patch.  We currently _only_ use it for incoming parameters.
> When used on arbitrary code you can get to for example
> 
> ptr1(points-to-readonly-memory) = >x;
> ... access via ptr1 ...
> ptr2 = >x;
> ... access via ptr2 ...
> 
> where both are your OMP regions differently constrained (the constraint is
> on the code in the region, _not_ on the actual protections of the pointed to
> data, much like for the fortran case).  But now CSE comes along and happily
> replaces all ptr2 with ptr2 in the second region and ... oops!

Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in
the second region"?

That doesn't happen, because during omp-lower/expand, OMP target regions (which
are all that this currently applies to) are separated into individual child
functions.

(Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during
omp-lower, where for firstprivate pointers (i.e. 'a' here) we set this bit when
constructing the first load of this pointer.)

  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
  {
    foo (a, a[8]);
    r = a[8];
  }
  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
  {
    foo (a, a[12]);
    r = a[12];
  }

After omp-expand (before SSA):

__attribute__((oacc parallel, omp target entrypoint, noclone))
void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
{
 ...
   :
  D.2962 = .omp_data_i->D.2947;
  a.8 = D.2962;
  r.1 = (*a.8)[12];
  foo (a.8, r.1);
  r.1 = (*a.8)[12];
  D.2965 = .omp_data_i->r;
  *D.2965 = r.1;
  return;
}

__attribute__((oacc parallel, omp target entrypoint, noclone))
void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
{
  ...
   :
  D.2968 = .omp_data_i->D.2939;
  a.4 = D.2968;
  r.0 = (*a.4)[8];
  foo (a.4, r.0);
  r.0 = (*a.4)[8];
  D.2971 = .omp_data_i->r;
  *D.2971 = r.0;
  return;
}

So the creation of DECL_POINTS_TO_READONLY and its relaying to
SSA_NAME_POINTS_TO_READONLY_MEMORY here is actually quite similar to a
default-def for a PARM_DECL, at least conceptually.

(If offloading were structured significantly differently, say if child functions
were separated much earlier, before omp-lowering, then this readonly modifier
might possibly be a direct application of 'r' in the "fn spec" attribute.)
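
For anyone wanting to experiment with the two-region fragment, a self-contained version might look like the sketch below. The function `readonly_demo`, the body of `foo` and the array contents are made up for illustration; the actual readonly-2.c testcase may well differ. Without -fopenacc the pragmas are ignored and the code simply runs on the host; the 'readonly' modifier itself needs a compiler carrying the front-end patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the opaque callee in the fragment.  The
   optimizer cannot assume much about what a real external 'foo' does
   with 'p', which is where the 'readonly' promise comes in.  */
#pragma acc routine seq
void
foo (int *p, int v)
{
  (void) p;
  (void) v;
}

/* Run the two regions and report the value each one copies out.  */
void
readonly_demo (int *r8, int *r12)
{
  int a[32];
  int r = 0;

  assert (r8 != 0 && r12 != 0);

  for (int i = 0; i < 32; i++)
    a[i] = 3 * i;

  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
  {
    foo (a, a[8]);
    r = a[8];   /* The reload that a pass such as fre1 may remove.  */
  }
  *r8 = r;

  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
  {
    foo (a, a[12]);
    r = a[12];
  }
  *r12 = r;
}
```

Each region ends up in its own child function, so whatever is concluded about 'a' in one region does not leak into the other.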

Other changes since the first version of the patch include:
1) Update of C/C++ FE changes to the new style in c-family/c-omp.cc.
2) Merging of two if cases in fortran/trans-openmp.cc, as Thomas suggested.
3) Update of the readonly-2.c testcase to scan before/after the "fre1" pass, to
verify removal of a MEM load, also as Thomas suggested.

I have re-tested this patch using mainline, with no regressions. Is this okay
for mainline?

Thanks,
Chung-Lin

2024-04-03  Chung-Lin Tang  

gcc/c-family/ChangeLog:

* c-omp.cc (c_omp_address_inspector::expand_array_base):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
(c_omp_address_inspector::expand_component_selector): Likewise.

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_trans_omp_array_section):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/Change

Re: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends

2024-03-07 Thread Chung-Lin Tang
Hi Thomas, Tobias,

On 2023/10/26 6:43 PM, Thomas Schwinge wrote:
> +++ b/gcc/tree.h
> @@ -1813,6 +1813,14 @@ class auto_suppress_location_wrappers
>   #define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
> (OMP_CLAUSE_SUBCODE_CHECK (NODE, 
> OMP_CLAUSE_MAP)->base.addressable_flag)
>
> +/* Nonzero if OpenACC 'readonly' modifier set, used for 'copyin'.  */
> +#define OMP_CLAUSE_MAP_READONLY(NODE) \
> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
> +
> +/* Same as above, for use in OpenACC cache directives.  */
> +#define OMP_CLAUSE__CACHE__READONLY(NODE) \
> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
 I'm not sure if these special accessor functions are actually useful, or
 we should just directly use 'TREE_READONLY' instead?  We're only using
 them in contexts where it's clear that the 'OMP_CLAUSE_SUBCODE_CHECK' is
 satisfied, for example.
>>> I find directly using TREE_READONLY confusing.
>>
>> FWIW, I've changed to use TREE_NOTHROW instead, if it can give a better 
>> sense of safety :P
> 
> I don't understand that, why not use 'TREE_READONLY'?
> 
>> I think there's a misunderstanding here anyways: we are not relying on a 
>> DECL marked
>> TREE_READONLY here. We merely need the OMP_CLAUSE_MAP to be marked as 
>> OMP_CLAUSE_MAP_READONLY == 1.
> 
> Yes, I understand that.  My question was why we don't just use
> 'TREE_READONLY (c)', where 'c' is the
> 'OMP_CLAUSE_MAP'/'OMP_CLAUSE__CACHE_' clause (not its decl), and avoid
> the indirection through
> '#define OMP_CLAUSE_MAP_READONLY'/'#define OMP_CLAUSE__CACHE__READONLY',
> given that we're only using them in contexts where it's clear that the
> 'OMP_CLAUSE_SUBCODE_CHECK' is satisfied.  I don't have a strong
> preference, though.

After further re-testing using TREE_NOTHROW, I have reverted to using
TREE_READONLY, because TREE_NOTHROW clashes with OMP_CLAUSE_RELEASE_DESCRIPTOR
(which doesn't use the OMP_CLAUSE_MAP_* naming convention and is not documented
in gcc/tree-core.h either, hmmm...)

I have added the comment adjustments in gcc/tree-core.h for the new uses of
TREE_READONLY/readonly_flag.

We basically use OMP_CLAUSE_SUBCODE_CHECK macros exclusively for OpenMP clause
expressions, so I don't see a reason to diverge from that style (even when the
context is clear).
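
The clash is easy to picture with a toy model of flag accessors. Everything below is hypothetical (`struct toy_clause`, the `TOY_*` macros, `toy_subcode_check`); it only mimics the pattern of a checked accessor macro layered over a shared flag bit and is not GCC's actual tree layout.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_clause_code { TOY_CLAUSE_MAP, TOY_CLAUSE_CACHE };

/* Hypothetical clause node with two generic flag bits.  */
struct toy_clause
{
  enum toy_clause_code code;
  unsigned readonly_flag : 1;
  unsigned nothrow_flag : 1;
};

/* Checked access, in the spirit of OMP_CLAUSE_SUBCODE_CHECK.  */
static inline struct toy_clause *
toy_subcode_check (struct toy_clause *c, enum toy_clause_code code)
{
  assert (c->code == code);
  return c;
}

/* Existing accessor that already owns nothrow_flag on map clauses.  */
#define TOY_MAP_RELEASE_DESCRIPTOR(C) \
  (toy_subcode_check ((C), TOY_CLAUSE_MAP)->nothrow_flag)

/* New accessor placed on the otherwise unused readonly_flag.  */
#define TOY_MAP_READONLY(C) \
  (toy_subcode_check ((C), TOY_CLAUSE_MAP)->readonly_flag)

/* Rejected alternative: same bit as TOY_MAP_RELEASE_DESCRIPTOR.  */
#define TOY_MAP_READONLY_VIA_NOTHROW(C) \
  (toy_subcode_check ((C), TOY_CLAUSE_MAP)->nothrow_flag)

/* Setting 'readonly' through the aliased bit also sets the other flag.  */
bool
toy_clash_demo (void)
{
  struct toy_clause c = { TOY_CLAUSE_MAP, 0, 0 };
  TOY_MAP_READONLY_VIA_NOTHROW (&c) = 1;
  return TOY_MAP_RELEASE_DESCRIPTOR (&c);
}

/* With a dedicated bit the two flags stay independent.  */
bool
toy_no_clash_demo (void)
{
  struct toy_clause c = { TOY_CLAUSE_MAP, 0, 0 };
  TOY_MAP_READONLY (&c) = 1;
  return TOY_MAP_RELEASE_DESCRIPTOR (&c);
}
```

Named accessors also make this kind of aliasing greppable, which is one practical argument for keeping them even when the subcode check looks redundant.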

> Either way, you still need to document this:
> 
> | Also, for the new use for OMP clauses, update 'gcc/tree.h:TREE_READONLY',
> | and in 'gcc/tree-core.h' for 'readonly_flag' the
> | "table lists the uses of each of the above flags".

Okay, done as mentioned above.

> In addition to a few individual comments above and below, you've also not
> yet responded to my requests re test cases.

I have greatly expanded the test scan patterns to include
parallel/kernels/serial/data/enter data, as well as a non-readonly copyin clause
together with a readonly one.

Also added simple 'declare' tests, though there is nothing to scan for in the
'tree-original' dump.

>> +  tree nl = list;
>> +  bool readonly = false;
>> +  matching_parens parens;
>> +  if (parens.require_open (parser))
>> +{
>> +  /* Turn on readonly modifier parsing for copyin clause.  */
>> +  if (c_kind == PRAGMA_OACC_CLAUSE_COPYIN)
>> + {
>> +   c_token *token = c_parser_peek_token (parser);
>> +   if (token->type == CPP_NAME
>> +   && !strcmp (IDENTIFIER_POINTER (token->value), "readonly")
>> +   && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
>> + {
>> +   c_parser_consume_token (parser);
>> +   c_parser_consume_token (parser);
>> +   readonly = true;
>> + }
>> + }
>> +  location_t loc = c_parser_peek_token (parser)->location;
> 
> I suppose 'loc' here now points to after the opening '(' or after the
> 'readonly :'?  This is different from what 'c_parser_omp_var_list_parens'
> does, and indeed, 'c_parser_omp_variable_list' states that "CLAUSE_LOC is
> the location of the clause", not the location of the variable-list?  As
> this, I suppose, may change diagnostics, please restore the original
> behavior.  (This appears to be different in the C++ front end, huh.)

Thanks for catching this! Fixed.
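
The two-token lookahead in the quoted parser hunk (identifier "readonly" followed by a colon) can be illustrated outside the compiler with a small string-based sketch. This is not the front-end code: `toy_parse_readonly_prefix` and `skip_ws` are hypothetical helpers that work on plain text rather than on cpp tokens.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Skip leading white space.  */
static const char *
skip_ws (const char *s)
{
  while (isspace ((unsigned char) *s))
    s++;
  return s;
}

/* S is the text following the '(' of a copyin clause.  Detect an
   optional "readonly :" prefix, set *READONLY accordingly, and return
   a pointer to the start of the variable list.  */
const char *
toy_parse_readonly_prefix (const char *s, bool *readonly)
{
  static const char kw[] = "readonly";
  const size_t kwlen = sizeof kw - 1;

  *readonly = false;
  s = skip_ws (s);

  /* First token: an identifier spelled exactly "readonly".  */
  if (strncmp (s, kw, kwlen) == 0
      && !isalnum ((unsigned char) s[kwlen])
      && s[kwlen] != '_')
    {
      /* Second token: only a ':' makes it the modifier; otherwise
         "readonly" is an ordinary variable name.  */
      const char *after = skip_ws (s + kwlen);
      if (*after == ':')
        {
          *readonly = true;
          return skip_ws (after + 1);
        }
    }

  return s;
}
```

Peeking at the second token before consuming anything is what keeps `copyin(readonly, b)` working for a variable that happens to be named readonly.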

>> --- a/gcc/fortran/openmp.cc
>> +++ b/gcc/fortran/openmp.cc
>> @@ -1197,7 +1197,7 @@ omp_inv_mask::omp_inv_mask (const omp_mask &m) : omp_mask (m)
>>
>>  static bool
>>  gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
>> -   bool allow_common, bool allow_derived)
>> +   bool allow_common, bool allow_derived, bool readonly = false)
>>  {
>>gfc_omp_namelist **head = NULL;
>>    if (gfc_match_omp_variable_list ("", list, allow_common, NULL, &head, true,
>> @@ -1206,7 +1206,10 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
>>  {
>>gfc_omp_namelist *n;
>>for (n = *head; n; n = n->next)
>> - 

[PATCH, OpenACC 2.7, v2] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2024-03-04 Thread Chung-Lin Tang
>> +  else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>>      refcount_ptr = &k->structelem_refcount;
>>    else if (REFCOUNT_STRUCTELEM_P (k->refcount))
>>      refcount_ptr = k->structelem_refcount_ptr;
>> @@ -527,7 +529,9 @@ gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p,
>>
>>    uintptr_t *refcount_ptr = &k->refcount;
>>
>> -  if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>> +  if (k->refcount == REFCOUNT_ACC_MAP_DATA)
>> +    refcount_ptr = &k->dynamic_refcount;
>> +  else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>>      refcount_ptr = &k->structelem_refcount;
>>    else if (REFCOUNT_STRUCTELEM_P (k->refcount))
>>      refcount_ptr = k->structelem_refcount_ptr;
>> @@ -560,6 +564,10 @@ gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p,
>>else if (*refcount_ptr > 0)
>>  *refcount_ptr -= 1;
>>
>> +  /* Force back to 1 if this is an acc_map_data mapping.  */
>> +  if (k->refcount == REFCOUNT_ACC_MAP_DATA && *refcount_ptr == 0)
>> +*refcount_ptr = 1;
>> +
>>   end:
>>if (*refcount_ptr == 0)
>>  {
> 
> It's not clear to me why you need this handling -- instead of just
> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is,
> early 'return'?
> 
> Per my understanding, this code is for OpenACC only exercised for
> structured data regions, and it seems strange (unnecessary?) to adjust
> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data?  Or am I
> missing anything?

No, that is not true. Almost everything goes through
gomp_map_vars_existing/_internal.
This is what happens when you acc_create/acc_copyin on a mapping created by
acc_map_data.

> Overall, your changes regress the
> commit 3e888f94624294d2b9b34ebfee0916768e5d9c3f
> "Add OpenACC 'acc_map_data' variant to 
> 'libgomp.oacc-c-c++-common/deep-copy-8.c'"
> that I just pushed.  I think you just need to handle
> 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' in
> 'libgomp/oacc-mem.c:goacc_enter_data_internal', 'if (n && struct_p)'?
> Please verify.

Fixed by adding another '&& n->refcount != REFCOUNT_ACC_MAP_DATA' check in 
goacc_enter_data_internal.

> But please also to the "Minimal OpenACC variant corresponding to PR96668"
> code in 'libgomp/oacc-mem.c:goacc_enter_data_internal' add a safeguard
> that we're not running into 'REFCOUNT_ACC_MAP_DATA' there.  I think
> that's currently not (reasonably easily) possible, given that
> 'acc_map_data' isn't available in OpenACC/Fortran, but it'll be available
> later, and then I'd rather have an 'assert' trigger there, instead of
> random behavior.  (I'm not asking you to write a mixed OpenACC/Fortran
> plus C test case for that scenario -- if feasible at all.)

I am not really sure what you want me to do here, but REFCOUNT_ACC_MAP_DATA
mappings are all created through a single GOMP_MAP_ALLOC kind. The complex cases
of MAP_STRUCT, MAP_TO_PSET, etc. should all be unrelated here. (I presume that
even if Fortran eventually gets acc_map_data, it would be the compiler side
which takes care of passing the raw data pointer and array size to the
acc_map_data routine.)

I have re-tested this on x86_64-linux + nvptx. Please see if this is okay for
committing to mainline.

Thanks,
Chung-Lin

2024-03-04  Chung-Lin Tang  

libgomp/ChangeLog:

* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
initialize dynamic_refcount as 1.
(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA, remove TODO
comments. Add assert of 'n->dynamic_refcount >= 1' and comments.
(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
(goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case.
* target.c (gomp_increment_refcount): Add REFCOUNT_ACC_MAP_DATA case.
(gomp_decrement_refcount): Add REFCOUNT_ACC_MAP_DATA case, force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
* testsuite/libgomp.oacc-c-c++-common/lib-96.c: New testcase.
* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust
testcase error output scan test.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index f98cccd8b66..089393846d1 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1163,6 +1163,8 @@ struct target_mem_desc;
 /* Special value for refcount - tgt_offset contains target address of t

[PATCH, OpenACC 2.7] struct/array reductions for Fortran

2024-02-08 Thread Chung-Lin Tang
Hi Tobias, Thomas,
this patch adds support for Fortran to use arrays and struct (record) types in
OpenACC reductions.

There are still some shortcomings in the current state, mainly that only
explicit-shape arrays can be used (like its C counterpart). Anything else is
currently a bit more complicated in the middle-end, since the existing
reduction code creates an "init-op" (a literal of initial values), which can't
be done when, say, TYPE_MAX_VALUE (TYPE_DOMAIN (array_type)) is not a tree
constant. I think we'll be on the hook to solve this later, but I think the
current state is okay to submit.

Tested without regressions on mainline (on top of the first struct/array
reduction patch [1]).
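
To illustrate what the "init-op" amounts to, here is a small sketch of initializing a private reduction copy with the operator's identity value in every element. The names (`enum toy_red_op`, `toy_identity`, `toy_init_private_copy`) are hypothetical, and the loop is only a run-time stand-in: the compiler builds a constant literal instead, which is why it needs the element count at compile time. The choice of -DBL_MAX/DBL_MAX for max/min is an assumption of this sketch, not necessarily the value the middle-end picks.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

enum toy_red_op { TOY_RED_PLUS, TOY_RED_TIMES, TOY_RED_MAX, TOY_RED_MIN };

/* Identity element of OP: combining it with any x yields x again.  */
double
toy_identity (enum toy_red_op op)
{
  switch (op)
    {
    case TOY_RED_PLUS:
      return 0.0;
    case TOY_RED_TIMES:
      return 1.0;
    case TOY_RED_MAX:
      return -DBL_MAX;   /* Lowest finite double (sketch assumption).  */
    case TOY_RED_MIN:
      return DBL_MAX;    /* Highest finite double (sketch assumption).  */
    }
  assert (!"unknown reduction operator");
  return 0.0;
}

/* Fill the NELTS elements of a private copy with OP's identity.  With
   an explicit-shape array NELTS is a compile-time constant; for other
   array kinds it is only known at run time.  */
void
toy_init_private_copy (double *priv, size_t nelts, enum toy_red_op op)
{
  const double init = toy_identity (op);

  for (size_t i = 0; i < nelts; i++)
    priv[i] = init;
}
```

Supporting non-constant bounds would presumably mean emitting a loop like this one instead of a literal.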

Thanks,
Chung-Lin

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

2024-02-08  Chung-Lin Tang  

gcc/fortran/ChangeLog:
* openmp.cc (oacc_reduction_defined_type_p): New function.
(resolve_omp_clauses): Adjust OpenACC array reduction error case. Use
oacc_reduction_defined_type_p for OpenACC.
* trans-openmp.cc (gfc_trans_omp_array_reduction_or_udr):
Add 'bool openacc' parameter, adjust part of function to be !openacc
only.
(gfc_trans_omp_reduction_list): Add 'bool openacc' parameter, pass to
calls to gfc_trans_omp_array_reduction_or_udr.
(gfc_trans_omp_clauses): Add 'openacc' argument to calls to
gfc_trans_omp_reduction_list.
(gfc_trans_omp_do): Pass 'op == EXEC_OACC_LOOP' as 'bool openacc'
parameter in call to gfc_trans_omp_clauses.

gcc/ChangeLog:
* omp-low.cc (omp_reduction_init_op): Add checking if reduced array
has constant bounds.
(lower_oacc_reductions): Add handling of error_mark_node.

gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/array-reduction.f90: Adjust testcase.
* gfortran.dg/goacc/reduction.f95: Likewise.

libgomp/ChangeLog:
* libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90: New testcase.
* libgomp/testsuite/libgomp.oacc-fortran/reduction-10.f90: Likewise.
* libgomp/testsuite/libgomp.oacc-fortran/reduction-11.f90: Likewise.
* libgomp/testsuite/libgomp.oacc-fortran/reduction-12.f90: Likewise.
* libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90: Likewise.
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0af80d54fad..4bba9e666d6 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7047,6 +7047,72 @@ oacc_is_loop (gfc_code *code)
 || code->op == EXEC_OACC_LOOP;
 }
 
+static bool
+oacc_reduction_defined_type_p (enum gfc_omp_reduction_op rop, gfc_typespec *ts)
+{
+  if (rop == OMP_REDUCTION_USER || rop == OMP_REDUCTION_NONE)
+return false;
+
+  if (ts->type == BT_INTEGER)
+switch (rop)
+  {
+  case OMP_REDUCTION_AND:
+  case OMP_REDUCTION_OR:
+  case OMP_REDUCTION_EQV:
+  case OMP_REDUCTION_NEQV:
+   return false;
+  default:
+   return true;
+  }
+
+  if (ts->type == BT_LOGICAL)
+switch (rop)
+  {
+  case OMP_REDUCTION_AND:
+  case OMP_REDUCTION_OR:
+  case OMP_REDUCTION_EQV:
+  case OMP_REDUCTION_NEQV:
+   return true;
+  default:
+   return false;
+  }
+
+  if (ts->type == BT_REAL || ts->type == BT_COMPLEX)
+switch (rop)
+  {
+  case OMP_REDUCTION_PLUS:
+  case OMP_REDUCTION_TIMES:
+  case OMP_REDUCTION_MINUS:
+   return true;
+
+  case OMP_REDUCTION_AND:
+  case OMP_REDUCTION_OR:
+  case OMP_REDUCTION_EQV:
+  case OMP_REDUCTION_NEQV:
+   return false;
+
+  case OMP_REDUCTION_MAX:
+  case OMP_REDUCTION_MIN:
+   return ts->type != BT_COMPLEX;
+  case OMP_REDUCTION_IAND:
+  case OMP_REDUCTION_IOR:
+  case OMP_REDUCTION_IEOR:
+   return false;
+  default:
+   gcc_unreachable ();
+  }
+
+  if (ts->type == BT_DERIVED)
+{
+  for (gfc_component *p = ts->u.derived->components; p; p = p->next)
+   if (!oacc_reduction_defined_type_p (rop, &p->ts))
+ return false;
+  return true;
+}
+
+  return false;
+}
+
 static void
 resolve_scalar_int_expr (gfc_expr *expr, const char *clause)
 {
@@ -8137,13 +8203,15 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
  else
n->sym->mark = 1;
 
- /* OpenACC does not support reductions on arrays.  */
- if (n->sym->as)
+ /* OpenACC current only supports array reductions on explicit-shape
+arrays.  */
+ if ((n->sym->as && n->sym->as->type != AS_EXPLICIT)
+ || n->sym->attr.codimension)
gfc_error ("Array %qs is not permitted in reduction at %L",
   n->sym->name, &n->where);
}
 }
-  
+
   for (n = omp_clauses->lists[OMP_LIST_TO]; n; n = n->next)
 n->sym->mark = 0;
   for (n = omp_clauses->lists[OMP_
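
For readers who do not have gfortran's internals at hand, the type/operator rule in oacc_reduction_defined_type_p above can be modelled in standalone C. This is only a sketch: the enums and struct type_spec are hypothetical stand-ins for BT_*, OMP_REDUCTION_* and gfc_typespec, and the MINUS, EQV, NEQV and IEOR operators are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfortran's BT_* and OMP_REDUCTION_* enums.  */
enum base_type { T_INTEGER, T_LOGICAL, T_REAL, T_COMPLEX, T_DERIVED };
enum red_op { OP_PLUS, OP_TIMES, OP_MAX, OP_MIN,
              OP_AND, OP_OR, OP_IAND, OP_IOR };

struct type_spec
{
  enum base_type type;
  const struct type_spec *components;  /* Only used for T_DERIVED.  */
  size_t n_components;
};

static bool
logical_op_p (enum red_op op)
{
  return op == OP_AND || op == OP_OR;
}

static bool
bitwise_op_p (enum red_op op)
{
  return op == OP_IAND || op == OP_IOR;
}

/* Return true if OP is a defined reduction operator for TS.  A derived
   type qualifies only if every component does, checked recursively.  */
bool
reduction_defined_p (enum red_op op, const struct type_spec *ts)
{
  assert (ts != NULL);
  switch (ts->type)
    {
    case T_INTEGER:
      return !logical_op_p (op);
    case T_LOGICAL:
      return logical_op_p (op);
    case T_REAL:
      return !logical_op_p (op) && !bitwise_op_p (op);
    case T_COMPLEX:
      /* No ordering on complex values, so no max/min.  */
      return op == OP_PLUS || op == OP_TIMES;
    case T_DERIVED:
      for (size_t i = 0; i < ts->n_components; i++)
        if (!reduction_defined_p (op, &ts->components[i]))
          return false;
      return true;
    }
  return false;
}
```

With this model, a derived type holding a real and an integer component accepts '+' but rejects a bitwise operator, because the real component does not support it.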

[committed] MAINTAINERS: Update my email address

2024-01-25 Thread Chung-Lin Tang
Updated my email address.

Thanks,
Chung-Lin

From ffeab69e1ffc0405da3a9222c7b9f7a000252702 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 25 Jan 2024 18:20:43 +
Subject: [PATCH] MAINTAINERS: Update my work email address

* MAINTAINERS: Update my work email address.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7d3b78d276e..8b11ddbc069 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -99,7 +99,7 @@ moxie portAnthony Green   

 msp430 portNick Clifton
 nds32 port Chung-Ju Wu 
 nds32 port Shiva Chen  
-nios2 port Chung-Lin Tang  
+nios2 port Chung-Lin Tang  
 nios2 port Sandra Loosemore
 nvptx port Tom de Vries
 nvptx port Thomas Schwinge 
-- 
2.34.1



[PATCH, OpenACC 2.7] Implement reductions for arrays and structs

2024-01-02 Thread Chung-Lin Tang
Hi Thomas, Andrew,
this patch implements reductions for arrays and structs for OpenACC. Following 
the pattern for OpenACC reductions, most of the work is in the respective NVPTX/GCN 
backends' *_goacc_reduction_setup/init/fini/teardown hooks, particularly in the 
fini part, and in the [nvptx/gcn]_reduction_update routines. The code is largely 
the same for the two targets; the main difference is that GCN lacks the vector 
mode handling.
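
As a rough illustration of what the feature looks like from the user side (a sketch only, assuming the accepted syntax follows the OpenACC 2.7 text; the function and type names are made up for this example):

```c
#include <assert.h>

/* Hypothetical aggregate; each member is reduced separately.  */
struct stats
{
  int count;
  double sum;
};

/* Array reduction: every element of h[] is reduced with '+'.  */
void
histogram4 (const int *data, int n, int hist[4])
{
  assert (n >= 0);
  int h[4] = { 0, 0, 0, 0 };
#pragma acc parallel loop copyin(data[0:n]) reduction(+:h)
  for (int i = 0; i < n; i++)
    h[data[i] & 3] += 1;
  for (int k = 0; k < 4; k++)
    hist[k] = h[k];
}

/* Struct reduction: 's.count' and 's.sum' are both reduced with '+'.  */
struct stats
total (const double *x, int n)
{
  struct stats s = { 0, 0.0 };
#pragma acc parallel loop copyin(x[0:n]) reduction(+:s)
  for (int i = 0; i < n; i++)
    {
      s.count += 1;
      s.sum += x[i];
    }
  return s;
}
```

Without -fopenacc the pragmas are ignored and the loops run sequentially on the host, which gives the same results and makes the sketch easy to try.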

To Julian, there is a patch to the middle-end neutering, a hack actually, that 
detects SSA_NAMEs used in reduction array MEM_REFs and avoids single->parallel 
copying of them (by moving those definitions before 
BUILT_IN_GOACC_SINGLE_COPY_START). This appears to work because reductions do 
their own initialization of the private copy.

As we discussed in our internal calls, the proper way is to create the 
private array at a more appropriate stage, but that is too long a shot for now. 
The changes here are needed at least for some -O0 cases (when optimizing, 
propagation of the private copies' local address eliminates the SSA_NAME and 
things just work in that case). So please bear with this hack.

I believe the newly added libgomp testcases should be fairly complete. Note, 
though, that one case, the reduction of * for double arrays, has been commented 
out for now, because there appears to be a (presumably) unrelated issue causing 
it to fail (it may have to do with the loop-based atomic form used by both 
NVPTX/GCN). It should probably be XFAILed rather than commented out; I will do 
this in the next iteration.

Thanks,
Chung-Lin

2024-01-02  Chung-Lin Tang  

gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
(c_oacc_reduction_code_name): Likewise.
(c_finish_omp_clauses): Handle OpenACC cases using new functions.

gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
(cp_oacc_reduction_code_name): Likewise.
(finish_omp_reduction_clause): Handle OpenACC cases using new functions.

gcc/ChangeLog:
* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
handling ARRAY_TYPE and RECORD_TYPE reductions.
(gcn_goacc_reduction_setup): Likewise.
(gcn_goacc_reduction_init): Likewise.
(gcn_goacc_reduction_fini): Likewise.
(gcn_goacc_reduction_teardown): Likewise.

* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate
V2SI shuffle using vec_extract op.
(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
RECORD_TYPE reductions.
(nvptx_goacc_reduction_setup): Likewise.
(nvptx_goacc_reduction_init): Likewise.
(nvptx_goacc_reduction_fini): Likewise.
(nvptx_goacc_reduction_teardown): Likewise.

* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
building to use decl type, rather than generic ptr_type_node.
(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
construction.
(lower_oacc_reductions): Add code to teardown/recover array access
MEM_REF in OMP_CLAUSE_DECL, to accommodate lookup requirements.
Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
instead of machine mode based.

* omp-oacc-neuter-broadcast.cc (worker_single_copy):
Add 'hash_set *array_reduction_base_vars' parameter.
Add xxx.

(neuter_worker_single): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust recursive calls to self and worker_single_copy.
(oacc_do_neutering): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust call to neuter_worker_single.
(execute_omp_oacc_neuter_broadcast): Add local
'hash_set array_reduction_base_vars' declaration. Collect MEM_REF
base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add
'&array_reduction_base_vars' argument to call of oacc_do_neutering.

* omp-offload.cc (default_goacc_reduction): Add unshare_expr.

gcc/testsuite/ChangeLog:
* c-c++-common/goacc/reduction-9.c: New test.
* c-c++-common/goacc/reduction-10.c: New test.
* c-c++-common/goacc/reduction-11.c: New test.
* c-c++-common/goacc/reduction-12.c: New test.
* c-c++-common/goacc/reduction-13.c: New test.

libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/reduction.h
(check_reduction_array_xx): New macro.
(operator_apply): Likewise.
(check_reduction_array_op): Likewise.
(check_reduction_arraysec_op): Likewise.
(function_ap

[PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends

2023-08-07 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas, Tobias,
here's the updated v2 of the readonly modifier front-end patch.

On 2023/7/20 11:08 PM, Tobias Burnus wrote:
>>> +++ b/gcc/c/c-parser.cc
>>> @@ -14059,7 +14059,8 @@ c_parser_omp_variable_list (c_parser *parser,
>>>
>>>   static tree
>>>   c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
>>> -   tree list, bool allow_deref = false)
>>> +   tree list, bool allow_deref = false,
>>> +   bool *readonly = NULL)
>>> ...
>> Instead of doing this in 'c_parser_omp_var_list_parens', I think it's
>> clearer to have this special 'readonly :' parsing logic in the two places
>> where it's used.
> I concur. The same issue also occurred for OpenMP's
> c_parser_omp_clause_to, and c_parser_omp_clause_from and the 'present'
> modifier. For it, I created a combined function but the main reason for
> that is that OpenMP also permits more modifiers (like 'iterators'),
> which would cause more duplication of code ('iterator' is not yet
> supported).
> 
> For something as simple to parse as this modifier, I would just do it at
> the two places – as Thomas suggested.

Okay, I've changed the C/C++ parser parts to have the parsing logic directly
added.

>>> +++ b/gcc/fortran/gfortran.h
>>> @@ -1360,7 +1360,11 @@ typedef struct gfc_omp_namelist
>>>   {
>>> gfc_omp_reduction_op reduction_op;
>>> gfc_omp_depend_doacross_op depend_doacross_op;
>>> -  gfc_omp_map_op map_op;
>>> +  struct
>>> +{
>>> +   ENUM_BITFIELD (gfc_omp_map_op) map_op:8;
>>> +   bool readonly;
>>> +};
>>> gfc_expr *align;
>>> struct
>>>{
>> [...] Thus, the above looks good to me.
> I concur but I wonder whether it would be cleaner to name the struct;
> this makes it also more obvious what belongs together in the union.
> 
> Namely, naming the struct 'map' and then changing the 45 users from
> 'u.map_op' to 'u.map.op' and the new 'u.readonly' to 'u.map.readonly'. –
> this seems to be cleaner.

I've adjusted 'u.map' to be a named struct now, and updated the references.

>> + if (gfc_match ("readonly :") == MATCH_YES)
>> I note this one does not have a space after ':' in 'gfc_match', but the
>> one above in 'gfc_match_omp_clauses' does.  I don't know off-hand if that
>> makes a difference in parsing -- probably not, as all of
>> 'gcc/fortran/openmp.cc' generally doesn't seem to be very consistent
>> about these two variants?
> It *does* make a difference. And for obvious reasons. You don't want to permit:
> 
>!$acc kernels asynccopy(a)
> 
> but require at least one space (or comma) between "async" and "copy".
> (In fixed form Fortran, it would be fine - as would be "!$acc k e rnelsasy nc co p y(a)".)
> 
> A " " matches zero or more whitespaces, but with gfc_match_space you can find out
> whether there was whitespace or not.

Okay, made sure both are 'gfc_match ("readonly : ")'. Thanks for catching that;
I didn't realize that space was significant.
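
To make the whitespace rule concrete, here is a toy matcher in C. It only models the one rule quoted above (a ' ' in the pattern matches zero or more whitespace characters) and is not gfc_match itself; the function name is hypothetical.

```c
#include <ctype.h>

/* Toy model: match PATTERN against the start of INPUT.  A ' ' in PATTERN
   matches zero or more whitespace characters; any other character must
   match literally.  Return the number of input characters consumed, or
   -1 on mismatch.  */
int
toy_match (const char *pattern, const char *input)
{
  const char *start = input;
  for (; *pattern != '\0'; pattern++)
    {
      if (*pattern == ' ')
        {
          while (isspace ((unsigned char) *input))
            input++;
        }
      else if (*pattern == *input)
        input++;
      else
        return -1;
    }
  return (int) (input - start);
}
```

In this model "readonly :" stops right after the colon, while "readonly : " also swallows any blanks that follow it, so the two patterns leave the parser at different positions for input such as "readonly : a".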

>>> +++ b/gcc/tree.h
>>> @@ -1813,6 +1813,14 @@ class auto_suppress_location_wrappers
>>>   #define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
>>> (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.addressable_flag)
>>>
>>> +/* Nonzero if OpenACC 'readonly' modifier set, used for 'copyin'.  */
>>> +#define OMP_CLAUSE_MAP_READONLY(NODE) \
>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
>>> +
>>> +/* Same as above, for use in OpenACC cache directives.  */
>>> +#define OMP_CLAUSE__CACHE__READONLY(NODE) \
>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
>> I'm not sure if these special accessor functions are actually useful, or
>> we should just directly use 'TREE_READONLY' instead?  We're only using
>> them in contexts where it's clear that the 'OMP_CLAUSE_SUBCODE_CHECK' is
>> satisfied, for example.
> I find directly using TREE_READONLY confusing.

FWIW, I've changed to use TREE_NOTHROW instead, if it can give a better sense 
of safety :P

I think there's a misunderstanding here anyways: we are not relying on a DECL 
marked TREE_READONLY here. We merely need the OMP_CLAUSE_MAP to be marked as 
OMP_CLAUSE_MAP_READONLY == 1.

The other points-to patch then (also in front-ends) takes the 
OMP_CLAUSE_MAP_READONLY to mark the clauses of "base-pointers of 
array-sections" as OMP_CLAUSE_MAP_POINTS_TO_READONLY,
and later this gra

[PATCH, OpenACC 2.7, v2] Implement default clause support for data constructs

2023-08-01 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,
this is v2 of the patch for implementing the OpenACC 2.7 addition of
default(none|present) support for data constructs.

Instead of propagating an additional 'oacc_default_kind' for OpenACC,
this patch does it in a more complete way: it directly propagates the
gimplify_omp_ctx* pointer of the innermost context where we found
a default-clause. This makes it possible to display the location and type
of the OpenACC construct carrying the default-clause in the error messages.
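
The propagation scheme can be sketched in plain C as follows. This is a simplified model rather than the gimplify.cc code: struct ctx, ctx_init and effective_default are hypothetical names, and only the inheritance of the "innermost context with a default clause" pointer is shown.

```c
#include <stddef.h>

enum default_kind { DEFAULT_UNSPECIFIED, DEFAULT_NONE, DEFAULT_PRESENT };

/* Hypothetical, stripped-down analogue of struct gimplify_omp_ctx.  */
struct ctx
{
  struct ctx *outer;
  const char *construct;           /* E.g. "data" or "parallel".  */
  int line;                        /* Stand-in for a location_t.  */
  enum default_kind default_kind;
  struct ctx *default_clause_ctx;  /* Innermost ctx with a default clause.  */
};

/* Create a context nested in OUTER.  It inherits OUTER's pointer and
   overrides it with itself when it carries its own default clause.  */
void
ctx_init (struct ctx *c, struct ctx *outer, const char *construct,
          int line, enum default_kind kind)
{
  c->outer = outer;
  c->construct = construct;
  c->line = line;
  c->default_kind = kind;
  c->default_clause_ctx = outer ? outer->default_clause_ctx : NULL;
  if (kind != DEFAULT_UNSPECIFIED)
    c->default_clause_ctx = c;
}

/* The default kind in effect at C; the same pointer also gives a
   diagnostic the construct name and line to refer to.  */
enum default_kind
effective_default (const struct ctx *c)
{
  return c->default_clause_ctx
         ? c->default_clause_ctx->default_kind
         : DEFAULT_UNSPECIFIED;
}
```

Keeping a pointer rather than just the kind is what allows an error message to name the construct and location of the clause that applied.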

The testcases now also cover multiple nested data constructs, where
messages can refer precisely to the innermost default clause that was
active at that program point.

Note, I got rid of the dummy OMP_CLAUSE_DEFAULT creation in this version,
since it did not seem to be needed.

Re-tested on master on powerpc64le-linux/nvptx. Okay to commit?

Thanks,
Chung-Lin

2023-08-01  Chung-Lin Tang  

gcc/c/ChangeLog:
* c-parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/cp/ChangeLog:
* parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/fortran/ChangeLog:
* openmp.cc (OACC_DATA_CLAUSES): Add OMP_CLAUSE_DEFAULT.

gcc/ChangeLog:
* gimplify.cc (struct gimplify_omp_ctx): Add oacc_default_clause_ctx
field.
(new_omp_context): Initialize oacc_default_clause_ctx field.
(oacc_region_type_name): New function.
(oacc_default_clause): Lookup current default_kind value from
ctx->oacc_default_clause_ctx, adjust default(none) error and inform
message dumping.
(gimplify_scan_omp_clauses): Upon OMP_CLAUSE_DEFAULT case, set
ctx->oacc_default_clause_ctx to current context.

gcc/testsuite/ChangeLog:
* c-c++-common/goacc/default-3.c: Adjust testcase.
* c-c++-common/goacc/default-5.c: Adjust testcase.
* gfortran.dg/goacc/default-3.f95: Adjust testcase.
* gfortran.dg/goacc/default-5.f: Adjust testcase.
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e459..974f0132787 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18196,6 +18196,7 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)  \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)   \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NO_CREATE)   \
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d7ef5b34d42..bc59fbeac20 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -45860,6 +45860,7 @@ cp_parser_oacc_cache (cp_parser *parser, cp_token *pragma_tok)
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)  \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DETACH)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)   \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)  \
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 2952cd300ac..c37f843ec3b 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -3802,7 +3802,8 @@ error:
 #define OACC_DATA_CLAUSES \
   (omp_mask (OMP_CLAUSE_IF) | OMP_CLAUSE_DEVICEPTR  | OMP_CLAUSE_COPY\
 | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT | OMP_CLAUSE_CREATE \
-   | OMP_CLAUSE_NO_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_ATTACH)
+   | OMP_CLAUSE_NO_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_ATTACH   \
+   | OMP_CLAUSE_DEFAULT)
 #define OACC_LOOP_CLAUSES \
   (omp_mask (OMP_CLAUSE_COLLAPSE) | OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER \
| OMP_CLAUSE_VECTOR | OMP_CLAUSE_SEQ | OMP_CLAUSE_INDEPENDENT \
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 320920ed74c..ec0ccc67da8 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -225,6 +225,7 @@ struct gimplify_omp_ctx
   vec loop_iter_var;
   location_t location;
   enum omp_clause_default_kind default_kind;
+  struct gimplify_omp_ctx *oacc_default_clause_ctx;
   enum omp_region_type region_type;
   enum tree_code code;
   bool combined_loop;
@@ -459,6 +460,10 @@ new_omp_context (enum omp_region_type region_type)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
 c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
+  if (gimplify_omp_ctxp)
+c->oacc_default_clause_ctx = gimplify_omp_ctxp-

[PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2023-07-25 Thread Chung-Lin Tang via Gcc-patches
On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote:
> As we discussed earlier, the work for actually linking this to middle-end
> points-to analysis is a somewhat non-trivial issue. This first patch allows
> the language feature to be used in OpenACC directives first (with no effect for now).
> The middle-end changes are probably going to be a later patch.

This second patch tries to link the readonly modifier to points-to analysis.

SSA_NAME_POINTS_TO_READONLY_MEMORY and its support in the alias oracle
routines in tree-ssa-alias.cc already exist, so basically what this patch
does is make the variables holding the array section base pointers have this
flag set.

There is also another flag, OMP_CLAUSE_MAP_POINTS_TO_READONLY, set by
front-ends on the associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set.
Also a DECL_POINTS_TO_READONLY flag is set for VAR_DECLs when creating the tmp
vars carrying these receiver references on the offloaded side. These
eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY.

This still doesn't always work as expected in terms of optimization:
struct pointer fields and Fortran arrays (kind of like C structs) need
several accesses to create the pointer access on the receiver/offloaded side,
and SRA appears not to work on these sequences, which gets in the way of much
redundancy elimination.

Currently there is one testcase demonstrating that 'readonly' can avoid
a clobber by a function call.
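
The shape of code where this could matter looks roughly like the sketch below. It is not the testcase from the patch; 'bump' and 'sum_twice' are hypothetical names, and in a real demonstration 'bump' would be an external routine whose body the compiler cannot see.

```c
#include <assert.h>

/* Stand-in for a routine the compiler has to treat as possibly writing
   to memory.  Here it only touches the counter passed to it.  */
#pragma acc routine seq
void
bump (int *counter)
{
  *counter += 1;
}

int
sum_twice (const int *a, int n)
{
  assert (n >= 0);
  int total = 0;
#pragma acc parallel loop copyin(readonly: a[0:n]) reduction(+:total)
  for (int i = 0; i < n; i++)
    {
      int calls = 0;
      int first = a[i];       /* First load of a[i].  */
      bump (&calls);          /* Call that might otherwise clobber a[].  */
      total += first + a[i];  /* Second load; could reuse 'first' if the
                                 memory is known to be read-only.  */
    }
  return total;
}
```

Without the 'readonly' information the second load has to be kept across the call; with it, the alias oracle may treat the call as unable to modify a[].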

Note this patch is created atop the front-end patch.
(I will respond to the other front-end patch comments later.)

Thanks,
Chung-Lin

2023-07-25  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-typeck.cc (handle_omp_array_sections):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/cp/ChangeLog:

* semantics.cc (handle_omp_array_sections):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_trans_omp_array_section):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/ChangeLog:

* gimple-expr.cc (copy_var_decl): Copy DECL_POINTS_TO_READONLY
for VAR_DECLs.
* gimplify.cc (struct gimplify_omp_ctx):
Add 'hash_set *pt_readonly_ptrs' field.
(internal_get_tmp_var): Set
DECL_POINTS_TO_READONLY/SSA_NAME_POINTS_TO_READONLY_MEMORY for
new temp vars.
(build_omp_struct_comp_nodes):
Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
(gimplify_scan_omp_clauses): Collect OMP_CLAUSE_MAP_POINTS_TO_READONLY
to ctx->pt_readonly_ptrs.
* omp-low.cc (lower_omp_target): Set DECL_POINTS_TO_READONLY for
variables of receiver refs.
* tree-pretty-print.cc (dump_omp_clause):
Print OMP_CLAUSE_MAP_POINTS_TO_READONLY.
(dump_generic_node): Print SSA_NAME_POINTS_TO_READONLY_MEMORY.
* tree.h (DECL_POINTS_TO_READONLY): New macro.
(OMP_CLAUSE_MAP_POINTS_TO_READONLY): New macro.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/readonly-1.c: Adjust testcase.
* c-c++-common/goacc/readonly-2.c: New testcase.
* gfortran.dg/goacc/readonly-1.f90: Adjust testcase.
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..42591e4029a 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14258,6 +14258,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
   else
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
+  if (OMP_CLAUSE_MAP_READONLY (c))
+   OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
   OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
   if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
  && !c_mark_addressable (t))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..6ab467e1140 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5872,6 +5872,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
}
  else
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
+ if (OMP_CLAUSE_MAP_READONLY (c))
+   OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
  OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
  if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
  && !cxx_mark_addressable (t))
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 2253d559f9c..d7cd65af1bb 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2524,6 +2524,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
   node3 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
   OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind);
   OMP_CLAUSE_DECL (node3) = gfc_conv_descriptor_data_ge

Re: [PATCH, OpenACC 2.7] Implement default clause support for data constructs

2023-07-14 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,

On 2023/6/23 6:47 PM, Thomas Schwinge wrote:
>> +
>>ctx->clauses = *orig_list_p;
>>gimplify_omp_ctxp = ctx;
>>  }
> Instead of this, in 'gimplify_omp_workshare', before the
> 'gimplify_scan_omp_clauses' call, do something like:
> 
> if ((ort & ORT_ACC)
>     && !omp_find_clause (OMP_CLAUSES (expr), OMP_CLAUSE_DEFAULT))
>   {
>     /* Determine effective 'default' clause for OpenACC compute construct.  */
>     for (struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp; ctx; ctx = ctx->outer_context)
>       {
>         if (ctx->region_type == ORT_ACC_DATA
>             && ctx->default_kind != OMP_CLAUSE_DEFAULT_SHARED)
>           {
>             [Append actual default clause on compute construct.]
>             break;
>           }
>       }
>   }
> 
> That seems conceptually simpler to me?

I'm not sure if this is conceptually simpler, but using 'oacc_default_kind'
is definitely faster computationally :)

However, as you mention below...

> For the 'build_omp_clause', does using 'ctx->location' instead of
> 'UNKNOWN_LOCATION' help diagnostics in any way?  Like if we add in
> 'gcc/gimplify.cc:oacc_default_clause',
> 'if (ctx->default_kind == OMP_CLAUSE_DEFAULT_NONE)' another 'inform' to
> point to the 'data' construct's 'default' clause?  (But not sure if
> that's easily done; otherwise don't.)

I noticed that we will need to track the actual lexically enclosing OpenACC
construct with the user-set default-clause somewhere in 'ctx', in order to
satisfy the current diagnostics in oacc_default_clause().

(The UNKNOWN_LOCATION for the internally created default-clause probably doesn't
matter; that one is just a reminder in internal dumps and probably never plays
a role in user diagnostics.)

> Similar to the ones you've already got, please also add a few test cases
> for nested 'default' clauses, like:
> 
> #pragma acc data // no vs. 'default(none)' vs. 'default(present)'
> {
>   #pragma acc data // no vs. same vs. different 'default' clause
>   {
> #pragma acc data // no vs. same vs. different 'default' clause
> {
>   #pragma acc parallel
> 
> Similarly, test cases where 'default' on the compute construct overrides
> 'default' of an outer 'data' construct.

Okay, will add more testcases.

Thanks,
Chung-Lin


[PATCH, OpenACC 2.7, v2] Implement host_data must have use_device clause requirement

2023-07-13 Thread Chung-Lin Tang via Gcc-patches
On 2023/6/16 5:13 PM, Thomas Schwinge wrote:
> OK with one small change, please -- unless there's a reason for doing it
> this way:
> 
>> --- a/gcc/fortran/trans-openmp.cc
>> +++ b/gcc/fortran/trans-openmp.cc
>> @@ -4677,6 +4677,12 @@ gfc_trans_oacc_construct (gfc_code *code)
>>   break;
>>case EXEC_OACC_HOST_DATA:
>>   construct_code = OACC_HOST_DATA;
>> + if (code->ext.omp_clauses->lists[OMP_LIST_USE_DEVICE] == NULL)
>> +   {
>> + error_at (gfc_get_location (&code->loc),
>> +   "%<host_data%> construct requires %<use_device%> clause");
>> + return NULL_TREE;
>> +   }
>>   break;
>>default:
>>   gcc_unreachable ();
> The OpenMP "must contain at least one [...] clause" checks are done in
> 'gcc/fortran/openmp.cc:resolve_omp_clauses'.  For consistency (or, to let
> 'gcc/fortran/trans-openmp.cc' continue to just deal with "directive
> translation"), do similar for OpenACC 'host_data'?  (..., and we later
> accordingly adjust 'gcc/fortran/openmp.cc:gfc_match_oacc_update', too?)

Hi Thomas,
I've adjusted the Fortran implementation as you described. Yes, I agree this
better fits current Fortran FE conventions.

I've re-tested the attached v2 patch, and will commit it later this week if
there are no major objections.
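
For context, a host_data construct that satisfies the new check looks like the sketch below. 'scale_in_place' is a hypothetical stand-in for a library routine that expects a device address (for example a vendor BLAS call); as written it is ordinary host code, so the example is only meaningful when the pragmas are ignored or for host fallback.

```c
#include <stddef.h>

/* Hypothetical stand-in for a routine that takes a device pointer.  */
void
scale_in_place (double *p, size_t n, double f)
{
  for (size_t i = 0; i < n; i++)
    p[i] *= f;
}

void
scale (double *x, size_t n, double f)
{
#pragma acc data copy(x[0:n])
  {
    /* The use_device clause is what the check requires; a bare
       '#pragma acc host_data' is now diagnosed as an error.  */
#pragma acc host_data use_device(x)
    {
      scale_in_place (x, n, f);
    }
  }
}
```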

Thanks,
Chung-Lin

gcc/c/ChangeLog:

* c-parser.cc (c_parser_oacc_host_data): Add checking requiring OpenACC
host_data construct to have a use_device clause.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_host_data): Add checking requiring OpenACC
host_data construct to have a use_device clause.

gcc/fortran/ChangeLog:

* openmp.cc (resolve_omp_clauses): Add checking requiring
OpenACC host_data construct to have a use_device clause.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/host_data-2.c: Adjust testcase.
* gfortran.dg/goacc/host_data-error.f90: New testcase.
* gfortran.dg/goacc/pr71704.f90: Adjust testcase.
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e459..80920b31f83 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18461,8 +18461,13 @@ c_parser_oacc_host_data (location_t loc, c_parser *parser, bool *if_p)
   tree stmt, clauses, block;
 
   clauses = c_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
-  "#pragma acc host_data");
-
+  "#pragma acc host_data", false);
+  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
+{
+  error_at (loc, "%<host_data%> construct requires %<use_device%> clause");
+  return error_mark_node;
+}
+  clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
   block = c_begin_omp_parallel ();
   add_stmt (c_parser_omp_structured_block (parser, if_p));
   stmt = c_finish_oacc_host_data (loc, clauses, block);
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5e2b5cba57e..beb5b632e5e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -45895,8 +45895,15 @@ cp_parser_oacc_host_data (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
   unsigned int save;
 
   clauses = cp_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
-   "#pragma acc host_data", pragma_tok);
-
+   "#pragma acc host_data", pragma_tok,
+   false);
+  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
+{
+  error_at (pragma_tok->location,
+   "%<host_data%> construct requires %<use_device%> clause");
+  return error_mark_node;
+}
+  clauses = finish_omp_clauses (clauses, C_ORT_ACC);
   block = begin_omp_parallel ();
   save = cp_parser_begin_omp_structured_block (parser);
   cp_parser_statement (parser, NULL_TREE, false, if_p);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 8efc4b3ecfa..f7af02845de 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8764,6 +8764,12 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
   "% clause", _clauses->detach->where);
 }
 
+  if (openacc
+  && code->op == EXEC_OACC_HOST_DATA
+  && omp_clauses->lists[OMP_LIST_USE_DEVICE] == NULL)
+gfc_error ("%<host_data%> construct at %L requires %<use_device%> clause",
+  &code->loc);
+
   if (omp_clauses->assume)
 gfc_resolve_omp_assumptions (omp_clauses->assume);
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/host_data-2.c b/gcc/testsuite/c-c++-common/goacc/host_data-2.c
index b3093e575ff..862a764eb3a 100644
--- a/gcc/testsuite/c-c++-common/goacc/host_data-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/host_data-2.c
@@ -8,7 +8,9 @@ void
 f (void)
 {
   int v2 = 3;
-#pragma acc host_data copy(v2) /* { dg-error ".copy. is not valid for ..pragma acc host_data." } */
+#pragma acc host_data copy(v2)
+  /* { dg-error ".copy. is not valid for ..pragma acc host_data." "" { target *-*-* } .-1 } */
+  /* { dg-error ".host_data. construct requires .use_device. clause" "" { target *-*-* } .-2 } */
   ;
 
 

[PATCH, OpenACC 2.7] readonly modifier support in front-ends

2023-07-10 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,
this patch contains support for the 'readonly' modifier in copyin clauses
and the cache directive.

As we discussed earlier, actually linking this to middle-end points-to
analysis is a somewhat non-trivial issue. This first patch allows the
language feature to be used in OpenACC directives (with no effect for now).
The middle-end changes are probably going to be a later patch.
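
A small usage sketch of the two places where the modifier is parsed, following the syntax noted in the parser comments of this patch ('copyin (readonly : variable-list)') and its counterpart on the cache directive. The function 'dot' is made up for illustration.

```c
#include <assert.h>

int
dot (const int *a, const int *b, int n)
{
  assert (n >= 0);
  int sum = 0;
  /* 'readonly' on copyin: the region only reads a[] and b[].  */
#pragma acc parallel loop copyin(readonly: a[0:n], b[0:n]) reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
      /* 'readonly' on the cache directive, at the top of the loop body.  */
#pragma acc cache(readonly: a[i:1])
      sum += a[i] * b[i];
    }
  return sum;
}
```

Compiled without -fopenacc, the pragmas are ignored and the function is a plain sequential dot product.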

(Also CCing Tobias because of the Fortran bits)

Tested on powerpc64le-linux with nvptx offloading. Is this okay for trunk?

Thanks,
Chung-Lin

2023-07-10  Chung-Lin Tang  

gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_var_list_parens):
Add 'bool *readonly = NULL' parameter, add readonly modifier parsing
support.
(c_parser_oacc_data_clause): Adjust c_parser_omp_var_list_parens call
to turn on readonly modifier parsing for copyin clause, set
OMP_CLAUSE_MAP_READONLY if readonly modifier found, update comments.
(c_parser_oacc_cache): Adjust c_parser_omp_var_list_parens call
to turn on readonly modifier parsing, set OMP_CLAUSE__CACHE__READONLY
if readonly modifier found, update comments.

gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_var_list):
Add 'bool *readonly = NULL' parameter, add readonly modifier parsing
support.
(cp_parser_oacc_data_clause): Adjust cp_parser_omp_var_list call
to turn on readonly modifier parsing for copyin clause, set
OMP_CLAUSE_MAP_READONLY if readonly modifier found, update comments.
(cp_parser_oacc_cache): Adjust cp_parser_omp_var_list call
to turn on readonly modifier parsing, set OMP_CLAUSE__CACHE__READONLY
if readonly modifier found, update comments.

gcc/fortran/ChangeLog:
* gfortran.h (typedef struct gfc_omp_namelist): Adjust map_op as
ENUM_BITFIELD field, add 'bool readonly' field.
* openmp.cc (gfc_match_omp_map_clause): Add 'bool readonly = false'
parameter, set n->u.readonly field.
(gfc_match_omp_clauses): Add readonly modifier parsing for OpenACC
copyin clause, adjust call to gfc_match_omp_map_clause.
(gfc_match_oacc_cache): Add readonly modifier parsing for OpenACC
cache directive, adjust call to gfc_match_omp_map_clause.
* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_MAP_READONLY,
OMP_CLAUSE__CACHE__READONLY to 1 when readonly is set.

gcc/ChangeLog:
* tree-pretty-print.cc (dump_omp_clause): Add support for printing
OMP_CLAUSE_MAP_READONLY and OMP_CLAUSE__CACHE__READONLY.
* tree.h (OMP_CLAUSE_MAP_READONLY): New macro.
(OMP_CLAUSE__CACHE__READONLY): New macro.

gcc/testsuite/ChangeLog:
* c-c++-common/goacc/readonly-1.c: New test.
* gfortran.dg/goacc/readonly-1.f90: New test.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index d4b98d5d8b6..09e1e89d793 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -14059,7 +14059,8 @@ c_parser_omp_variable_list (c_parser *parser,
 
 static tree
 c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
- tree list, bool allow_deref = false)
+ tree list, bool allow_deref = false,
+ bool *readonly = NULL)
 {
   /* The clauses location.  */
   location_t loc = c_parser_peek_token (parser)->location;
@@ -14067,6 +14068,20 @@ c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
   matching_parens parens;
   if (parens.require_open (parser))
 {
+  if (readonly != NULL)
+   {
+ c_token *token = c_parser_peek_token (parser);
+ if (token->type == CPP_NAME
+ && !strcmp (IDENTIFIER_POINTER (token->value), "readonly")
+ && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
+   {
+ c_parser_consume_token (parser);
+ c_parser_consume_token (parser);
+ *readonly = true;
+   }
+ else
+   *readonly = false;
+   }
   list = c_parser_omp_variable_list (parser, loc, kind, list, allow_deref);
   parens.skip_until_found_close (parser);
 }
@@ -14084,7 +14099,11 @@ c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
OpenACC 2.6:
no_create ( variable-list )
attach ( variable-list )
-   detach ( variable-list ) */
+   detach ( variable-list )
+
+   OpenACC 2.7:
+   copyin (readonly : variable-list )
+ */
 
 static tree
 c_parser_oacc_data_clause (c_parser *parser, pragma_omp_clause c_kind,
@@ -14135,11 +14154,22 @@ c_parser_oacc_data_clause (c_parser *parser, pragma_omp_clause c_kind,
 default:
   gcc_unreachable ();
 }
+
+  /* Turn on readonly modifier parsing for copyin clause.  */
+  bool readonly = false, *readonly_ptr = NULL;
+  if (c_kind == PRAGMA_OACC_CLAUSE_COPYIN)
+readonly_ptr = &readonly;
+
   tree nl, c;
-  

[PATCH, OpenACC 2.7] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2023-06-22 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,
This patch adjusts the implementation of the acc_map_data/acc_unmap_data API
library routines to better match the description in the OpenACC 2.7
specification.

Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA
special value to mark acc_map_data-created mappings, and allow adjustment of
dynamic_refcount of such mappings by other constructs. Enforcing an initial
value of 1 for such mappings, and only allowing acc_unmap_data to delete such
mappings, is implemented as specified.

Actually, there is no real change (or improvement) in the behavior of the API
(thus no new tests). I've looked at the related OpenACC spec issues, and it
seems that this part of the 2.7 spec change is mostly a clarification (I see no
downside in the current REFCOUNT_INFINITY-based implementation either).
But this patch does bring the internals closer to the spec description.
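
The reference-counting rules described above can be sketched as a small standalone model. This is not libgomp code: `struct model_mapping`, the `model_*` functions, and the sentinel value are hypothetical simplifications, and the handling of ordinary mappings is reduced to the bare minimum needed for contrast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for REFCOUNT_ACC_MAP_DATA; in the
   patch the real value is (REFCOUNT_SPECIAL | 2).  */
#define MODEL_REFCOUNT_ACC_MAP_DATA UINTPTR_MAX

/* Simplified stand-in for a mapping entry.  */
struct model_mapping
{
  uintptr_t refcount;
  uintptr_t dynamic_refcount;
  bool mapped;
};

/* Models acc_map_data: mark the mapping, start dynamic_refcount at 1.  */
void
model_map_data (struct model_mapping *m)
{
  m->refcount = MODEL_REFCOUNT_ACC_MAP_DATA;
  m->dynamic_refcount = 1;
  m->mapped = true;
}

/* Models a dynamic enter-data on an already existing mapping.  */
void
model_enter_data (struct model_mapping *m)
{
  assert (m->mapped);
  if (m->refcount != MODEL_REFCOUNT_ACC_MAP_DATA)
    m->refcount++;
  m->dynamic_refcount++;
}

/* Models a dynamic exit-data.  For acc_map_data mappings the dynamic
   count never drops below 1, even with finalize.  */
void
model_exit_data (struct model_mapping *m, bool finalize)
{
  assert (m->mapped);
  if (m->refcount == MODEL_REFCOUNT_ACC_MAP_DATA)
    {
      if (finalize || m->dynamic_refcount <= 1)
        m->dynamic_refcount = 1;
      else
        m->dynamic_refcount--;
      return;
    }

  /* Ordinary mapping: drop one dynamic reference, or all with finalize.  */
  uintptr_t drop = finalize ? m->dynamic_refcount
                            : (m->dynamic_refcount ? 1 : 0);
  m->refcount -= drop;
  m->dynamic_refcount -= drop;
  if (m->refcount == 0)
    m->mapped = false;
}

/* Models acc_unmap_data: only acc_map_data-created mappings qualify.  */
bool
model_unmap_data (struct model_mapping *m)
{
  if (!m->mapped || m->refcount != MODEL_REFCOUNT_ACC_MAP_DATA)
    return false;
  m->mapped = false;
  return true;
}
```

In this model, any sequence of enter/exit operations leaves an acc_map_data mapping in place with a dynamic count of at least 1, and only `model_unmap_data` removes it.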

Tested without regressions using powerpc64le-linux/nvptx, okay for trunk?

Thanks,
Chung-Lin

2023-06-22  Chung-Lin Tang  

libgomp/ChangeLog:

* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
initialize dynamic_refcount as 1.
(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA.
(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
* target.c (gomp_increment_refcount): Add REFCOUNT_ACC_MAP_DATA case.
(gomp_decrement_refcount): Add REFCOUNT_ACC_MAP_DATA case, force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust
testcase error output scan test.
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 4d2bfab4b71..fb8ef651dfb 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1166,6 +1166,8 @@ struct target_mem_desc;
 /* Special value for refcount - tgt_offset contains target address of the
artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK (REFCOUNT_SPECIAL | 1)
+/* Special value for refcount - created through acc_map_data.  */
+#define REFCOUNT_ACC_MAP_DATA (REFCOUNT_SPECIAL | 2)
 
 /* Special value for refcount - structure element sibling list items.
All such key refounts have REFCOUNT_STRUCTELEM bits set, with _FLAG_FIRST
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index fe632740769..2a782ac22c1 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -411,7 +411,8 @@ acc_map_data (void *h, void *d, size_t s)
   assert (n->refcount == 1);
   assert (n->dynamic_refcount == 0);
   /* Special reference counting behavior.  */
-  n->refcount = REFCOUNT_INFINITY;
+  n->refcount = REFCOUNT_ACC_MAP_DATA;
+  n->dynamic_refcount = 1;
 
   if (profiling_p)
{
@@ -460,7 +461,7 @@ acc_unmap_data (void *h)
  the different 'REFCOUNT_INFINITY' cases, or simply separate
  'REFCOUNT_INFINITY' values per different usage ('REFCOUNT_ACC_MAP_DATA'
  etc.)?  */
-  else if (n->refcount != REFCOUNT_INFINITY)
+  else if (n->refcount != REFCOUNT_ACC_MAP_DATA)
 {
   gomp_mutex_unlock (&acc_dev->lock);
   gomp_fatal ("refusing to unmap block [%p,+%d] that has not been mapped"
@@ -519,7 +520,8 @@ goacc_map_var_existing (struct gomp_device_descr *acc_dev, void *hostaddr,
 }
 
   assert (n->refcount != REFCOUNT_LINK);
-  if (n->refcount != REFCOUNT_INFINITY)
+  if (n->refcount != REFCOUNT_INFINITY
+  && n->refcount != REFCOUNT_ACC_MAP_DATA)
 n->refcount++;
   n->dynamic_refcount++;
 
@@ -683,6 +685,7 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s,
 
   assert (n->refcount != REFCOUNT_LINK);
   if (n->refcount != REFCOUNT_INFINITY
+  && n->refcount != REFCOUNT_ACC_MAP_DATA
   && n->refcount < n->dynamic_refcount)
 {
   gomp_mutex_unlock (&acc_dev->lock);
@@ -691,15 +694,27 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s,
 
   if (finalize)
 {
-  if (n->refcount != REFCOUNT_INFINITY)
+  if (n->refcount != REFCOUNT_INFINITY
+ && n->refcount != REFCOUNT_ACC_MAP_DATA)
n->refcount -= n->dynamic_refcount;
-  n->dynamic_refcount = 0;
+
+  if (n->refcount == REFCOUNT_ACC_MAP_DATA)
+   /* Mappings created by acc_map_data are returned to initial
+  dynamic_refcount of 1. Can only be deleted by acc_unmap_data.  */
+   n->dynamic_refcount = 1;
+  else
+   n->dynamic_refcount = 0;
 }
   else if (n->dynamic_refcount)
 {
-  if (n->refcount != REFCOUNT_INFINITY)
+  if (n->refcount != REFCOUNT_INFINITY
+   

[PATCH, OpenACC 2.7] Implement self clause for compute constructs

2023-06-13 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,
This patch implements the compiler side for the 'self' clause for compute
constructs: parallel, kernels, and serial.

As you know, the actual "local device" device type for libgomp is not yet
implemented, so the libgomp side is basically just a simple duplicate of what
host-fallback is doing, though everything else should be completed by this patch.
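
As a usage illustration, a compute construct with a 'self' clause might look like the sketch below. The function `scale` and its `run_on_host` parameter are hypothetical and do not come from the patch's testcases; without OpenACC support the pragma is ignored and the loop runs on the host either way.

```c
#include <assert.h>

/* Hypothetical example: scale an array, optionally forcing execution on
   the local (host) device through the 'self' clause.  */
void
scale (float *a, int n, float factor, int run_on_host)
{
  assert (n >= 0);
  /* When run_on_host is nonzero the region executes on the local device;
     otherwise it may be offloaded.  */
#pragma acc parallel loop copy(a[0:n]) self(run_on_host)
  for (int i = 0; i < n; i++)
    a[i] *= factor;
}
```

Given the host-fallback implementation described above, `self(run_on_host)` should currently behave much like `if(!run_on_host)`.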

Tested on powerpc64le-linux/nvptx; x86_64-linux/amdgcn tests pending.
Is this okay for trunk?

Thanks,
Chung-Lin

2023-06-13  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
parameter, add parsing of self clause when compute_p is true.
(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSE_MASK): Likewise.
(OACC_SERIAL_CLAUSE_MASK): Likewise.
(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
set compute_p argument to true.
* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_compute_clause_self): New function.
(cp_parser_oacc_all_clauses): Add new 'bool compute_p = false'
parameter, add parsing of self clause when compute_p is true.
(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSE_MASK): Likewise.
(OACC_SERIAL_CLAUSE_MASK): Likewise.
(cp_parser_oacc_compute): Adjust call to cp_parser_oacc_all_clauses to
set compute_p argument to true.
* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case.
* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged
with OMP_CLAUSE_IF case.

gcc/fortran/ChangeLog:

* gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field.
* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF.
(gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF.
(OACC_KERNELS_CLAUSES): Likewise.
(OACC_SERIAL_CLAUSES): Likewise.
(resolve_omp_clauses): Add handling for omp_clauses->self_expr.
* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of
clauses->self_expr and building of OMP_CLAUSE_SELF tree clause.
(gfc_split_omp_clauses): Add handling of self_expr field copy.

gcc/ChangeLog:

* gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case.
(gimplify_adjust_omp_clauses): Likewise.
* omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code.
* omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum.
* tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF
case.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE_SELF_EXPR): New macro.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/self-clause-1.c: New test.
* c-c++-common/goacc/self-clause-2.c: New test.
* gfortran.dg/goacc/self.f95: New test.

include/ChangeLog:

* gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value.

libgomp/ChangeLog:

* oacc-parallel.c (GOACC_parallel_keyed): Add code to handle
GOACC_FLAG_LOCAL_DEVICE case.
* testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.
From 449883981c8e1f707b47ff8f8dd70049b9ffda82 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 13 Jun 2023 08:44:31 -0700
Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs

This patch implements the 'self' clause for compute constructs: parallel,
kernels, and serial. This clause conditionally uses the local device
(the host multi-core CPU) as the executing device of the compute region.

The actual implementation of the "local device" device type inside libgomp
(presumably using pthreads) is not yet completed, so the libgomp side is
implemented exactly the same as host-fallback mode (so as of now, it
essentially behaves like the 'if' clause with the condition inverted).

gcc/c/ChangeLog:

* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
parameter, add parsing of self clause when compute_p is true.
(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSE_MASK): Likewise.
(OACC_SERIAL_CLAUSE_MASK): Likewise.
(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
set compute_p argument to true.
* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.

gcc/cp/Chang

[PATCH, OpenACC 2.7] Implement default clause support for data constructs

2023-06-06 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,
this patch implements the OpenACC 2.7 addition of default(none|present) support
for data constructs.

Apart from adjusting the allowed-clause masks in the front-ends (for acc data),
this is mostly implemented in gimplify.
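
A small sketch of what this enables on the user side follows. The function `increment_all` is hypothetical and not one of the adjusted testcases. It assumes, per the OpenACC description of default(none), that a variable named in a data clause on the enclosing data construct satisfies the requirement for the nested compute construct.

```c
#include <assert.h>

enum { N = 8 };  /* compile-time constant, so only 'a' needs a data clause */

/* Hypothetical example: the data construct carries default(none); the
   nested compute construct has no default clause of its own and so picks
   up that behavior.  'a' is covered by the copy clause on the data
   construct.  */
void
increment_all (float a[N])
{
  assert (a != 0);
#pragma acc data default(none) copy(a[0:N])
  {
#pragma acc parallel loop
    for (int i = 0; i < N; i++)
      a[i] += 1.0f;
  }
}
```

Without OpenACC support the pragmas are ignored and the loop runs on the host, so the sketch only illustrates clause placement, not the diagnostics.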

Tested on powerpc64le-linux/nvptx, x86_64-linux/amdgcn tests in progress (expect
no surprises). Is this okay for trunk?

Thanks,
Chung-Lin
gcc/c/ChangeLog:

* c-parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/cp/ChangeLog:

* parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/fortran/ChangeLog:

* openmp.cc (OACC_DATA_CLAUSES): Add OMP_CLAUSE_DEFAULT.

gcc/ChangeLog:

* gimplify.cc (struct gimplify_omp_ctx): Add oacc_data_default_kind
field.
(new_omp_context): Initialize oacc_data_default_kind field.
(gimplify_scan_omp_clauses): Set oacc_data_default_kind for data
constructs. Set ctx->default_kind for compute constructs from
ctx->oacc_data_default_kind.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/default-3.c: Adjust testcase.
* c-c++-common/goacc/default-5.c: Adjust testcase.
* gfortran.dg/goacc/default-3.f95: Adjust testcase.
* gfortran.dg/goacc/default-5.f: Adjust testcase.
From 101305aee9b27c6df00d7c403e469bdf8d7f45a4 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 6 Jun 2023 03:46:29 -0700
Subject: [PATCH 2/2] OpenACC 2.7: default clause support for data constructs

This patch implements the OpenACC 2.7 addition of default(none|present) support
for data constructs.

Now, specifying "default(none|present)" on a data construct turns on the same
default clause behavior for all enclosed compute constructs (which don't
already have a default clause themselves).

gcc/c/ChangeLog:

* c-parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/cp/ChangeLog:

* parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/fortran/ChangeLog:

* openmp.cc (OACC_DATA_CLAUSES): Add OMP_CLAUSE_DEFAULT.

gcc/ChangeLog:

* gimplify.cc (struct gimplify_omp_ctx): Add oacc_data_default_kind
field.
(new_omp_context): Initialize oacc_data_default_kind field.
(gimplify_scan_omp_clauses): Set oacc_data_default_kind for data
constructs. Set ctx->default_kind for compute constructs from
ctx->oacc_data_default_kind.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/default-3.c: Adjust testcase.
* c-c++-common/goacc/default-5.c: Adjust testcase.
* gfortran.dg/goacc/default-3.f95: Adjust testcase.
* gfortran.dg/goacc/default-5.f: Adjust testcase.
---
 gcc/c/c-parser.cc |  1 +
 gcc/cp/parser.cc  |  1 +
 gcc/fortran/openmp.cc |  3 ++-
 gcc/gimplify.cc   | 20 +++
 gcc/testsuite/c-c++-common/goacc/default-3.c  | 15 +-
 gcc/testsuite/c-c++-common/goacc/default-5.c  | 18 +++--
 gcc/testsuite/gfortran.dg/goacc/default-3.f95 | 15 ++
 gcc/testsuite/gfortran.dg/goacc/default-5.f   | 17 ++--
 8 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index b61aef8b1a2..645d28b320d 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18133,6 +18133,7 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)  \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)   \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NO_CREATE)   \
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index dd7638f1c93..4b4df29a406 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -45759,6 +45759,7 @@ cp_parser_oacc_cache (cp_parser *parser, cp_token *pragma_tok)
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)  \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DETACH)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)   \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)  \
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 4c30548567f..b785e71f20f 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -3645,7 +3645,8 @@ 

[PATCH, OpenACC 2.7] Implement host_data must have use_device clause requirement

2023-06-06 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,
this patch implements the OpenACC 2.7 change requiring the host_data construct
to have at least one use_device clause.

This patch started out as a simple check during gimplify (a much smaller patch),
but it turned out that the front-ends remove use_device clauses when they contain
errors, and the gimplify check then emitted a "no use_device clause" message in
such cases, which seems confusing for the user. So I ended up adding the check in
each front-end instead.
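
For reference, a host_data construct that satisfies the new requirement might look like the sketch below. The names `device_library_call` and `process` are hypothetical; the stand-in library routine only records the address it receives and never dereferences it on the host, since with OpenACC enabled `a` inside the host_data region would refer to device memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device-aware library routine: it records
   the address it was given and never dereferences it on the host.  */
static const void *last_device_ptr;

int
device_library_call (const float *dev_ptr, int n)
{
  last_device_ptr = dev_ptr;
  return n;
}

int
process (float *a, int n)
{
  int handled = 0;

  assert (a != NULL && n > 0);
#pragma acc data copy(a[0:n])
  {
    /* OpenACC 2.7: at least one use_device clause is required here; a
       bare '#pragma acc host_data' is what the new check rejects.  */
#pragma acc host_data use_device(a)
    {
      handled = device_library_call (a, n);
    }
  }
  return handled;
}
```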

Tested on powerpc64le-linux/nvptx, x86_64-linux/amdgcn tests in progress (expect
no surprises). Is this okay for trunk?

Thanks,
Chung-Lin

gcc/c/ChangeLog:

* c-parser.cc (c_parser_oacc_host_data): Add checking requiring OpenACC
host_data construct to have a use_device clause.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_host_data): Add checking requiring OpenACC
host_data construct to have a use_device clause.

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_trans_oacc_construct): Add checking requiring
OpenACC host_data construct to have a use_device clause.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/host_data-2.c: Adjust testcase.
* gfortran.dg/goacc/host_data-error.f90: New testcase.
* gfortran.dg/goacc/pr71704.f90: Adjust testcase.
From 0d17b8d24fa6079d6c289305e9644c3fecd429f1 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 6 Jun 2023 03:19:33 -0700
Subject: [PATCH 1/2] OpenACC 2.7: host_data must have use_device clause
 requirement

This patch implements the OpenACC 2.7 change requiring the host_data construct
to have at least one use_device clause.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_oacc_host_data): Add checking requiring OpenACC
host_data construct to have a use_device clause.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_host_data): Add checking requiring OpenACC
host_data construct to have a use_device clause.

gcc/fortran/ChangeLog:

* trans-openmp.cc (gfc_trans_oacc_construct): Add checking requiring
OpenACC host_data construct to have a use_device clause.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/host_data-2.c: Adjust testcase.
* gfortran.dg/goacc/host_data-error.f90: New testcase.
* gfortran.dg/goacc/pr71704.f90: Adjust testcase.
---
 gcc/c/c-parser.cc   |  9 +++--
 gcc/cp/parser.cc| 11 +--
 gcc/fortran/trans-openmp.cc |  6 ++
 gcc/testsuite/c-c++-common/goacc/host_data-2.c  |  7 ++-
 gcc/testsuite/gfortran.dg/goacc/host_data-error.f90 |  6 ++
 gcc/testsuite/gfortran.dg/goacc/pr71704.f90 |  5 +++--
 6 files changed, 37 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/host_data-error.f90

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 5baa501dbee..b61aef8b1a2 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18398,8 +18398,13 @@ c_parser_oacc_host_data (location_t loc, c_parser *parser, bool *if_p)
   tree stmt, clauses, block;
 
   clauses = c_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
-  "#pragma acc host_data");
-
+  "#pragma acc host_data", false);
+  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
+{
+  error_at (loc, "%<host_data%> construct requires %<use_device%> clause");
+  return error_mark_node;
+}
+  clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
   block = c_begin_omp_parallel ();
   add_stmt (c_parser_omp_structured_block (parser, if_p));
   stmt = c_finish_oacc_host_data (loc, clauses, block);
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 1c9aa671851..dd7638f1c93 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -45798,8 +45798,15 @@ cp_parser_oacc_host_data (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
   unsigned int save;
 
   clauses = cp_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
-   "#pragma acc host_data", pragma_tok);
-
+   "#pragma acc host_data", pragma_tok,
+   false);
+  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
+{
+  error_at (pragma_tok->location,
+   "%<host_data%> construct requires %<use_device%> clause");
+  return error_mark_node;
+}
+  clauses = finish_omp_clauses (clauses, C_ORT_ACC);
   block = begin_omp_parallel ();
   save = cp_parser_begin_omp_structured_block (parser);
   cp_parser_statement (parser, NULL_TREE, false, if_p);
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 42b608f3d36..5e0079cce76 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -4677,6 +4677,12 @@ gfc_trans_oacc_construct (gfc_code *code)
break;
   case EXEC_

Re: nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case (was: [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes)

2023-01-13 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,

On 2023/1/12 9:51 PM, Thomas Schwinge wrote:
> In my case, 'cuda_callback_wrapper' (expectedly) gets invoked with
> 'res != CUDA_SUCCESS' ("an illegal memory access was encountered").
> When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
> (..., which deadlocks); that's generally problematic: per
> https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483
> "'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".

I remember running into this myself when first creating this async support
(IIRC in my case it was cuFree()-ing something), yet you've found another
mistake here! :)

> Given that eventually we must reach a host/device synchronization point
> (latest when the device is shut down at program termination), and the
> non-'CUDA_SUCCESS' will be upheld until then, it does seem safe to
> replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error' as per the
> "nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case"
> attached.  OK to push?

I think this patch is fine. Actual approval powers are yours or Tom's :)

> 
> (Might we even skip 'GOMP_PLUGIN_error' here, understanding that the
> error will be caught and reported at the next host/device synchronization
> point?  But I've not verified that.)

Actually, the CUDA driver API docs are a bit vague on what exactly this
CUresult arg to the callback actually means. The 'res != CUDA_SUCCESS' handling
here was basically just generic handling. I am not really sure what the truly
right thing to do here is (is the error still retained by CUDA after the
callback completes?)
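
The pattern under discussion (report the failure inside the callback without terminating, act on it at the next synchronization point) can be sketched as a standalone model. Nothing here uses the CUDA driver API: the `model_*` names and status codes are hypothetical, and `pending_error` merely stands in for the error state that CUDA is expected to uphold, which is exactly the open question above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes; they do not mirror CUresult values.  */
enum model_status { MODEL_SUCCESS = 0, MODEL_ERROR = 1 };

/* Stand-in for the error state assumed to persist after the callback.  */
static enum model_status pending_error = MODEL_SUCCESS;

/* Models the stream callback: it must not call back into the driver
   API, so it only logs the failure and remembers it.  */
void
model_stream_callback (enum model_status res)
{
  if (res != MODEL_SUCCESS)
    {
      fprintf (stderr, "async operation failed: status %d\n", (int) res);
      pending_error = res;
    }
}

/* Models the next host/device synchronization point, where acting on
   the error (including shutting down) is safe.  Returns 0 when no error
   is pending, -1 otherwise, and clears the pending state.  */
int
model_synchronize (void)
{
  enum model_status res = pending_error;

  pending_error = MODEL_SUCCESS;
  return res == MODEL_SUCCESS ? 0 : -1;
}
```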

Chung-Lin



[Ping x6] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-12-12 Thread Chung-Lin Tang via Gcc-patches
Ping x6

On 2022/12/6 12:21 AM, Chung-Lin Tang wrote:
> Ping x5
> 
> On 2022/11/22 12:24 AM, Chung-Lin Tang wrote:
>> Ping x4
>>
>> On 2022/11/8 12:34 AM, Chung-Lin Tang wrote:
>>> Ping x3.
>>>
>>> On 2022/10/31 10:18 PM, Chung-Lin Tang wrote:
>>>> Ping x2.
>>>>
>>>> On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
>>>>> Ping.
>>>>>
>>>>> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
>>>>>> Hi Tom,
>>>>>> I had a patch submitted earlier, where I reported that the current way
>>>>>> of implementing barriers in libgomp on nvptx created a quite significant
>>>>>> performance drop on some SPEChpc2021 benchmarks:
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>>>>>>
>>>>>> That previous patch wasn't accepted well (admittedly, it was kind of a
>>>>>> hack). So in this patch, I tried to (mostly) re-implement team-barriers
>>>>>> for NVPTX.
>>>>>>
>>>>>> Basically, instead of trying to have the GPU do CPU-with-OS-like things
>>>>>> that it isn't suited for, barriers are implemented simplistically with
>>>>>> bar.* synchronization instructions. Tasks are processed after threads
>>>>>> have joined, and only if team->task_count != 0.
>>>>>>
>>>>>> (Arguably, there might be a little bit of performance forfeited where
>>>>>> earlier arriving threads could've been used to process tasks ahead of
>>>>>> other threads. But that again falls into requiring implementing complex
>>>>>> futex-wait/wake like behavior. Really, that kind of tasking is not what
>>>>>> target offloading is usually used for.)
>>>>>>
>>>>>> Implementation highlight notes:
>>>>>> 1. gomp_team_barrier_wake() is now an empty function (threads never
>>>>>>    "wake" in the usual manner).
>>>>>> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
>>>>>> 3. gomp_barrier_wait_last() is now implemented using "bar.arrive".
>>>>>> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>>>>>>    The main synchronization is done using a 'bar.red' instruction. This
>>>>>>    reduces across all threads the condition (team->task_count != 0), to
>>>>>>    enable the task processing down below if any thread created a task.
>>>>>>    (This bar.red usage requires the second GCC patch in this series.)
>>>>>>
>>>>>> This patch has been tested on x86_64/powerpc64le with nvptx offloading,
>>>>>> using the libgomp, ovo, omptests, and sollve_vv testsuites, all without
>>>>>> regressions. Also verified that the SPEChpc 2021 521.miniswp_t and
>>>>>> 534.hpgmgfv_t performance regressions that occurred in the GCC12 cycle
>>>>>> have been restored to devel/omp/gcc-11 (OG11) branch levels. Is this
>>>>>> okay for trunk?
>>>>>>
>>>>>> (I also suggest backporting to the GCC12 branch, if a performance
>>>>>> regression can be considered a defect.)
>>>>>>
>>>>>> Thanks,
>>>>>> Chung-Lin
>>>>>>
>>>>>> libgomp/ChangeLog:
>>>>>>
>>>>>> 2022-09-21  Chung-Lin Tang  
>>>>>>
>>>>>>  * config/nvptx/bar.c (generation_to_barrier): Remove.
>>>>>>  (futex_wait,futex_wake,do_spin,do_wait): Remove.
>>>>>>  (GOMP_WAIT_H): Remove.
>>>>>>  (#include "../linux/bar.c"): Remove.
>>>>>>  (gomp_barrier_wait_end): New function.
>>>>>>  (gomp_barrier_wait): Likewise.
>>>>>>  (gomp_barrier_wait_last): Likewise.
>>>>>>  (gomp_team_barrier_wait_end): Likewise.
>>>>>>  (gomp_team_barrier_wait): Likewise.
>>>>>>  (gomp_team_barrier_wait_final): Likewise.
>>>>>>  (gomp_team_barrier_wait_cancel_end): Likewise.
>>>>>>  (gomp_team_barrier_wait_cancel): Likewise.
>>>>>>  (gomp_team_barrier_cancel): Likewise.
>>>>>>  * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>>>>>>  prototype, add new static inline function.
>>>
>>
> 
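
The control flow described in note 4 of the quoted message can be modeled sequentially as below. This is only a sketch of the idea, not the nvptx implementation: the `model_*` functions are hypothetical, a plain loop stands in for the 'bar.red' OR-reduction, and each array element represents one thread's contribution to the (team->task_count != 0) condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the OR-reduction performed at the barrier: every thread
   contributes its predicate and all threads receive the combined result.  */
bool
model_barrier_reduce_or (const unsigned *task_counts, int nthreads)
{
  bool any = false;

  assert (nthreads > 0);
  for (int t = 0; t < nthreads; t++)
    any = any || (task_counts[t] != 0);
  return any;
}

/* Models the tail of a team barrier wait: tasks are only processed after
   all threads have joined, and only if the reduction reported pending
   tasks.  Returns the number of tasks processed.  */
unsigned
model_team_barrier_wait_end (unsigned *task_counts, int nthreads)
{
  unsigned processed = 0;

  if (!model_barrier_reduce_or (task_counts, nthreads))
    return 0;  /* common case: no tasks, nothing more to do */

  for (int t = 0; t < nthreads; t++)
    {
      processed += task_counts[t];
      task_counts[t] = 0;
    }
  return processed;
}
```

The point of the reduction is that all threads agree on whether to enter the task-processing path, so none of them needs futex-style waiting and waking.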



[Ping x5] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-12-05 Thread Chung-Lin Tang via Gcc-patches
Ping x5




[Ping x4] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-11-21 Thread Chung-Lin Tang via Gcc-patches
Ping x4




[Ping x3] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-11-07 Thread Chung-Lin Tang via Gcc-patches
Ping x3.




[Ping x2] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-10-31 Thread Chung-Lin Tang
Ping x2.

On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
>> Hi Tom,
>> I had a patch submitted earlier, where I reported that the current way of 
>> implementing
>> barriers in libgomp on nvptx created a quite significant performance drop on 
>> some SPEChpc2021
>> benchmarks:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>>
>> That previous patch wasn't accepted well (admittedly, it was kind of a hack).
>> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
>>
>> Basically, instead of trying to have the GPU do CPU-with-OS-like things that 
>> it isn't suited for,
>> barriers are implemented simplistically with bar.* synchronization 
>> instructions.
>> Tasks are processed after threads have joined, and only if team->task_count 
>> != 0
>>
>> (arguably, there might be a little bit of performance forfeited where 
>> earlier arriving threads
>> could've been used to process tasks ahead of other threads. But that again 
>> falls into requiring
>> implementing complex futex-wait/wake like behavior. Really, that kind of 
>> tasking is not what target
>> offloading is usually used for)
>>
>> Implementation highlight notes:
>> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" 
>> in the usual manner)
>> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
>> 3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
>>
>> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>> The main synchronization is done using a 'bar.red' instruction. This 
>> reduces across all threads
>> the condition (team->task_count != 0), to enable the task processing 
>> down below if any thread
>> created a task. (this bar.red usage required the need of the second GCC 
>> patch in this series)
>>
>> This patch has been tested on x86_64/powerpc64le with nvptx offloading, 
>> using libgomp, ovo, omptests,
>> and sollve_vv testsuites, all without regressions. Also verified that the 
>> SPEChpc 2021 521.miniswp_t
>> and 534.hpgmgfv_t performance regressions that occurred in the GCC12 cycle 
>> has been restored to
>> devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
>>
>> (also suggest backporting to GCC12 branch, if performance regression can be 
>> considered a defect)
>>
>> Thanks,
>> Chung-Lin
>>
>> libgomp/ChangeLog:
>>
>> 2022-09-21  Chung-Lin Tang  
>>
>>  * config/nvptx/bar.c (generation_to_barrier): Remove.
>>  (futex_wait,futex_wake,do_spin,do_wait): Remove.
>>  (GOMP_WAIT_H): Remove.
>>  (#include "../linux/bar.c"): Remove.
>>  (gomp_barrier_wait_end): New function.
>>  (gomp_barrier_wait): Likewise.
>>  (gomp_barrier_wait_last): Likewise.
>>  (gomp_team_barrier_wait_end): Likewise.
>>  (gomp_team_barrier_wait): Likewise.
>>  (gomp_team_barrier_wait_final): Likewise.
>>  (gomp_team_barrier_wait_cancel_end): Likewise.
>>  (gomp_team_barrier_wait_cancel): Likewise.
>>  (gomp_team_barrier_cancel): Likewise.
>>  * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>>  prototype, add new static inline function.


Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-10-17 Thread Chung-Lin Tang
Ping.

On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
> Hi Tom,
> I had a patch submitted earlier, where I reported that the current way of 
> implementing
> barriers in libgomp on nvptx created a quite significant performance drop on 
> some SPEChpc2021
> benchmarks:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
> 
> That previous patch wasn't accepted well (admittedly, it was kind of a hack).
> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
> 
> Basically, instead of trying to have the GPU do CPU-with-OS-like things that 
> it isn't suited for,
> barriers are implemented simplistically with bar.* synchronization 
> instructions.
> Tasks are processed after threads have joined, and only if team->task_count 
> != 0
> 
> (arguably, there might be a little bit of performance forfeited where earlier 
> arriving threads
> could've been used to process tasks ahead of other threads. But that again 
> falls into requiring
> implementing complex futex-wait/wake like behavior. Really, that kind of 
> tasking is not what target
> offloading is usually used for)
> 
> Implementation highlight notes:
> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in 
> the usual manner)
> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
> 3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
> 
> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
> The main synchronization is done using a 'bar.red' instruction. This 
> reduces across all threads
> the condition (team->task_count != 0), to enable the task processing down 
> below if any thread
> created a task. (this bar.red usage required the need of the second GCC 
> patch in this series)
> 
> This patch has been tested on x86_64/powerpc64le with nvptx offloading, using 
> libgomp, ovo, omptests,
> and sollve_vv testsuites, all without regressions. Also verified that the 
> SPEChpc 2021 521.miniswp_t
> and 534.hpgmgfv_t performance regressions that occurred in the GCC12 cycle 
> has been restored to
> devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
> 
> (also suggest backporting to GCC12 branch, if performance regression can be 
> considered a defect)
> 
> Thanks,
> Chung-Lin
> 
> libgomp/ChangeLog:
> 
> 2022-09-21  Chung-Lin Tang  
> 
>   * config/nvptx/bar.c (generation_to_barrier): Remove.
>   (futex_wait,futex_wake,do_spin,do_wait): Remove.
>   (GOMP_WAIT_H): Remove.
>   (#include "../linux/bar.c"): Remove.
>   (gomp_barrier_wait_end): New function.
>   (gomp_barrier_wait): Likewise.
>   (gomp_barrier_wait_last): Likewise.
>   (gomp_team_barrier_wait_end): Likewise.
>   (gomp_team_barrier_wait): Likewise.
>   (gomp_team_barrier_wait_final): Likewise.
>   (gomp_team_barrier_wait_cancel_end): Likewise.
>   (gomp_team_barrier_wait_cancel): Likewise.
>   (gomp_team_barrier_cancel): Likewise.
>   * config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>   prototype, add new static inline function.


Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-09-21 Thread Chung-Lin Tang via Gcc-patches




On 2022/9/21 5:01 PM, Jakub Jelinek wrote:

> On Wed, Sep 21, 2022 at 03:45:36PM +0800, Chung-Lin Tang via Gcc-patches wrote:
>> Hi Tom,
>> I had a patch submitted earlier, where I reported that the current way of
>> implementing barriers in libgomp on nvptx created a quite significant
>> performance drop on some SPEChpc2021 benchmarks:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>>
>> That previous patch wasn't accepted well (admittedly, it was kind of a hack).
>> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
>>
>> Basically, instead of trying to have the GPU do CPU-with-OS-like things that
>> it isn't suited for, barriers are implemented simplistically with bar.*
>> synchronization instructions.
>> Tasks are processed after threads have joined, and only if
>> team->task_count != 0
>>
>> (arguably, there might be a little bit of performance forfeited where
>> earlier arriving threads could've been used to process tasks ahead of other
>> threads. But that again falls into requiring implementing complex
>> futex-wait/wake like behavior. Really, that kind of tasking is not what
>> target offloading is usually used for)


> I admit I don't have a good picture if people in real-world actually use
> tasking in offloading regions and how much and in what way, but the above
> definitely would be a show-stopper for typical tasking workloads, where
> one thread (usually from master/masked/single construct's body) creates lots
> of tasks and can spend considerable amount of time in those preparations,
> while other threads are expected to handle those tasks.


I think the most common use case for target offloading is "parallel for".

Really, not simply removing tasking altogether from target regions in the 
specification is just looking for trouble.

If asynchronous offloaded tasks are to be supported, something at the whole
GPU offload region level is much more reasonable, like the async clause
functionality in OpenACC.


> Do we have an idea how are other implementations handling this?
> I think it should be easily observable with atomics, have
> master/masked/single that creates lots of tasks and then spends a long time
> doing something, have very small task bodies that just increment some atomic
> counter and at the end of the master/masked/single see how many tasks were
> already encountered.


This could be an interesting test...
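
A minimal sketch of the experiment described above, assuming nothing beyond standard OpenMP directives. The function name, NTASKS, and the length of the busy loop are arbitrary choices for illustration, not taken from any posted testcase. One thread creates many tiny tasks inside a single construct, does some unrelated work, and then reads the counter to see how many tasks other threads have already run.

```c
#include <assert.h>

#define NTASKS 1000

/* Returns how many of the tiny tasks had already run by the time the
   creating thread finished its own work inside the single construct.  */
static int
tasks_done_before_single_ends (void)
{
  int counter = 0;
  int seen = -1;

  #pragma omp target map(tofrom: counter, seen)
  #pragma omp parallel
  #pragma omp single
  {
    for (int i = 0; i < NTASKS; i++)
      {
        /* Very small task body: just bump an atomic counter.  */
        #pragma omp task shared(counter)
        {
          #pragma omp atomic update
          counter++;
        }
      }

    /* Stand-in for "spends a long time doing something".  */
    volatile unsigned long spin = 0;
    for (unsigned long i = 0; i < 5000000UL; i++)
      spin += i;

    /* Sample the counter before the implicit barrier of the single.  */
    #pragma omp atomic read
    seen = counter;
  }

  assert (seen >= 0 && seen <= NTASKS);
  return seen;
}
```

A result close to NTASKS would suggest that the other threads picked up tasks while the creating thread was still busy; a small result would suggest that tasks mostly wait until the barrier. The value depends on the implementation and the device, so no particular number should be expected. Built without OpenMP support the pragmas are ignored and every task body runs inline.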


> Note, I don't have any smart ideas how to handle this instead and what
> you posted might be ok for what people usually do on offloading targets
> in OpenMP if they use tasking at all, just wanted to mention that there
> could be workloads where the above is a serious problem.  If there are
> say hundreds of threads doing nothing until a single thread reaches a
> barrier and there are hundreds of pending tasks...


I think it might still be doable, just not in the very fine "wake one thread"
style that the Linux-based implementation was doing.


> E.g. note we have that 64 pending task limit after which we start to
> create undeferred tasks, so if we never start handling tasks until
> one thread is done with them, that would mean the single thread
> would create 64 deferred tasks and then handle all the others itself
> making it even longer until the other tasks can deal with it.


Okay, thanks for reminding that.

Chung-Lin


[PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction support in GCC

2022-09-21 Thread Chung-Lin Tang via Gcc-patches

Hi Tom, following the first patch.

This new barrier implementation I posted in the first patch uses the 'bar.red'
instruction. Usually this could've been easily done with a single line of
inline assembly. However, I quickly realized that because the NVPTX GCC port
is implemented with all virtual general registers, we don't have a register
constraint usable to select "predicate registers". Since bar.red uses
predicate typed values, I can't create it directly using inline asm.

So it appears that the simplest way of accessing it is with a target builtin.
The attached patch adds bar.red instructions to the nvptx port, and
__builtin_nvptx_bar_red_* builtins to use it. The code should support all
variations of bar.red (and, or, and popc operations).

(This support was used to implement the first libgomp barrier patch, so the
two must be approved together)
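
For readers less familiar with bar.red, here is a host-side C sketch of what the three reduction variants compute once all participating threads have arrived. It only models the result. It is not the builtin added by this patch, and the names model_bar_red and enum red_op are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; this models the reduction result on the host only.  */
enum red_op { RED_AND, RED_OR, RED_POPC };

/* Combine one predicate per participating thread the way bar.red does
   once every thread has arrived.  If NEGATE is true, each predicate is
   inverted first (the optional '!' in the PTX syntax).  Returns 0 or 1
   for RED_AND and RED_OR, and the number of true predicates for
   RED_POPC.  */
static unsigned
model_bar_red (enum red_op op, bool negate, const bool *pred, size_t nthreads)
{
  assert (pred != NULL && nthreads > 0);

  unsigned all = 1, any = 0, count = 0;
  for (size_t i = 0; i < nthreads; i++)
    {
      bool p = negate ? !pred[i] : pred[i];
      all &= p;
      any |= p;
      count += p;
    }

  switch (op)
    {
    case RED_AND:
      return all;
    case RED_OR:
      return any;
    default:
      return count;
    }
}
```

On the device every thread receives the same reduced value, which is what makes the instruction useful for taking a team-wide decision at a barrier.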

Thanks,
Chung-Lin

2022-09-21  Chung-Lin Tang  

gcc/ChangeLog:

* config/nvptx/nvptx.cc (nvptx_print_operand): Add 'p'
case, adjust comments.
(enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_RED_AND,
NVPTX_BUILTIN_BAR_RED_OR, and NVPTX_BUILTIN_BAR_RED_POPC.
(nvptx_expand_bar_red): New function.
(nvptx_init_builtins):
Add DEFs of __builtin_nvptx_bar_red_[and/or/popc].
(nvptx_expand_builtin): Use nvptx_expand_bar_red to expand
NVPTX_BUILTIN_BAR_RED_[AND/OR/POPC] cases.

* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_BARRED_AND, UNSPECV_BARRED_OR, and UNSPECV_BARRED_POPC.
(BARRED): New int iterator.
(barred_op,barred_mode,barred_ptxtype): New int attrs.
(nvptx_barred_): New define_insn.
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 49cc681..afc3a890 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -2879,6 +2879,7 @@ nvptx_mem_maybe_shared_p (const_rtx x)
t -- print a type opcode suffix, promoting QImode to 32 bits
T -- print a type size in bits
u -- print a type opcode suffix without promotions.
+   p -- print a '!' for constant 0.
x -- print a destination operand that may also be a bit bucket.  */
 
 static void
@@ -3012,6 +3013,11 @@ nvptx_print_operand (FILE *file, rtx x, int code)
   fprintf (file, "@!");
   goto common;
 
+case 'p':
+  if (INTVAL (x) == 0)
+   fprintf (file, "!");
+  break;
+
 case 'c':
   mode = GET_MODE (XEXP (x, 0));
   switch (x_code)
@@ -6151,9 +6157,90 @@ enum nvptx_builtins
   NVPTX_BUILTIN_CMP_SWAPLL,
   NVPTX_BUILTIN_MEMBAR_GL,
   NVPTX_BUILTIN_MEMBAR_CTA,
+  NVPTX_BUILTIN_BAR_RED_AND,
+  NVPTX_BUILTIN_BAR_RED_OR,
+  NVPTX_BUILTIN_BAR_RED_POPC,
   NVPTX_BUILTIN_MAX
 };
 
+/* Expander for 'bar.red' instruction builtins.  */
+
+static rtx
+nvptx_expand_bar_red (tree exp, rtx target,
+ machine_mode ARG_UNUSED (m), int ARG_UNUSED (ignore))
+{
+  int code = DECL_MD_FUNCTION_CODE (TREE_OPERAND (CALL_EXPR_FN (exp), 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+
+  if (!target)
+target = gen_reg_rtx (mode);
+
+  rtx pred, dst;
+  rtx bar = expand_expr (CALL_EXPR_ARG (exp, 0),
+NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx nthr = expand_expr (CALL_EXPR_ARG (exp, 1),
+ NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx cpl = expand_expr (CALL_EXPR_ARG (exp, 2),
+NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx redop = expand_expr (CALL_EXPR_ARG (exp, 3),
+  NULL_RTX, SImode, EXPAND_NORMAL);
+  if (CONST_INT_P (bar))
+{
+  if (INTVAL (bar) < 0 || INTVAL (bar) > 15)
+   {
+ error_at (EXPR_LOCATION (exp),
+   "barrier value must be within [0,15]");
+ return const0_rtx;
+   }
+}
+  else if (!REG_P (bar))
+bar = copy_to_mode_reg (SImode, bar);
+
+  if (!CONST_INT_P (nthr) && !REG_P (nthr))
+nthr = copy_to_mode_reg (SImode, nthr);
+
+  if (!CONST_INT_P (cpl))
+{
+  error_at (EXPR_LOCATION (exp),
+   "complement argument must be constant");
+  return const0_rtx;
+}
+
+  pred = gen_reg_rtx (BImode);
+  if (!REG_P (redop))
+redop = copy_to_mode_reg (SImode, redop);
+  emit_insn (gen_rtx_SET (pred, gen_rtx_NE (BImode, redop, GEN_INT (0))));
+  redop = pred;
+
+  rtx pat;
+  switch (code)
+{
+case NVPTX_BUILTIN_BAR_RED_AND:
+  dst = gen_reg_rtx (BImode);
+  pat = gen_nvptx_barred_and (dst, bar, nthr, cpl, redop);
+  break;
+case NVPTX_BUILTIN_BAR_RED_OR:
+  dst = gen_reg_rtx (BImode);
+  pat = gen_nvptx_barred_or (dst, bar, nthr, cpl, redop);
+  break;
+case NVPTX_BUILTIN_BAR_RED_POPC:
+  dst = gen_reg_rtx (SImode);
+  pat = gen_nvptx_barred_popc (dst, bar, nthr, cpl, redop);
+  break;
+default:
+  gcc_unreachable ();
+}
+  emit_insn (pat);
+  if (GET_MODE (dst) == BImode)
+{
+  rt

[PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-09-21 Thread Chung-Lin Tang via Gcc-patches

Hi Tom,
I had a patch submitted earlier, where I reported that the current way of
implementing barriers in libgomp on nvptx created a quite significant
performance drop on some SPEChpc2021 benchmarks:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html

That previous patch wasn't accepted well (admittedly, it was kind of a hack).
So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.

Basically, instead of trying to have the GPU do CPU-with-OS-like things that
it isn't suited for, barriers are implemented simplistically with bar.*
synchronization instructions.
Tasks are processed after threads have joined, and only if
team->task_count != 0.

(Arguably, there might be a little bit of performance forfeited where earlier
arriving threads could've been used to process tasks ahead of other threads.
But that again falls into requiring implementing complex futex-wait/wake like
behavior. Really, that kind of tasking is not what target offloading is
usually used for.)

Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in
   the usual manner)
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() is now implemented using "bar.arrive"

4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
   The main synchronization is done using a 'bar.red' instruction. This reduces
   across all threads the condition (team->task_count != 0), to enable the task
   processing down below if any thread created a task. (This bar.red usage
   requires the second GCC patch in this series.)

This patch has been tested on x86_64/powerpc64le with nvptx offloading, using
the libgomp, ovo, omptests, and sollve_vv testsuites, all without regressions.
Also verified that the SPEChpc 2021 521.miniswp_t and 534.hpgmgfv_t performance
regressions that occurred in the GCC12 cycle have been resolved, with
performance restored to devel/omp/gcc-11 (OG11) branch levels. Is this okay
for trunk?

(also suggest backporting to GCC12 branch, if performance regression can be
considered a defect)

Thanks,
Chung-Lin

libgomp/ChangeLog:

2022-09-21  Chung-Lin Tang  

* config/nvptx/bar.c (generation_to_barrier): Remove.
(futex_wait,futex_wake,do_spin,do_wait): Remove.
(GOMP_WAIT_H): Remove.
(#include "../linux/bar.c"): Remove.
(gomp_barrier_wait_end): New function.
(gomp_barrier_wait): Likewise.
(gomp_barrier_wait_last): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait): Likewise.
(gomp_team_barrier_wait_final): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
(gomp_team_barrier_wait_cancel): Likewise.
(gomp_team_barrier_cancel): Likewise.
* config/nvptx/bar.h (gomp_team_barrier_wake): Remove
prototype, add new static inline function.
diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index eee2107..0b958ed 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -30,137 +30,143 @@
 #include 
 #include "libgomp.h"
 
-/* For cpu_relax.  */
-#include "doacross.h"
-
-/* Assuming ADDR is &bar->generation, return bar.  Copied from
-   rtems/bar.c.  */
+void
+gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
+{
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
+{
+  /* Next time we'll be awaiting TOTAL threads again.  */
+  bar->awaited = bar->total;
+  __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
+   MEMMODEL_RELEASE);
+}
+  if (bar->total > 1)
+asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+}
 
-static gomp_barrier_t *
-generation_to_barrier (int *addr)
+void
+gomp_barrier_wait (gomp_barrier_t *bar)
 {
-  char *bar
-= (char *) addr - __builtin_offsetof (gomp_barrier_t, generation);
-  return (gomp_barrier_t *)bar;
+  gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
 }
 
-/* Implement futex_wait-like behaviour to plug into the linux/bar.c
-   implementation.  Assumes ADDR is &bar->generation.   */
+/* Like gomp_barrier_wait, except that if the encountering thread
+   is not the last one to hit the barrier, it returns immediately.
+   The intended usage is that a thread which intends to gomp_barrier_destroy
+   this barrier calls gomp_barrier_wait, while all other threads
+   call gomp_barrier_wait_last.  When gomp_barrier_wait returns,
+   the barrier can be safely destroyed.  */
 
-static inline void
-futex_wait (int *addr, int val)
+void
+gomp_barrier_wait_last (gomp_barrier_t *bar)
 {
-  gomp_barrier_t *bar = generation_to_barrier (addr);
+  /* The above described behavior matches 'bar.arrive' perfectly.  */
+  if (bar->total > 1)
+asm ("bar.arrive 1, %0;" : : "r" (32 * bar->total));
+}
 

[PING x2] Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule

2022-09-09 Thread Chung-Lin Tang



On 2022/8/26 4:15 PM, Chung-Lin Tang wrote:
> On 2022/8/4 9:31 PM, Koning, Paul wrote:
>>
>>
>>> On Aug 4, 2022, at 9:17 AM, Chung-Lin Tang  wrote:
>>>
>>> On 2022/6/28 10:06 PM, Jakub Jelinek wrote:
>>>> On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:
>>>>> with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:
>>>>>
>>>>> (1) chunk_size <= -1: wraps into large unsigned value, seems to work 
>>>>> though.
>>>>> (2) chunk_size == 0:  infinite loop
>>>>>
>>>>> The (2) behavior is obviously not desired. This patch fixes this by 
>>>>> changing
>>>> Why?  It is a user error, undefined behavior, we shouldn't slow down valid
>>>> code for users who don't bother reading the standard.
>>>
>>> This is loop init code, not per-iteration. The overhead really isn't that 
>>> much.
>>>
>>> The question should be, if GCC having infinite loop behavior is reasonable,
>>> even if it is undefined in the spec.
>>
>> I wouldn't think so.  The way I see "undefined code" is that you can't 
>> complain about "wrong code" produced by the compiler.  But for the compiler 
>> to malfunction on wrong input is an entirely different matter.  For one 
>> thing, it's hard to fix your code if the compiler fails.  How would you 
>> locate the offending source line?
>>
>>  paul
> 
> Ping?

Ping x2.


[PATCH] optc-save-gen.awk: adjust generated array compare

2022-09-08 Thread Chung-Lin Tang
Hi Joseph,
Jan-Benedict reported a build-bot error for the nios2 port under 
--enable-werror-always:

options-save.cc: In function 'bool cl_target_option_eq(const cl_target_option*, 
const cl_target_option*)':
options-save.cc:9291:38: error: comparison between two arrays 
[-Werror=array-compare]
 9291 |   if (ptr1->saved_custom_code_status != ptr2->saved_custom_code_status
  |   ~~~^
options-save.cc:9291:38: note: use unary '+' which decays operands to pointers 
or '&'component_ref' not supported by dump_decl[0] != 
&'component_ref' not supported by dump_decl[0]' to compare 
the addresses
options-save.cc:9294:37: error: comparison between two arrays 
[-Werror=array-compare]
 9294 |   if (ptr1->saved_custom_code_index != ptr2->saved_custom_code_index
  |   ~~^~~~
...

This is due to an array-typed TargetSave state in config/nios2/nios2.opt:
...
TargetSave
enum nios2_ccs_code saved_custom_code_status[256]

TargetSave
int saved_custom_code_index[256]
...


This patch adjusts the generated array state compare from 'ptr1->array' into
'&ptr1->array[0]' in gcc/optc-save-gen.awk, which seems sufficient to pass the
stricter checks.

Tested by ensuring the compiler builds, which should be sufficient here.
Okay to commit to mainline?

Thanks,
Chung-Lin

* optc-save-gen.awk: Adjust array compare to use '&ptr->name[0]'
instead of 'ptr->name'.
diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index 233d1fbb637..27aabf2955e 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -1093,7 +1093,7 @@ for (i = 0; i < n_target_array; i++) {
name = var_target_array[i]
size = var_target_array_size[i]
type = var_target_array_type[i]
-   print "  if (ptr1->" name" != ptr2->" name "";
+   print "  if (&ptr1->" name"[0] != &ptr2->" name "[0]";
print "  || memcmp (ptr1->" name ", ptr2->" name ", " size " * sizeof(" type ")))"
print "return false;";
 }
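
As an illustration of the spelling this patch switches to, the sketch below compares two array members by the address of their first element, and separately by content with memcmp. struct model_target_option and both helper functions are hypothetical; only the member name and its size are borrowed from the nios2.opt excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a generated option-state struct.  */
struct model_target_option
{
  int saved_custom_code_index[256];
};

/* Address comparison written on the first elements, which avoids
   comparing two array-typed expressions directly.  */
static bool
same_array_object (const struct model_target_option *p1,
                   const struct model_target_option *p2)
{
  assert (p1 != NULL && p2 != NULL);
  return &p1->saved_custom_code_index[0] == &p2->saved_custom_code_index[0];
}

/* Content comparison over the whole array.  */
static bool
same_array_contents (const struct model_target_option *p1,
                     const struct model_target_option *p2)
{
  assert (p1 != NULL && p2 != NULL);
  return memcmp (p1->saved_custom_code_index, p2->saved_custom_code_index,
                 sizeof p1->saved_custom_code_index) == 0;
}
```

Both spellings of the address test compare the same pointers; the explicit '&...[0]' form just states the intent so that the array-compare diagnostic has nothing to flag.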


[PATCH, nios2, committed] Add #undef of MUSL_DYNAMIC_LINKER

2022-09-08 Thread Chung-Lin Tang
This patch adds an #undef of MUSL_DYNAMIC_LINKER before its #define in 
config/nios2/linux.h.
This makes the nios2-linux build pass when the compiler is configured with 
--enable-werror-always.

Patch pushed to master at 0697bd070c4fffb33468976c93baff9493922fb3

Chung-Lin

From 0697bd070c4fffb33468976c93baff9493922fb3 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 8 Sep 2022 23:14:38 +0800
Subject: [PATCH] nios2: Add #undef of MUSL_DYNAMIC_LINKER

Add #undef of MUSL_DYNAMIC_LINKER before #define, to satisfy build checks
when configured with --enable-werror-always.

gcc/ChangeLog:

* config/nios2/linux.h (MUSL_DYNAMIC_LINKER): Add #undef before #define.
---
 gcc/config/nios2/linux.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/nios2/linux.h b/gcc/config/nios2/linux.h
index f5dd813acad..9e53dd657e4 100644
--- a/gcc/config/nios2/linux.h
+++ b/gcc/config/nios2/linux.h
@@ -30,6 +30,8 @@
 #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{pthread:-D_REENTRANT}"
 
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-nios2.so.1"
+
+#undef MUSL_DYNAMIC_LINKER
 #define MUSL_DYNAMIC_LINKER  "/lib/ld-musl-nios2.so.1"
 
 #undef LINK_SPEC
-- 
2.17.1
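
The pattern used by this fix, shown in isolation. This is a sketch with hypothetical names: MODEL_DYNAMIC_LINKER and the "/lib/ld-default.so.1" path stand in for a default supplied by some earlier header, and are not taken from the GCC sources.

```c
#include <assert.h>

/* Stand-in for a default provided by an earlier, more generic header.  */
#define MODEL_DYNAMIC_LINKER "/lib/ld-default.so.1"

/* Target header: drop the earlier definition first, then supply its own.
   Redefining the macro with a different value without the #undef draws a
   redefinition diagnostic, which a -Werror style build turns into a
   failure.  */
#undef MODEL_DYNAMIC_LINKER
#define MODEL_DYNAMIC_LINKER "/lib/ld-musl-nios2.so.1"

static_assert (sizeof (MODEL_DYNAMIC_LINKER) > 1,
               "dynamic linker path must not be empty");

/* Returns whichever definition is in effect at this point.  */
static const char *
model_dynamic_linker (void)
{
  return MODEL_DYNAMIC_LINKER;
}
```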



Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran

2022-09-05 Thread Chung-Lin Tang



On 2022/8/15 7:15 PM, Chung-Lin Tang wrote:

On 2022/8/15 7:06 PM, Chung-Lin Tang wrote:


I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran
interact, but it's finally working. Again tested without regressions.
Preparing to commit to devel/omp/gcc-12, and seeking approval for mainline
when the requires patches are in.


Just realized that I don't have the new testcases added in this patch.
Will supplement them later :P


Here's the USM allocator/libgfortran patch, with a libgomp.fortran testcase
added.

Thanks,
Chung-Lin

2022-09-05  Chung-Lin Tang  

libgcc/
* Makefile.in (crtoffloadend$(objext)): Add $(PICFLAG) to compile rule.
* offloadstuff.c (GOMP_offload_register_ver): Add declaration of weak
symbol.
(__OFFLOAD_TABLE__): Likewise.
(init_non_offload): New function.

libgfortran/

* gfortran.map (GFORTRAN_13): New namespace.
(_gfortran_mem_allocators_init): New name inside GFORTRAN_13.
* libgfortran.h (mem_allocators_init): New exported declaration.
* runtime/main.c (do_init): Rename from init, add run-once guard code.
(cleanup): Add run-once guard code.
(GOMP_post_offload_register_callback): Declare weak symbol.
(GOMP_pre_gomp_target_fini_callback): Likewise.
(init): New constructor to register offload callbacks, or call do_init
when not OpenMP.
* runtime/memory.c (gfortran_malloc): New pointer variable.
(gfortran_calloc): Likewise.
(gfortran_realloc): Likewise.
(gfortran_free): Likewise.
(mem_allocators_init): New function.
(xmalloc): Use gfortran_malloc.
(xmallocarray): Use gfortran_malloc.
(xcalloc): Use gfortran_calloc.
(xrealloc): Use gfortran_realloc.
(xfree): Use gfortran_free.

libgomp/

* libgomp.map (GOMP_5.1.2): New version namespace.
(GOMP_post_offload_register_callback): New name inside GOMP_5.1.2.
(GOMP_pre_gomp_target_fini_callback): Likewise.
(GOMP_DEFINE_CALLBACK_SET): Macro to define callback set.
(post_offload_register): Define callback set for after offload image
register.
(pre_gomp_target_fini): Define callback set for before gomp_target_fini
is called.
(libgfortran_malloc_usm): New function.
(libgfortran_calloc_usm): Likewise
(libgfortran_realloc_usm): Likewise
(libgfortran_free_usm): Likewise.
(_gfortran_mem_allocators_init): Declare weak symbol.
(gomp_libgfortran_omp_allocators_init): New function.
(GOMP_offload_register_ver): Add handling of host_table == NULL, calling
into libgfortran to set unified_shared_memory allocators, and execution
of post_offload_register callbacks.
(gomp_target_init): Register all pre_gomp_target_fini callbacks to run
at end of main using atexit().

* testsuite/libgomp.fortran/target-unified_shared_memory-1.f90: New test.
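
The runtime/memory.c entries above describe routing allocations through replaceable function pointers. The sketch below shows that general technique on the host. All names are hypothetical, the two-argument init signature is an assumption (the ChangeLog does not give the real one), and a counting wrapper around malloc stands in for a unified-shared-memory allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocation entry points, defaulting to the C library.  */
static void *(*model_malloc) (size_t) = malloc;
static void (*model_free) (void *) = free;

/* Install replacement allocators.  Hypothetical signature.  */
static void
model_mem_allocators_init (void *(*malloc_fn) (size_t),
                           void (*free_fn) (void *))
{
  assert (malloc_fn != NULL && free_fn != NULL);
  model_malloc = malloc_fn;
  model_free = free_fn;
}

/* Checked allocation that goes through the current pointer.  */
static void *
model_xmalloc (size_t n)
{
  if (n == 0)
    n = 1;
  void *p = model_malloc (n);
  if (p == NULL)
    abort ();
  return p;
}

static void
model_xfree (void *p)
{
  model_free (p);
}

/* Counting wrappers standing in for a USM allocator.  */
static unsigned model_usm_allocs, model_usm_frees;

static void *
model_usm_malloc (size_t n)
{
  model_usm_allocs++;
  return malloc (n);
}

static void
model_usm_free (void *p)
{
  model_usm_frees++;
  free (p);
}
```

The ordering concern in the thread follows from this design: memory obtained through one pair of pointers must be released through the matching pair, so the allocators have to be swapped before the runtime makes its first allocation.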







diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 09b3ec8bc2e..70720cc910c 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -1045,8 +1045,9 @@ crtbeginT$(objext): $(srcdir)/crtstuff.c
 crtoffloadbegin$(objext): $(srcdir)/offloadstuff.c
$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
 
+# crtoffloadend contains a constructor with calls to libgomp, so build as PIC.
 crtoffloadend$(objext): $(srcdir)/offloadstuff.c
-   $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
+   $(crt_compile) $(CRTSTUFF_T_CFLAGS) $(PICFLAG) -c $< -DCRT_END
 
 crtoffloadtable$(objext): $(srcdir)/offloadstuff.c
$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_TABLE
diff --git a/libgcc/offloadstuff.c b/libgcc/offloadstuff.c
index 10e1fe19c8e..2edb6810021 100644
--- a/libgcc/offloadstuff.c
+++ b/libgcc/offloadstuff.c
@@ -63,6 +63,19 @@ const void *const __offload_vars_end[0]
   __attribute__ ((__used__, visibility ("hidden"),
  section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { };
 
+extern void GOMP_offload_register_ver (unsigned, const void *, int,
+  const void *);
+extern const void *const __OFFLOAD_TABLE__[0] __attribute__ ((weak));
+static void __attribute__((constructor))
+init_non_offload (void)
+{
+  /* If an OpenMP program has no offloading, post-offload_register callbacks
+ that need to run will require a call to GOMP_offload_register_ver, in
+ order to properly trigger those callbacks during init.  */
+  if (__OFFLOAD_TABLE__ == NULL)
+GOMP_offload_register_ver (0, NULL, 0, NULL);
+}
+
 #elif defined CRT_TABLE
 
 extern const void *const __offload_func_table[];
diff --git a/libgfortran/gfortran.map b/libgfortran/gfortran.map
index e0e795c3d48..55d2a529acd 100644
--- a/libgfortran/gfortran.map
+++ b/libgfortran/gfortran.map
@@ -1759,3 +1759,8 @@ GFORTRAN_12 {
   _gfortran_transfer_real128_write;
 #endif
 } GFORTRAN_10.2;
+
+GFORTRAN_13 {
+  global:
+  _gfortran_mem_allocators_init;
+} GFORTRAN_12;
diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h
index 0b893a51851..e518b3989cf 10

[OpenMP, nvptx] Use bar.sync/arrive for barriers when tasking is not used

2022-09-01 Thread Chung-Lin Tang
Hi, 
our work on SPEChpc2021 benchmarks show that, after the fix for PR99555 was 
committed:
[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=5ed77fb3ed1ee0289a0ec9499ef52b99b39421f1

while that patch fixed the hang, there were quite severe performance 
regressions caused
by this new barrier code. Under OpenMP target offload mode, Minisweep regressed 
by about 350%,
while HPGMG-FV was about 2x slower.

So the problem was presumably the new barriers, which replaced erroneous but 
fast bar.sync
instructions, with correct but really heavy-weight futex_wait/wake operations 
on the GPU.
This is probably required for preserving correct task vs. barrier behavior.

However, the observation is that: when tasks-related functionality are not used 
at all by
the team inside an OpenMP target region, and a barrier is just a place to wait 
for all
threads to rejoin (no problem of invoking waiting tasks to re-start) a barrier 
can in that
case be implemented by simple bar.sync and bar.arrive PTX instructions. That 
should be
able to recover most performance the cases that usually matter, e.g. 'omp 
parallel for' inside
'omp target'.

So the plan is to mark cases where 'tasks are never used'. This patch adds a
'task_never_used' flag inside struct gomp_team, initialized to true, and set
to false when tasks are added to the team. The nvptx-specific
gomp_team_barrier_wait_end routines can then use a simple barrier when
team->task_never_used remains true at the barrier.

Some other cases, like the master/masked construct and the single construct,
also need to have task_never_used set to false: because these constructs
inherently create asymmetric loads where only a subset of threads run through
the region (which may or may not use tasking), different threads may wait at
the end assuming different task_never_used values.
For correctness, these constructs must have team->task_never_used
conservatively marked false at the start of the construct.
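
The flag handling described above can be sketched roughly as follows. This is
only an illustration, not code from the patch: struct toy_team and all toy_*
functions are hypothetical stand-ins for struct gomp_team and the libgomp
routines, and the two barrier kinds merely name the choice between the
bar.sync/bar.arrive path and the futex-based path.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for struct gomp_team.  */
struct toy_team
{
  bool task_never_used;   /* True until anything task-related happens.  */
  int task_count;
};

enum toy_barrier_kind { TOY_BARRIER_SIMPLE, TOY_BARRIER_FUTEX };

/* Team creation: start out assuming tasks are never used.  */
static void
toy_team_init (struct toy_team *team)
{
  team->task_never_used = true;
  team->task_count = 0;
}

/* Adding a task disables the fast path for this team.  */
static void
toy_team_add_task (struct toy_team *team)
{
  team->task_never_used = false;
  team->task_count++;
}

/* Entry of a master/masked/single construct: only some threads run the
   region, so mark the flag false conservatively, before any thread can
   reach the closing barrier with a different view of the flag.  */
static void
toy_enter_asymmetric_construct (struct toy_team *team)
{
  team->task_never_used = false;
}

/* Barrier selection: the cheap barrier is only allowed while the flag
   is still true.  */
static enum toy_barrier_kind
toy_select_barrier (const struct toy_team *team)
{
  return team->task_never_used ? TOY_BARRIER_SIMPLE : TOY_BARRIER_FUTEX;
}
```

The point of the design is that the flag only ever moves from true to false
within a team's lifetime, so a stale "false" costs performance but never
correctness.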

This patch has been divided into two: the first is the inlining of the
contents of config/linux/bar.c into config/nvptx/bar.c (instead of an
include). This is needed now because some parts of
gomp_team_barrier_wait_[cancel_]end now need nvptx-specific adjustments. The
second contains the changes described above.

Tested on powerpc64le-linux and x86_64-linux with nvptx offloading, seeking
approval for trunk.

Thanks,
Chung-Lin

From c2fdc31880d2d040822e8abece015c29a6d7b472 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 1 Sep 2022 05:53:49 -0700
Subject: [PATCH 1/2] libgomp: inline config/linux/bar.c into
 config/nvptx/bar.c

Preparing to add nvptx specific modifications to gomp_team_barrier_wait_end,
et al., so change from using an #include of config/linux/bar.c
in config/nvptx/bar.c, to a full copy of the implementation.

2022-09-01  Chung-Lin Tang  

libgomp/ChangeLog:

* config/nvptx/bar.c: Adjust include of "../linux/bar.c" into an
inlining of contents of config/linux/bar.c.
---
 libgomp/config/nvptx/bar.c | 183 -
 1 file changed, 180 insertions(+), 3 deletions(-)

diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index eee2107..a850c22 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -161,6 +161,183 @@ static inline void do_wait (int *addr, int val)
 futex_wait (addr, val);
 }
 
-/* Reuse the linux implementation.  */
-#define GOMP_WAIT_H 1
-#include "../linux/bar.c"
+/* Below is based on the linux implementation.  */
+
+void
+gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
+{
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
+    {
+      /* Next time we'll be awaiting TOTAL threads again.  */
+      bar->awaited = bar->total;
+      __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
+                        MEMMODEL_RELEASE);
+      futex_wake ((int *) &bar->generation, INT_MAX);
+    }
+  else
+    {
+      do
+        do_wait ((int *) &bar->generation, state);
+      while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE) == state);
+    }
+}
+
+void
+gomp_barrier_wait (gomp_barrier_t *bar)
+{
+  gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
+}
+
+/* Like gomp_barrier_wait, except that if the encountering thread
+   is not the last one to hit the barrier, it returns immediately.
+   The intended usage is that a thread which intends to gomp_barrier_destroy
+   this barrier calls gomp_barrier_wait, while all other threads
+   call gomp_barrier_wait_last.  When gomp_barrier_wait returns,
+   the barrier can be safely destroyed.  */
+
+void
+gomp_barrier_wait_last (gomp_barrier_t *bar)
+{
+  gomp_barrier_state_t state = gomp_barrier_wait_start (bar);
+  if (state & BAR_WAS_LAST)
+    gomp_barrier_wait_end (bar, state);
+}
+
+void
+gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
+{
+  futex_

[PING] Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule

2022-08-26 Thread Chung-Lin Tang

On 2022/8/4 9:31 PM, Koning, Paul wrote:




On Aug 4, 2022, at 9:17 AM, Chung-Lin Tang  wrote:

On 2022/6/28 10:06 PM, Jakub Jelinek wrote:

On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:

with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:

(1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
(2) chunk_size == 0:  infinite loop

The (2) behavior is obviously not desired. This patch fixes this by changing

Why?  It is a user error, undefined behavior, we shouldn't slow down valid
code for users who don't bother reading the standard.


This is loop init code, not per-iteration. The overhead really isn't that much.

The question should be, if GCC having infinite loop behavior is reasonable,
even if it is undefined in the spec.


I wouldn't think so.  The way I see "undefined code" is that you can't
complain about "wrong code" produced by the compiler.  But for the compiler to
malfunction on wrong input is an entirely different matter.  For one thing,
it's hard to fix your code if the compiler fails.  How would you locate the
offending source line?

paul


Ping?


Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran

2022-08-15 Thread Chung-Lin Tang

On 2022/8/15 7:06 PM, Chung-Lin Tang wrote:


I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran
interacts, but it's finally working. Again tested without regressions.
Preparing to commit to devel/omp/gcc-12, and seeking approval for mainline
when the requires patches are in.


Just realized that I don't have the new testcases added in this patch.
Will supplement them later :P

Thanks,
Chung-Lin


[PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran

2022-08-15 Thread Chung-Lin Tang

After the first libgfortran memory allocator preparation patch, this is the
actual patch that organizes unified_shared_memory allocation into libgfortran.

In the current OpenMP requires implementation, the requires_mask is collected
through offload LTO processing, and presented to libgomp when registering
offload images through GOMP_offload_register_ver() (called by the
mkoffload-generated constructor linked into the program binary).

This means that the only reliable place to access omp_requires_mask is in
GOMP_offload_register_ver. However, since it is called through an ELF
constructor in the *main program*, this runs later than the
libgfortran/runtime/main.c:init() constructor, and because some libgfortran
init actions there start allocating memory, this can cause more deallocation
errors later.

Another issue is that CUDA appears to be registering some cleanup actions
using atexit(), which forces libgomp to register gomp_target_fini() using
atexit as well (to properly run before the underlying CUDA stuff disappears).
This happens to us here as well.

So to summarize, we need to: (1) order libgfortran init actions after
omp_requires_mask processing is done, and (2) order libgfortran cleanup
actions before gomp_target_fini, to properly deallocate stuff without
crashing.

The above explains why there is a small new set of definitions, as well as
callback registering functions exported from libgomp to libgfortran, basically
to register libgfortran init/fini actions into libgomp to run.
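
As a rough illustration of the callback registering idea (not the actual
GOMP_DEFINE_CALLBACK_SET macro, whose expansion is not shown in this mail),
a callback set can be thought of as a small array of function pointers plus a
run-once guard. All toy_* names below are hypothetical.

```c
#include <stddef.h>

#define TOY_MAX_CALLBACKS 8

typedef void (*toy_callback_fn) (void);

/* Hypothetical callback set: registered functions plus a run-once flag.  */
struct toy_callback_set
{
  toy_callback_fn fns[TOY_MAX_CALLBACKS];
  size_t count;
  int already_run;
};

/* Register FN in SET.  Returns 0 on success, -1 if the set is full.  */
static int
toy_callback_register (struct toy_callback_set *set, toy_callback_fn fn)
{
  if (set->count == TOY_MAX_CALLBACKS)
    return -1;
  set->fns[set->count++] = fn;
  return 0;
}

/* Run all registered callbacks in registration order, at most once.  */
static void
toy_callback_run (struct toy_callback_set *set)
{
  if (set->already_run)
    return;
  set->already_run = 1;
  for (size_t i = 0; i < set->count; i++)
    set->fns[i] ();
}

/* Stand-in for a libgfortran init action; it only counts its calls.  */
static int toy_init_calls;

static void
toy_libgfortran_init (void)
{
  toy_init_calls++;
}
```

In this picture, libgfortran would register its init action early, and the
library owning the set would call the run function at the point where the
ordering requirement is satisfied (here: after requires-mask processing).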

Inside GOMP_offload_register_ver, after omp_requires_mask processing is done,
we call into libgfortran through a new _gfortran_mem_allocators_init function
to insert the omp_free/alloc/etc. based allocators into the Fortran runtime
when GOMP_REQUIRES_UNIFIED_SHARED_MEMORY is set.
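
A minimal sketch of what "inserting allocators" can look like, assuming the
runtime routes its allocations through function pointers (the ChangeLog below
lists gfortran_malloc etc. as new pointer variables). The signature of the
init function here is an assumption, as are all toy_* names; the "USM"
allocator is a counting stand-in built on malloc/free, not omp_alloc.

```c
#include <stdlib.h>

/* Allocator hooks, defaulting to the C library.  */
static void *(*toy_malloc) (size_t) = malloc;
static void (*toy_free) (void *) = free;

/* Hypothetical init entry point: swap in a different allocator pair.
   Must be called before the first allocation, otherwise memory would be
   freed by a different allocator than the one that handed it out.  */
static void
toy_mem_allocators_init (void *(*malloc_fn) (size_t),
                         void (*free_fn) (void *))
{
  toy_malloc = malloc_fn;
  toy_free = free_fn;
}

/* Checked allocation through the current hook.  */
static void *
toy_xmalloc (size_t n)
{
  void *p = toy_malloc (n ? n : 1);
  if (p == NULL)
    abort ();
  return p;
}

static void
toy_xfree (void *p)
{
  toy_free (p);
}

/* Stand-in "USM" allocator that only counts its calls.  */
static int toy_usm_allocs, toy_usm_frees;

static void *
toy_usm_malloc (size_t n)
{
  toy_usm_allocs++;
  return malloc (n);
}

static void
toy_usm_free (void *p)
{
  toy_usm_frees++;
  free (p);
}
```

The ordering constraint in the mail follows directly from this shape: the
swap has to happen before any runtime init code allocates.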

All symbol references between libgfortran/libgomp are defined with weak
symbols. Tests of the weak symbols are also used to determine whether the
other library exists in this program.
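
For readers unfamiliar with the weak symbol test, here is a small sketch of
the pattern, assuming GCC-style attributes on an ELF target.
toy_other_library_hook is a made-up symbol that no library defines; it stands
in for a function that the other library may or may not provide.

```c
/* Weak reference: linking succeeds even if nothing defines the symbol,
   in which case its address evaluates to a null pointer.  */
extern void toy_other_library_hook (void) __attribute__ ((weak));

/* Call the hook when the providing library is present.
   Returns 1 if the hook was called, 0 if the symbol is absent.  */
static int
toy_call_hook_if_present (void)
{
  if (toy_other_library_hook == 0)
    return 0;
  toy_other_library_hook ();
  return 1;
}
```

In a program that does not link the providing library, the function simply
returns 0 and nothing is called.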

A final issue is the case where we have an OpenMP program that does NOT have
offloading. We cannot passively determine in libgomp/libgfortran whether
offloading exists or not; only the main program itself can, by seeing if the
hidden __OFFLOAD_TABLE__ exists.

When we do init/fini libgomp callback registering for OpenMP programs, those
with no offloading will not have those callbacks properly run (because no
offload image is loaded). Therefore the solution here is a constructor added
into the crtoffloadend.o fragment that does a "null" call of
GOMP_offload_register_ver, solely for triggering the post-offload_register
callbacks when __OFFLOAD_TABLE__ is NULL (and because of this, the
crtoffloadend.o Makefile rule is adjusted to compile with PIC).

I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran
interacts, but it's finally working. Again tested without regressions.
Preparing to commit to devel/omp/gcc-12, and seeking approval for mainline
when the requires patches are in.

Thanks,
Chung-Lin

2022-08-15  Chung-Lin Tang  

libgcc/
* Makefile.in (crtoffloadend$(objext)): Add $(PICFLAG) to compile rule.
* offloadstuff.c (GOMP_offload_register_ver): Add declaration of weak
symbol.
(__OFFLOAD_TABLE__): Likewise.
(init_non_offload): New function.

libgfortran/

* gfortran.map (GFORTRAN_13): New namespace.
(_gfortran_mem_allocators_init): New name inside GFORTRAN_13.
* libgfortran.h (mem_allocators_init): New exported declaration.
* runtime/main.c (do_init): Rename from init, add run-once guard code.
(cleanup): Add run-once guard code.
(GOMP_post_offload_register_callback): Declare weak symbol.
(GOMP_pre_gomp_target_fini_callback): Likewise.
(init): New constructor to register offload callbacks, or call do_init
when not OpenMP.
* runtime/memory.c (gfortran_malloc): New pointer variable.
(gfortran_calloc): Likewise.
(gfortran_realloc): Likewise.
(gfortran_free): Likewise.
(mem_allocators_init): New function.
(xmalloc): Use gfortran_malloc.
(xmallocarray): Use gfortran_malloc.
(xcalloc): Use gfortran_calloc.
(xrealloc): Use gfortran_realloc.
(xfree): Use gfortran_free.

libgomp/

* libgomp.map (GOMP_5.1.2): New version namespace.
(GOMP_post_offload_register_callback): New name inside GOMP_5.1.2.
(GOMP_pre_gomp_target_fini_callback): Likewise.
(GOMP_DEFINE_CALLBACK_SET): Macro to define callback set.
(post_offload_register): Define callback set for after offload image
register.
(pre_gomp_target_fini): Define callback set for before gomp_target_fini
is called.
(libgfortran_malloc_usm): New function.
(libgfortran_calloc_usm): Likewise.
(libgfortran_realloc_usm): Likewise.
(libgfortran_free_usm): Likewise.
(_gfortran_mem_alloc

[PATCH, OpenMP, Fortran] requires unified_shared_memory 1/2: adjust libgfortran memory allocators

2022-08-15 Thread Chung-Lin Tang

Hi, this patch is to fix the case where 'requires unified_shared_memory'
doesn't work due to memory allocator mismatch. Currently this is only for OG12
(devel/omp/gcc-12), but will apply to mainline as well once those requires
patches get in.

Basically, 'requires unified_shared_memory' enables the usm_transform pass,
which transforms some of the expanded Fortran intrinsic code that uses
__builtin_free() into 'omp_free (..., ompx_unified_shared_mem_alloc)'.

The intention is to make all dynamic memory allocation use the OpenMP
unified_shared_memory allocator, but there is a big gap in this, namely
libgfortran. What happens in some tests is that libgfortran allocates stuff
using normal malloc(), the usm_transform pass generates code that frees the
stuff using omp_free(), and chaos ensues.

So the proper fix, we believe, is to make it possible to move the entire
libgfortran on to unified_shared_memory.

This first patch is a mostly mechanical patch to change all references to
malloc/free/calloc/realloc in libgfortran into the
xmalloc/xfree/xcalloc/xrealloc functions in libgfortran/runtime/memory.c,
as well as strdup uses into a new internal xstrdup.
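
To illustrate why strdup needs its own wrapper: strdup allocates with the C
library's malloc internally, so its result would bypass any replaced
allocator. A sketch of an xstrdup built on a checked allocation wrapper might
look like this; the toy_* names are hypothetical and the real libgfortran
implementation may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Checked allocation wrapper; the single place that calls malloc.  */
static void *
toy_checked_malloc (size_t n)
{
  void *p = malloc (n ? n : 1);
  if (p == NULL)
    abort ();
  return p;
}

/* Matching release function for memory from toy_checked_malloc.  */
static void
toy_checked_free (void *p)
{
  free (p);
}

/* Duplicate S using the wrapper above instead of strdup, so the copy
   comes from the same allocator as every other runtime allocation.  */
static char *
toy_xstrdup (const char *s)
{
  size_t len = strlen (s) + 1;   /* Include the terminating NUL.  */
  char *copy = toy_checked_malloc (len);
  memcpy (copy, s, len);
  return copy;
}
```

With all allocation funnelled through one wrapper, switching the underlying
allocator later only requires changing that wrapper.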

All of libgfortran is adjusted this way, except libgfortran/caf, which is an
independent library outside of libgfortran.so.

The second patch of this series will present a way to switch the references of
allocators in libgfortran/runtime/memory.c from the normal glibc
malloc/free/etc. to omp_alloc/omp_free/etc. when
'requires unified_shared_memory' is detected.

Tested on devel/omp/gcc-12. The plan is to commit there soon, but also seeking
approval for mainline once the requires stuff goes in.

Thanks,
Chung-Lin

2022-08-15  Chung-Lin Tang  

libgfortran/ChangeLog:

* m4/matmul_internal.m4: Adjust malloc/free to xmalloc/xfree.
* generated/matmul_c10.c: Regenerate.
* generated/matmul_c16.c: Likewise.
* generated/matmul_c17.c: Likewise.
* generated/matmul_c4.c: Likewise.
* generated/matmul_c8.c: Likewise.
* generated/matmul_i1.c: Likewise.
* generated/matmul_i16.c: Likewise.
* generated/matmul_i2.c: Likewise.
* generated/matmul_i4.c: Likewise.
* generated/matmul_i8.c: Likewise.
* generated/matmul_r10.c: Likewise.
* generated/matmul_r16.c: Likewise.
* generated/matmul_r17.c: Likewise.
* generated/matmul_r4.c: Likewise.
* generated/matmul_r8.c: Likewise.
* generated/matmulavx128_c10.c: Likewise.
* generated/matmulavx128_c16.c: Likewise.
* generated/matmulavx128_c17.c: Likewise.
* generated/matmulavx128_c4.c: Likewise.
* generated/matmulavx128_c8.c: Likewise.
* generated/matmulavx128_i1.c: Likewise.
* generated/matmulavx128_i16.c: Likewise.
* generated/matmulavx128_i2.c: Likewise.
* generated/matmulavx128_i4.c: Likewise.
* generated/matmulavx128_i8.c: Likewise.
* generated/matmulavx128_r10.c: Likewise.
* generated/matmulavx128_r16.c: Likewise.
* generated/matmulavx128_r17.c: Likewise.
* generated/matmulavx128_r4.c: Likewise.
* generated/matmulavx128_r8.c: Likewise.
* intrinsics/access.c (access_func): Adjust free to xfree.
* intrinsics/chdir.c (chdir_i4_sub): Likewise.
(chdir_i8_sub): Likewise.
* intrinsics/chmod.c (chmod_func): Likewise.
* intrinsics/date_and_time.c (secnds): Likewise.
* intrinsics/env.c (PREFIX(getenv)): Likewise.
(get_environment_variable_i4): Likewise.
* intrinsics/execute_command_line.c (execute_command_line): Likewise.
* intrinsics/getcwd.c (getcwd_i4_sub): Likewise.
* intrinsics/getlog.c (PREFIX(getlog)): Likewise.
* intrinsics/link.c (link_internal): Likewise.
* intrinsics/move_alloc.c (move_alloc): Likewise.
* intrinsics/perror.c (perror_sub): Likewise.
* intrinsics/random.c (constructor_random): Likewise.
* intrinsics/rename.c (rename_internal): Likewise.
* intrinsics/stat.c (stat_i4_sub_0): Likewise.
(stat_i8_sub_0): Likewise.
* intrinsics/symlnk.c (symlnk_internal): Likewise.
* intrinsics/system.c (system_sub): Likewise.
* intrinsics/unlink.c (unlink_i4_sub): Likewise.
* io/async.c (update_pdt): Likewise.
(async_io): Likewise.
(free_async_unit): Likewise.
(init_async_unit): Adjust calloc to xcalloc.
(enqueue_done_id): Likewise.
(enqueue_done): Likewise.
(enqueue_close): Likewise.
* io/async.h (MUTEX_DEBUG_ADD): Adjust malloc/free to xmalloc/xfree.
* io/close.c (st_close): Adjust strdup/free to xstrdup/xfree.
* io/fbuf.c (fbuf_destroy): Adjust free to xfree.
* io/format.c (free_format_hash_table): Likewise.
(save_parsed_format): Likewise.
(free_format): Likewise.
(free_format_data): Likewise.
* io/intrinsics.c (ttynam

Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule

2022-08-04 Thread Chung-Lin Tang

On 2022/6/28 10:06 PM, Jakub Jelinek wrote:

On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:

with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:

(1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
(2) chunk_size == 0:  infinite loop

The (2) behavior is obviously not desired. This patch fixes this by changing


Why?  It is a user error, undefined behavior, we shouldn't slow down valid
code for users who don't bother reading the standard.


This is loop init code, not per-iteration. The overhead really isn't that much.

The question should be, if GCC having infinite loop behavior is reasonable,
even if it is undefined in the spec.


E.g. OpenMP 5.1 [132:14] says clearly:
"chunk_size must be a loop invariant integer expression with a positive
value."
and omp_set_schedule for chunk_size < 1 should use a default value (which it
does).

For OMP_SCHEDULE the standard says it is implementation-defined what happens
if the format isn't the specified one, so I guess the env.c change
could be acceptable (though without it it is fine too), but the
loop.c change is wrong.  Note, if the loop.c change would be ok, you'd
need to also change loop_ull.c too.


I've updated the patch to add the same changes for libgomp/loop_ull.c and
updated the testcase too. Tested on mainline trunk without regressions.

Thanks,
Chung-Lin

libgomp/ChangeLog:

* env.c (parse_schedule): Make negative values invalid for chunk_size.
* loop.c (gomp_loop_init): For non-STATIC schedule and chunk_size <= 0,
set initialized chunk_size to 1.
* loop_ull.c (gomp_loop_ull_init): Likewise.

* testsuite/libgomp.c/loop-28.c: New test.

diff --git a/libgomp/env.c b/libgomp/env.c
index 1c4ee894515..dff07617e15 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -182,6 +182,8 @@ parse_schedule (void)
 goto invalid;
 
   errno = 0;
+  if (*env == '-')
+goto invalid;
   value = strtoul (env, &end, 10);
   if (errno || end == env)
 goto invalid;
diff --git a/libgomp/loop.c b/libgomp/loop.c
index be85162bb1e..018b4e9a8bd 100644
--- a/libgomp/loop.c
+++ b/libgomp/loop.c
@@ -41,7 +41,7 @@ gomp_loop_init (struct gomp_work_share *ws, long start, long end, long incr,
enum gomp_schedule_type sched, long chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size = chunk_size;
+  ws->chunk_size = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 1;
   /* Canonicalize loops that have zero iterations to ->next == ->end.  */
   ws->end = ((incr > 0 && start > end) || (incr < 0 && start < end))
? start : end;
diff --git a/libgomp/loop_ull.c b/libgomp/loop_ull.c
index 602737296d4..74ddb1bd623 100644
--- a/libgomp/loop_ull.c
+++ b/libgomp/loop_ull.c
@@ -43,7 +43,7 @@ gomp_loop_ull_init (struct gomp_work_share *ws, bool up, gomp_ull start,
gomp_ull chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size_ull = chunk_size;
+  ws->chunk_size_ull = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 1;
   /* Canonicalize loops that have zero iterations to ->next == ->end.  */
   ws->end_ull = ((up && start > end) || (!up && start < end))
? start : end;
diff --git a/libgomp/testsuite/libgomp.c/loop-28.c b/libgomp/testsuite/libgomp.c/loop-28.c
new file mode 100644
index 000..664842e27aa
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/loop-28.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-timeout 10 } */
+
+void __attribute__((noinline))
+foo (int a[], int n, int chunk_size)
+{
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (int i = 0; i < n; i++)
+a[i] = i;
+
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (unsigned long long i = 0; i < n; i++)
+a[i] = i;
+}
+
+int main (void)
+{
+  int a[100];
+  foo (a, 100, 0);
+  return 0;
+}


[PATCH, libgomp] Fix chunk_size<1 for dynamic schedule

2022-06-23 Thread Chung-Lin Tang

Hi Jakub,
with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:

(1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
(2) chunk_size == 0:  infinite loop

The (2) behavior is obviously not desired. This patch fixes this by changing
the chunk_size initialization in gomp_loop_init to "max(1,chunk_size)"
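
To make the failure mode concrete, here is a toy model of dynamic chunk
hand-out (not the libgomp code; all toy_* names are invented for
illustration). With a chunk size of 0 the "next" index never advances, so a
caller that keeps asking for work never finishes; clamping at init time
avoids that.

```c
#include <stdbool.h>

/* Clamp as described above: non-static schedules get at least 1.
   Static schedules keep their value, since 0 has a meaning there.  */
static long
toy_init_chunk_size (bool is_static_schedule, long chunk_size)
{
  if (is_static_schedule)
    return chunk_size;
  return chunk_size < 1 ? 1 : chunk_size;
}

/* Hand out the next range [*lo, *hi) of at most CHUNK iterations.
   Returns false once all iterations up to END have been handed out.  */
static bool
toy_dynamic_next (long *next, long end, long chunk, long *lo, long *hi)
{
  if (*next >= end)
    return false;
  *lo = *next;
  *hi = (end - *next < chunk) ? end : *next + chunk;
  *next = *hi;
  return true;
}

/* Count the chunks needed for N iterations, giving up after LIMIT
   chunks so that the non-terminating case can be observed safely.  */
static long
toy_count_chunks (long n, long chunk, long limit)
{
  long next = 0, lo, hi, count = 0;
  while (count < limit && toy_dynamic_next (&next, n, chunk, &lo, &hi))
    count++;
  return count;
}
```

With chunk 0 the count only stops because of the artificial limit; with the
clamped value the loop terminates on its own.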

The OMP_SCHEDULE parsing in libgomp/env.c has also been adjusted to reject
negative values.
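
The reason an explicit check is needed: strtoul accepts a leading minus sign
and returns the negated value as an unsigned long, so "-1" parses
"successfully" as a huge number. A standalone sketch of the check, using a
hypothetical helper name rather than the actual parse_schedule code:

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a chunk size from TEXT.  Returns true and stores the number in
   *VALUE on success; returns false for negative or non-numeric input.  */
static bool
toy_parse_chunk_size (const char *text, unsigned long *value)
{
  char *end;

  while (isspace ((unsigned char) *text))
    text++;

  /* Reject a leading '-' up front; strtoul would otherwise accept it
     and wrap the result around.  */
  if (*text == '-')
    return false;

  errno = 0;
  *value = strtoul (text, &end, 10);
  if (errno || end == text)
    return false;
  return true;
}
```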

Tested without regressions, and a new testcase for the infinite loop behavior
added.
Okay for trunk?

Thanks,
Chung-Lin

libgomp/ChangeLog:
* env.c (parse_schedule): Make negative values invalid for chunk_size.
* loop.c (gomp_loop_init): For non-STATIC schedule and chunk_size <= 0,
set initialized chunk_size to 1.

* testsuite/libgomp.c/loop-28.c: New test.

diff --git a/libgomp/env.c b/libgomp/env.c
index 1c4ee894515..dff07617e15 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -182,6 +182,8 @@ parse_schedule (void)
 goto invalid;
 
   errno = 0;
+  if (*env == '-')
+goto invalid;
   value = strtoul (env, &end, 10);
   if (errno || end == env)
 goto invalid;
diff --git a/libgomp/loop.c b/libgomp/loop.c
index be85162bb1e..018b4e9a8bd 100644
--- a/libgomp/loop.c
+++ b/libgomp/loop.c
@@ -41,7 +41,7 @@ gomp_loop_init (struct gomp_work_share *ws, long start, long end, long incr,
enum gomp_schedule_type sched, long chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size = chunk_size;
+  ws->chunk_size = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 1;
   /* Canonicalize loops that have zero iterations to ->next == ->end.  */
   ws->end = ((incr > 0 && start > end) || (incr < 0 && start < end))
? start : end;
diff --git a/libgomp/testsuite/libgomp.c/loop-28.c b/libgomp/testsuite/libgomp.c/loop-28.c
new file mode 100644
index 000..e3f852046f4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/loop-28.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-timeout 10 } */
+
+void __attribute__((noinline))
+foo (int a[], int n, int chunk_size)
+{
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (int i = 0; i < n; i++)
+a[i] = i;
+}
+
+int main (void)
+{
+  int a[100];
+  foo (a, 100, 0);
+  return 0;
+}


Re: [PATCH, OpenMP, v4] Implement uses_allocators clause for target regions

2022-06-13 Thread Chung-Lin Tang

On 2022/6/9 8:22 PM, Jakub Jelinek wrote:

+   OpenMP 5.2:
+
+   uses_allocators ( modifier : allocator-list )

Please drop the -list above.


+   uses_allocators ( modifier , modifier : allocator-list )

and here too.


Thanks for catching.


+  struct item_tok
+  {
+location_t loc;
+tree id;
+item_tok (void) : loc (UNKNOWN_LOCATION), id (NULL_TREE) {}
+  };
+  struct item { item_tok name, arg; };
+  auto_vec<item> *modifiers = NULL, *allocators = NULL;
+  auto_vec<item> *cur_list = new auto_vec<item> (4);

I was hoping you'd drop all this.
See https://gcc.gnu.org/r13-1002
for implementation (both C and C++ FE) of something very similar,
the only difference there is that in the case of linear clause, it is
looking for
val
ref
uval
step ( whatever )
followed by , or )
(and ref and uval not in C FE),
while you are looking for
memspace ( whatever )
traits ( whatever )
followed by : or by , (in case of , repeat).
But in both cases you can actually use the same parser APIs
for raw token pre-parsing to just compute if it is the modifier
syntax or not, set bool has_modifiers based on that (when you
come over probably valid syntax followed by CPP_COLON).


The linear clause doesn't have the legacy 'allocator1(t1), allocator2(t2), ...'
requirement, and c_parser_omp_variable_list doesn't seem to support this
pattern.

Also, the way c_parser_omp_clause_linear is implemented doesn't support the
requirement you mentioned earlier of allowing the use of "memspace" and
"traits" as the allocator name when it's actually not a modifier.

I have merged the v4 patch with the syntax comments updated as above to
devel/omp/gcc-11.

Thanks,
Chung-Lin



[PATCH, OpenMP, v4] Implement uses_allocators clause for target regions

2022-06-09 Thread Chung-Lin Tang

Hi Jakub,
this is v4 of the uses_allocators patch.

On 2022/5/31 6:02 PM, Jakub Jelinek wrote:

The response I got on omp-lang is that it is intentional that in the new
syntax only a single allocator is allowed.
So I'd suggest to implement:
1) if has_modifiers (i.e. certainly new syntax), only allow a single
enumerator / identifier for a variable and no ()s after it
2) if !has_modifiers and there is exactly one allocator without ()s,
treat it like new syntax
3) otherwise, it is the old (5.1) syntax, which allows a list and that
list can contain ()s for traits, but in the light of the 5.2 wording,
I'd even for that case avoid diagnosing missing traits for non-predefined
allocators
4) omp_null_allocator should be diagnosed as invalid,
private (omp_null_allocator) is rejected...


I've adjusted the checking to enforce these rules, and updated the testcases.
Re-tested without regressions.


5) for C++, we should handle FIELD_DECLs, but it shouldn't be hard, just
look how it is handled for private too


As discussed in the other mail, private() for FIELD_DECLs on target constructs
does not seem to be working properly; filed PR105861 for this.

Currently uses_allocators (which also uses private) is still sorry() for
FIELD_DECLs in this v4 patch. Will file another issue to track this after the
patch is committed.

(ChangeLog should be the same as before, so omitted here)

Thanks,
Chung-Lin

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 3a7cecdf087..be3e6ff697e 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -283,6 +283,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT32_DFLOAT32, BT_DFLOAT32, BT_DFLOAT32)
 DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT64_DFLOAT64, BT_DFLOAT64, BT_DFLOAT64)
 DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRMODE, BT_VOID, BT_PTRMODE)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_CONST_PTR, BT_VOID, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
@@ -641,6 +642,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_PTR_SIZE_SIZE_PTRMODE,
 BT_PTR, BT_SIZE, BT_SIZE, BT_PTRMODE)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_UINT8_PTRMODE, BT_VOID, BT_PTR, BT_UINT8,
 BT_PTRMODE)
+DEF_FUNCTION_TYPE_3 (BT_FN_PTRMODE_PTRMODE_INT_PTR, BT_PTRMODE, BT_PTRMODE,
+BT_INT, BT_PTR)
 
 DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
 BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 66d17a2673d..50db6936728 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -1873,6 +1873,7 @@ c_omp_split_clauses (location_t loc, enum tree_code code,
case OMP_CLAUSE_HAS_DEVICE_ADDR:
case OMP_CLAUSE_DEFAULTMAP:
case OMP_CLAUSE_DEPEND:
+   case OMP_CLAUSE_USES_ALLOCATORS:
  s = C_OMP_CLAUSE_SPLIT_TARGET;
  break;
case OMP_CLAUSE_NUM_TEAMS:
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 54864c2ec41..7f8944f81d6 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -154,6 +154,7 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_UNTIED,
   PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR,
   PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR,
+  PRAGMA_OMP_CLAUSE_USES_ALLOCATORS,
 
   /* Clauses for OpenACC.  */
   PRAGMA_OACC_CLAUSE_ASYNC,
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 492d995a281..0fe5b7ac2e4 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -12922,6 +12922,8 @@ c_parser_omp_clause_name (c_parser *parser)
result = PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR;
  else if (!strcmp ("use_device_ptr", p))
result = PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR;
+ else if (!strcmp ("uses_allocators", p))
+   result = PRAGMA_OMP_CLAUSE_USES_ALLOCATORS;
  break;
case 'v':
  if (!strcmp ("vector", p))
@@ -15651,6 +15653,213 @@ c_parser_omp_clause_allocate (c_parser *parser, tree list)
   return nl;
 }
 
+/* OpenMP 5.0:
+   uses_allocators ( allocator-list )
+
+   allocator-list:
+   allocator
+   allocator , allocator-list
+   allocator ( traits-array )
+   allocator ( traits-array ) , allocator-list
+
+   OpenMP 5.2:
+
+   uses_allocators ( modifier : allocator-list )
+   uses_allocators ( modifier , modifier : allocator-list )
+
+   modifier:
+   traits ( traits-array )
+   memspace ( mem-space-handle )  */
+
+static tree
+c_parser_omp_clause_uses_allocators (c_parser *parser, tree list)
+{
+  location_t clause_loc = c_parser_peek_token (parser)->location;
+  tree t = NULL_TREE, nl = list;
+  matching_parens parens;
+  if (!parens.require_open (parser))
+return list;
+
+  tree memspace_expr = NULL_TREE;
+  tree traits_var = NULL_TREE;
+
+  struct item_tok
+  {
+location_t loc;
+tree id;
+item_tok (void) : loc 

Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions

2022-06-06 Thread Chung-Lin Tang




On 2022/6/6 9:22 PM, Jakub Jelinek wrote:

On Mon, Jun 06, 2022 at 09:19:18PM +0800, Chung-Lin Tang wrote:

On 2022/5/31 6:02 PM, Jakub Jelinek wrote:

5) for C++, we should handle FIELD_DECLs, but it shouldn't be hard, just
 look how it is handled for private too

Jakub


About private() for non-static members, is it really working right now?


Perhaps we have a bug that we should file in bugzilla and should fix.

Can you try omp parallel or omp target in the test instead?


I see it works for omp parallel/task, gimplify results:

void C::foo (struct C * const this)
{
  omp_allocator_handle_t a [value-expr: ((struct C *) this)->a];

  #pragma omp parallel private(a)
{
  a = 0;
}
}

I'll file a bugzilla for the target construct.

That said, can we delay FIELD_DECL support for uses_allocators (which is
target construct only)? It appears to be non-trivial at the moment.

Thanks,
Chung-Lin



A simple test:

struct C {
   omp_allocator_handle_t a;
   void foo (void) {
 #pragma omp target private (a)
  a = (omp_allocator_handle_t) 0;
   }
};

int main (void)
{
   C c;
   c.foo ();
   return 0;
}


Jakub



Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions

2022-06-06 Thread Chung-Lin Tang

On 2022/5/31 6:02 PM, Jakub Jelinek wrote:

5) for C++, we should handle FIELD_DECLs, but it shouldn't be hard, just
look how it is handled for private too

Jakub


About private() for non-static members, is it really working right now?
A simple test:

struct C {
  omp_allocator_handle_t a;
  void foo (void) {
#pragma omp target private (a)
 a = (omp_allocator_handle_t) 0;
  }
};

int main (void)
{
  C c;
  c.foo ();
  return 0;
}

After C++ front-end processing we get:

{
omp_allocator_handle_t D.2823 [value-expr: ((struct C *) this)->a];
  #pragma omp target private(D.2823)
{
  {
<;
  }
}
}

The OMP field privatization seems to be doing something here.
However gimplify turns this into:

void C::foo (struct C * const this)
{
  omp_allocator_handle_t a [value-expr: ((struct C *) this)->a];

  #pragma omp target num_teams(1) thread_limit(0) private(a) \
  map(alloc:MEM[(char *)this] [len: 0]) map(firstprivate:this [pointer assign, bias: 0])
{
  this->a = 0;
}
}

This doesn't look quite right for a private clause. I don't quite expect a
zero-length mapping of this[:0], nor reverting the gimple to use "this->a"
for a private copy.

Chung-Lin


Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions

2022-05-30 Thread Chung-Lin Tang

Hi Jakub,
this is v3 of the uses_allocators patch.

On 2022/5/20 1:46 AM, Jakub Jelinek wrote:

On Tue, May 10, 2022 at 07:29:23PM +0800, Chung-Lin Tang wrote:

@@ -15624,6 +15626,233 @@ c_parser_omp_clause_allocate (c_parser *parser, tree list)
return nl;
  }
  
+/* OpenMP 5.2:

+   uses_allocators ( allocator-list )


As uses_allocators is a 5.0 feature already, the above should say
/* OpenMP 5.0:

+
+   allocator-list:
+   allocator
+   allocator , allocator-list
+   allocator ( traits-array )
+   allocator ( traits-array ) , allocator-list
+


And here it should add
   OpenMP 5.2:


Done.


+  if (c_parser_next_token_is (parser, CPP_NAME))
+{
+  c_token *tok = c_parser_peek_token (parser);
+  const char *p = IDENTIFIER_POINTER (tok->value);
+
+  if (strcmp ("traits", p) == 0 || strcmp ("memspace", p) == 0)
+   {
+ has_modifiers = true;
+ c_parser_consume_token (parser);
+ matching_parens parens2;;


Double ;;, should be just ;
But more importantly, it is more complex.
When you see
uses_allocators(traits or
uses_allocators(memspace
it is not given that it has modifiers.  While the 5.0/5.1 syntax had
a restriction that when allocator is not a predefined allocator (and
traits or memspace aren't predefined allocators) it must use ()s with
traits, so
uses_allocators(traits)
uses_allocators(memspace)
uses_allocators(traits,memspace)
are all invalid,
omp_allocator_handle_t traits;
uses_allocators(traits(mytraits))
or
omp_allocator_handle_t memspace;
uses_allocators(memspace(mytraits),omp_default_mem_alloc)
are valid in the old syntax.

So, I'm afraid to find out if the traits or memspace identifier
seen after uses_allocator ( are modifiers or not we need to
peek (for C with c_parser_peek_nth_token_raw) through all the
modifiers whether we see a : and only in that case say they
are modifiers rather than the old style syntax.


The parser parts have been rewritten to allow this kind of use now.
New code essentially parses lists of "id(id), id(id), ...", possibly delimited
by a ':' marking the modifier/allocator lists.
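
To illustrate the disambiguation discussed above: whether "traits"/"memspace" are modifiers depends only on whether a ':' follows them at the top nesting level. Below is a minimal sketch of that check on plain text. It is not the c-parser.cc code (which peeks at tokens), and has_modifier_separator is a hypothetical name used only here.

```c
#include <stdbool.h>

/* Return true if ARGS (the text between the outer parentheses of a
   uses_allocators clause) contains a ':' that is not nested inside
   parentheses, i.e. it looks like the "modifiers : allocators" form.
   Hypothetical helper working on characters, not on GCC tokens.  */
bool
has_modifier_separator (const char *args)
{
  int depth = 0;
  for (const char *p = args; *p != '\0'; p++)
    {
      if (*p == '(')
        depth++;
      else if (*p == ')')
        {
          if (depth > 0)
            depth--;
        }
      else if (*p == ':' && depth == 0)
        return true;
    }
  return false;
}
```

With this, "traits(mytraits)" is treated as old-style syntax (an allocator named traits), while "traits(mytraits) : my_alloc" is treated as the modifier form.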


I don't really like the modifiers handling not done in a loop.
As I said above, there needs to be some check whether there are modifiers or
not, but once we figure out there are modifiers, it should be done in a loop
with say some mask var on which traits have been already handled to diagnose
duplicates, we don't want to do the parsing code twice.


Now everything is done in loops. The new code should be considerably simpler 
now.
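
As a rough illustration of the "loop plus mask variable" idea from the review, the sketch below handles modifier names in one loop and uses a bitmask to diagnose duplicates. The names (check_modifiers, MOD_TRAITS, MOD_MEMSPACE) and the return codes are assumptions made for this example, not what the patch uses.

```c
#include <string.h>

/* One bit per modifier kind; hypothetical names.  */
enum { MOD_TRAITS = 1 << 0, MOD_MEMSPACE = 1 << 1 };

/* Check N modifier names in a single loop.
   Returns 0 if all are known and distinct, -1 on an unknown
   modifier, -2 if a modifier appears more than once.  */
int
check_modifiers (const char *const *names, int n)
{
  unsigned seen = 0;
  for (int i = 0; i < n; i++)
    {
      unsigned bit;
      if (strcmp (names[i], "traits") == 0)
        bit = MOD_TRAITS;
      else if (strcmp (names[i], "memspace") == 0)
        bit = MOD_MEMSPACE;
      else
        return -1;
      if (seen & bit)
        return -2;	/* Duplicate modifier.  */
      seen |= bit;
    }
  return 0;
}
```

The point of the mask is that the parsing code exists once, and adding a further modifier later only needs one more bit and one more branch.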


This feels like you only accept a single allocator in the new syntax,
but that isn't my reading of the spec, I'd understand it as:
uses_allocators (memspace(omp_high_bw_mem_space), traits(foo_traits) : bar, 
baz, qux)
being valid too.


This patch now allows multiple allocators to be specified in new syntax, 
although I have
to note that the 5.2 specification of uses_allocators (page 181) specifically 
says
"allocator: expression of allocator_handle_type" for the "Arguments" 
description,
not a "list" like the allocate clause.


+   case OMP_CLAUSE_USES_ALLOCATORS:
+ t = OMP_CLAUSE_USES_ALLOCATORS_ALLOCATOR (c);
+ if (bitmap_bit_p (_head, DECL_UID (t))
+ || bitmap_bit_p (_head, DECL_UID (t))
+ || bitmap_bit_p (_head, DECL_UID (t))
+ || bitmap_bit_p (_head, DECL_UID (t)))


You can't just use DECL_UID before you actually verify it is a variable.
So IMHO this particular if should be moved down somewhat.


Guarded now.


+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "%qE appears more than once in data clauses", t);
+ remove = true;
+   }
+ else
+   bitmap_set_bit (_head, DECL_UID (t));
+ if (TREE_CODE (TREE_TYPE (t)) != ENUMERAL_TYPE
+ || strcmp (IDENTIFIER_POINTER (TYPE_IDENTIFIER (TREE_TYPE (t))),
+"omp_allocator_handle_t") != 0)
+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "allocator must be of %<omp_allocator_handle_t%> type");
+ remove = true;
+   }


I'd add break; after remove = true;


Added some such breaks.


+ if (TREE_CODE (t) == CONST_DECL)
+   {
+ if (OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE (c)
+ || OMP_CLAUSE_USES_ALLOCATORS_TRAITS (c))
+   error_at (OMP_CLAUSE_LOCATION (c),
+ "modifiers cannot be used with pre-defined "
+ "allocators");
+
+ /* Currently for pre-defined allocators in libgomp, we do not
+require additional init/fini inside target regions, so discard
+such clauses.  */
+ remove = true;
+   }


It should be only removed if we emit the error (again with break; too).
IMHO (see the other mail) we should 

[PATCH, OpenMP, v2] Implement uses_allocators clause for target regions

2022-05-10 Thread Chung-Lin Tang

On 2022/5/7 12:40 AM, Tobias Burnus wrote:


Can you please also handle the new clause in Fortran's dump-parse-tree.cc?

I did see some split handling in C, but not in Fortran; do you also need
to update gfc_split_omp_clauses in Fortran's trans-openmp.cc?


Done.


Actually, glancing at the testcases, no combined construct (like
"omp target parallel") is used, I think that would be useful because of ↑.


Okay, added some to testcases.


+/* OpenMP 5.2:
+   uses_allocators ( allocator-list )

That's not completely true: uses_allocators is OpenMP 5.1.
However, 5.1 only supports (for non-predefined allocators):
    uses_allocators( allocator(traits) )
while OpenMP 5.2 added modifiers:
    uses_allocators( traits(...), memspace(...) : allocator )
and deprecated the 5.1 'allocator(traits)'. (Scheduled for removal in OMP 6.0)

The advantage of 5.2 syntax is that a memory space can be defined.


I supported both syntaxes, that's why I designated it as "5.2".


BTW: This makes uses_allocators the first OpenMP 5.2 feature which
will make it into GCC :-)


:)



gcc/fortran/openmp.cc:

+  if (gfc_get_symbol ("omp_allocator_handle_kind", NULL, &sym)
+  || !sym->value
+  || sym->value->expr_type != EXPR_CONSTANT
+  || sym->value->ts.type != BT_INTEGER)
+    {
+  gfc_error ("OpenMP %<omp_allocator_handle_kind%> constant not found by "
+ "%<uses_allocators%> clause at %C");
+  goto error;
+    }
+  allocator_handle_kind = sym;

I think you rather want to use
   gfc_find_symbol ("omp_...", NULL, true, &sym)
   || sym == NULL
where true is for parent_flag to search also the parent namespace.
(The function returns 1 if the symbol is ambiguous, 0 otherwise -
including 0 + sym == NULL when the symbol could not be found.)

   || sym->attr.flavor != FL_PARAMETER
   || sym->ts.type != BT_INTEGER
   || sym->attr.dimension

Looks cleaner than to access sym->value. The attr.dimension is just
to make sure the user did not smuggle an array into this.
(Invalid as omp_... is a reserved namespace but users will still do
this and some are good in finding ICE as hobby.)


Well, the intention here is to search for "omp_allocator_handle_kind" and 
"omp_memspace_handle_kind",
and use their value to check if the kinds are the same as declared allocator 
handles and memspace constant.
Not to generally search for "omp_...".

However the sym->attr.dimension test seems useful, added in new v2 patch.


However, I fear that will fail for the following two examples (both untested):

   use omp_lib, my_kind = omp_allocator_handle_kind
   integer(my_kind) :: my_allocator

as this gives 'my_kind' in the symtree->name (while symtree->n.sym->name is 
"omp_...").
Hence, by searching the symtree for 'omp_...' the symbol will not be found.


It will likely also fail for the following more realistic example:

...

subroutine foo
   use m
   use omp_lib, only: omp_alloctrait

...

   !$omp target uses_allocators(my_allocator(traits_array) 
allocate(my_allocator:A) firstprivate(A)
  ...
   !$omp end target
end


If someone wants to use OpenMP allocators, but intentionally only imports 
insufficient standard symbols from omp_lib,
then he/she is on their own :)

The specification really makes this quite clear: omp_allocator_handle_kind, 
omp_alloctrait, omp_memspace_handle_kind are
all part of the same package.


In this case, omp_allocator_handle_kind is not in the namespace of 'foo'
but the code should be still valid. Thus, an alternative would be to hard-code
the value - as done for the depobj. As we have:

     integer, parameter :: omp_allocator_handle_kind = c_intptr_t
     integer, parameter :: omp_memspace_handle_kind = c_intptr_t

that would be
    sym->ts.type == BT_INTEGER
    sym->ts.kind == gfc_index_integer_kind
for the allocator variable and the memspace kind.

However, I grant that either example is not very typical. The second one is more
natural – such a code will very likely be written in the real world. But not
with uses_allocators but rather with "!$omp requires dynamic_allocators" and
omp_init_allocator().

Thoughts?


As above. I mean, what is so hard with including "use omp_lib" where you need 
it? :D


* * *

gcc/fortran/openmp.cc

+  if (++i > 2)
+    {
+  gfc_error ("Only two modifiers are allowed on %<uses_allocators%> "
+ "clause at %C");
+  goto error;
+    }
+


Is this really needed? There is a check for multiple traits and multiple 
memspace
Thus, 'trait(),memspace(),trait()' is already handled and
'trait(),something' give a break and will lead to an error as in that case
a ':' and not ',something' is expected.


I think it could be worth reminding that limitation, instead of a generic error.


+  if (gfc_match_char ('(') == MATCH_YES)
+    {
+  if (memspace_seen || traits_seen)
+    {
+  gfc_error ("Modifiers cannot be used with legacy "
+ "array syntax at %C");

I wouldn't use the term 'array syntax' to denote
   uses_allocators(allocator (alloc_array) )
How about:
   error: "Using both 

[PATCH, OpenMP] Implement uses_allocators clause for target regions

2022-05-06 Thread Chung-Lin Tang

Hi Jakub,
this patch implements the uses_allocators clause for OpenMP target regions.

For user defined allocator handles, this allows target regions to assign
memory space and traits to allocators, and automatically calls
omp_init/destroy_allocator() in the beginning/end of the target region.

For pre-defined allocators (i.e. omp_..._mem_alloc names), this is a no-op,
such clauses are not created.

Aside from the front-end portions, the target region transforms are
done in gimplify_omp_workshare.

This patch also includes changes to enforce the "allocate allocator
must also be in a uses_allocators clause" rule, as mentioned in [1].
This is done during gimplify_scan_omp_clauses.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594039.html

Tested on mainline, please see if this is okay.

Thanks,
Chung-Lin

2022-05-06  Chung-Lin Tang  

gcc/c-family/ChangeLog:

* c-omp.cc (c_omp_split_clauses): Add OMP_CLAUSE_USES_ALLOCATORS case.
* c-pragma.h (enum pragma_omp_clause): Add 
PRAGMA_OMP_CLAUSE_USES_ALLOCATORS.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_name): Add case for uses_allocators
clause.
(c_parser_omp_clause_uses_allocators): New function.
(c_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case.
(OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask.
* c-typeck.cc (c_finish_omp_clauses): Add case handling for
OMP_CLAUSE_USES_ALLOCATORS.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_name): Add case for uses_allocators
clause.
(cp_parser_omp_clause_uses_allocators): New function.
(cp_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case.
(OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask.
* semantics.cc (finish_omp_clauses): Add case handling for
OMP_CLAUSE_USES_ALLOCATORS.

fortran/ChangeLog:

* gfortran.h (struct gfc_omp_namelist): Add memspace_sym, traits_sym
fields.
(OMP_LIST_USES_ALLOCATORS): New list enum.
* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_USES_ALLOCATORS.
(gfc_match_omp_clause_uses_allocators): New function.
(gfc_match_omp_clauses): Add case to handle OMP_CLAUSE_USES_ALLOCATORS.
(OMP_TARGET_CLAUSES): Add OMP_CLAUSE_USES_ALLOCATORS.
(resolve_omp_clauses): Add "USES_ALLOCATORS" to clause_names[].
* trans-array.cc (gfc_conv_array_initializer): Adjust array index
to always be a created tree expression instead of NULL_TREE when zero.
* trans-openmp.cc (gfc_trans_omp_clauses): For ALLOCATE clause, handle
using gfc_trans_omp_variable for EXPR_VARIABLE exprs.
Add handling of OMP_LIST_USES_ALLOCATORS case.
* types.def (BT_FN_VOID_PTRMODE): Define.
(BT_FN_PTRMODE_PTRMODE_INT_PTR): Define.

gcc/ChangeLog:

* builtin-types.def (BT_FN_VOID_PTRMODE): Define.
(BT_FN_PTRMODE_PTRMODE_INT_PTR): Define.
* omp-builtins.def (BUILT_IN_OMP_INIT_ALLOCATOR): Define.
(BUILT_IN_OMP_DESTROY_ALLOCATOR): Define.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_USES_ALLOCATORS.
* tree-pretty-print.cc (dump_omp_clause): Handle 
OMP_CLAUSE_USES_ALLOCATORS.
* tree.h (OMP_CLAUSE_USES_ALLOCATORS_ALLOCATOR): New macro.
(OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE): New macro.
(OMP_CLAUSE_USES_ALLOCATORS_TRAITS): New macro.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_USES_ALLOCATORS.
(omp_clause_code_name): Add "uses_allocators".

* gimplify.cc (gimplify_scan_omp_clauses): Add checking of OpenMP target
region allocate clauses, to require a uses_allocators clause to exist
for allocators.
(gimplify_omp_workshare): Add handling of OMP_CLAUSE_USES_ALLOCATORS
for OpenMP target regions; create calls of omp_init/destroy_allocator
around target region body.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/uses_allocators-1.c: New test.
* c-c++-common/gomp/uses_allocators-2.c: New test.
* gfortran.dg/gomp/uses_allocators-1.f90: New test.
* gfortran.dg/gomp/uses_allocators-2.f90: New test.
* gfortran.dg/gomp/uses_allocators-3.f90: New test.
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 3a7cecdf087..be3e6ff697e 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -283,6 +283,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT32_DFLOAT32, BT_DFLOAT32, 
BT_DFLOAT32)
 DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT64_DFLOAT64, BT_DFLOAT64, BT_DFLOAT64)
 DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRMODE, BT_VOID, BT_PTRMODE)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_CONST_PTR, BT_VOID, BT_CONST_PTR)
 DEF_FU

[PATCH, OpenMP] Fix nested use_device_ptr

2022-04-01 Thread Chung-Lin Tang

Hi Jakub,
this patch fixes a bug in lower_omp_target, where for Fortran arrays,
the expanded sender assignment is wrongly using the variable in the
current ctx, instead of the one looked-up outside, which is causing
use_device_ptr/addr to fail to work when used inside an omp-parallel
(where the omp child_fn is split away from the original).
Just a one-character change to fix this.

The fix is inside omp-low.cc, though because the omp_array_data langhook
is used only by Fortran, this is essentially Fortran-specific.

Tested on x86_64-linux + nvptx offloading without regressions.
This is probably not a regression, but seeking to commit when stage1 opens.

Thanks,
Chung-Lin

2022-04-01  Chung-Lin Tang  

gcc/ChangeLog:

* omp-low.cc (lower_omp_target): Use outer context looked-up 'var' as
argument to lang_hooks.decls.omp_array_data, instead of 'ovar' from
current clause.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/use_device_ptr-4.f90: New testcase.

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 392bb18..bf5779b 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -13405,7 +13405,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
 
type = TREE_TYPE (ovar);
if (lang_hooks.decls.omp_array_data (ovar, true))
- var = lang_hooks.decls.omp_array_data (ovar, false);
+ var = lang_hooks.decls.omp_array_data (var, false);
else if (((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_ADDR
  || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
  && !omp_privatize_by_reference (ovar)
diff --git a/libgomp/testsuite/libgomp.fortran/use_device_ptr-4.f90 
b/libgomp/testsuite/libgomp.fortran/use_device_ptr-4.f90
new file mode 100644
index 0000000..8c361d1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/use_device_ptr-4.f90
@@ -0,0 +1,41 @@
+! { dg-do run }
+!
+! Test use_device_ptr nested within another parallel
+! construct
+!
+program test_nested_use_device_ptr
+  use iso_c_binding, only: c_loc, c_ptr
+  implicit none
+  real, allocatable, target :: arr(:,:)
+  integer :: width = 1024, height = 1024, i
+  type(c_ptr) :: devptr
+
+  allocate(arr(width,height))
+
+  !$omp target enter data map(alloc: arr)
+
+  !$omp target data use_device_ptr(arr)
+  devptr = c_loc(arr(1,1))
+  !$omp end target data
+
+  !$omp parallel default(none) shared(arr, devptr)
+  !$omp single
+
+  !$omp target data use_device_ptr(arr)
+  call thing(c_loc(arr), devptr)
+  !$omp end target data
+
+  !$omp end single
+  !$omp end parallel
+  !$omp target exit data map(delete: arr)
+
+contains
+
+  subroutine thing(myarr, devptr)
+use iso_c_binding, only: c_ptr, c_associated
+implicit none
+type(c_ptr) :: myarr, devptr
+if (.not.c_associated(myarr, devptr)) stop 1
+  end subroutine thing
+
+end program


[RFC][PATCH, OpenMP/OpenACC, libgomp] Allow base-pointers to be NULL

2022-03-09 Thread Chung-Lin Tang

Hi all,
when troubleshooting building/running SPEC HPC 2021 with GCC with OpenMP 
offloading,
specifically 534.hpgmgfv_t, an issue encountered in the benchmark was:
when the benchmark was initializing and creating its data environment on the 
GPU,
it was trying to map array sections where the base-pointer is actually NULL:
...
for (block=0;block<3;++block) {
  #pragma omp target enter data 
map(to:level->restriction[shape].blocks[block][:length])
  // level->restriction[shape].blocks[block] == NULL for some values of index 
'block'
...

The benchmark appears to be assuming that such NULL base-pointers would simply 
be
silently ignored, and the program would just keep running.

(BTW, the above case needs this patch to compile:
 https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590658.html
 which is still awaiting review :) )

What we currently do in libgomp, however, is that we issue an error and call 
gomp_fatal():
libgomp/target.c:gomp_attach_pointer():
...
   if ((void *) target == NULL)
{
- gomp_mutex_unlock (&devicep->lock);
- gomp_fatal ("attempt to attach null pointer");
+ n->aux->attach_count[idx] = 0;  // proposed change attached in patch
+ return;
...
Some quick testing shows that clang/LLVM behaves mostly the same as GCC.

OTOH, nVidia HPC SDK (PGI) does appear to silently go on without bailing out.
(I have not verified if 534.hpgmgfv_t fully works with PGI, just observed how 
their
runtime handles NULL base-pointers)
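
To make the two behaviours being compared concrete, here is a toy model of the choice. It is only a sketch: attach_pointer and enum null_policy are made-up names for this example, and the real gomp_attach_pointer does far more (locking, splay-tree lookup, device copies).

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical policy switch for this illustration only.  */
enum null_policy { NULL_IS_FATAL, NULL_IS_IGNORED };

/* Toy attach operation.  Returns the new attach count for the slot:
   a NULL target either aborts (current libgomp behaviour) or resets
   the count to 0 and returns (the behaviour proposed in the patch).
   A non-NULL target simply bumps the count.  */
int
attach_pointer (const void *target, int attach_count,
		enum null_policy policy)
{
  if (target == NULL)
    {
      if (policy == NULL_IS_FATAL)
	{
	  fprintf (stderr, "attempt to attach null pointer\n");
	  abort ();
	}
      return 0;
    }
  return attach_count + 1;
}
```

Under NULL_IS_IGNORED a mapping loop like the one in the benchmark keeps running past the NULL entries; under NULL_IS_FATAL the first NULL entry terminates the program.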

I don't see any explicit description of this case in the OpenMP specifications,
only "The corresponding pointer variable becomes an attached pointer", with no
description of how a NULL base-pointer is to be handled.

So WDYGT? Should libgomp behavior be adjusted here, or should SPEC benchmark 
source be adjusted?
(The attached patch to adjust libgomp attach behavior has been regtested 
without regressions, FWIW)

Thanks,
Chung-Lin

2022-03-09  Chung-Lin Tang  

libgomp/ChangeLog:

* target.c (gomp_attach_pointer): When pointer is NULL,
return instead of calling gomp_fatal.
diff --git a/libgomp/target.c b/libgomp/target.c
index 9017458885e..0e8bbd83c20 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -796,8 +796,8 @@ gomp_attach_pointer (struct gomp_device_descr *devicep,
 
   if ((void *) target == NULL)
{
- gomp_mutex_unlock (&devicep->lock);
- gomp_fatal ("attempt to attach null pointer");
+ n->aux->attach_count[idx] = 0;
+ return;
}
 
   s.host_start = target + bias;


[PATCH, OpenMP, C++] Allow classes with static members to be mappable

2022-03-09 Thread Chung-Lin Tang

Hi Jakub,
Now, in OpenMP 5.x, static members are no longer supposed to be a barrier to a
class being target-mapped.

There is the related issue of actually providing access to static 
const/constexpr
members on the GPU (probably a case of 
https://github.com/OpenMP/spec/issues/2158)
but that is for later.

This patch basically just removes the check for static members inside
cp_omp_mappable_type_1, and adjusts a testcase. Not sure if more tests are 
needed.
Tested on trunk without regressions, okay when stage1 reopens?

Thanks,
Chung-Lin

2022-03-09  Chung-Lin Tang  

gcc/cp/ChangeLog:

* decl2.cc (cp_omp_mappable_type_1): Remove requirement that all
members must be non-static; remove check for static fields.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/unmappable-1.C: Adjust testcase.

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index c53acf4546d..ace7783d9bd 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -1544,21 +1544,14 @@ cp_omp_mappable_type_1 (tree type, bool notes)
   /* Arrays have mappable type if the elements have mappable type.  */
   while (TREE_CODE (type) == ARRAY_TYPE)
 type = TREE_TYPE (type);
-  /* All data members must be non-static.  */
+
   if (CLASS_TYPE_P (type))
 {
   tree field;
   for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
-   if (VAR_P (field))
- {
-   if (notes)
- inform (DECL_SOURCE_LOCATION (field),
- "static field %qD is not mappable", field);
-   result = false;
- }
/* All fields must have mappable types.  */
-   else if (TREE_CODE (field) == FIELD_DECL
-&& !cp_omp_mappable_type_1 (TREE_TYPE (field), notes))
+   if (TREE_CODE (field) == FIELD_DECL
+   && !cp_omp_mappable_type_1 (TREE_TYPE (field), notes))
  result = false;
 }
   return result;
diff --git a/gcc/testsuite/g++.dg/gomp/unmappable-1.C 
b/gcc/testsuite/g++.dg/gomp/unmappable-1.C
index 364f884500c..1532b9c73f1 100644
--- a/gcc/testsuite/g++.dg/gomp/unmappable-1.C
+++ b/gcc/testsuite/g++.dg/gomp/unmappable-1.C
@@ -4,7 +4,7 @@
 class C
 {
 public:
-  static int static_member; /* { dg-message "static field .C::static_member. 
is not mappable" } */
+  static int static_member;
   virtual void f() {}
 };
 


[PATCH, OpenMP, C/C++] Handle array reference base-pointers in array sections

2022-02-21 Thread Chung-Lin Tang

Hi Jakub,
as encountered in cases where a program constructs its own deep-copying
for arrays-of-pointers, e.g:

   #pragma omp target enter data map(to:level->vectors[:N])
   for (i = 0; i < N; i++)
 #pragma omp target enter data map(to:level->vectors[i][:N])

We need to treat the part of the array reference before the array section
as a base-pointer (here 'level->vectors[i]'), providing pointer-attachment 
behavior.

This patch adds this inside handle_omp_array_sections(), tracing the whole
sequence of array dimensions, creating a whole base-pointer reference
iteratively using build_array_ref(). The conditions are that each of the
"absorbed" dimensions must be length==1, and the final reference must be
of pointer-type (so that pointer attachment makes sense).
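
The "absorbed dimensions" condition can be sketched roughly as follows. This is a simplification using plain integers instead of trees: struct dim mirrors the helper struct in the patch, but count_absorbed_dims is a hypothetical name, and the pointer-type check on the final reference is left out.

```c
/* Simplified stand-in for the { low_bound, length } pairs recorded
   per array-section dimension; integers instead of trees.  */
struct dim { long low_bound; long length; };

/* Count how many leading dimensions have length == 1 and so can be
   folded into the base-pointer reference.  For
   'level->vectors[i][:N]' the dimensions are { i, 1 } and { 0, N },
   so one dimension is absorbed and the base pointer would be
   'level->vectors[i]'.  */
int
count_absorbed_dims (const struct dim *dims, int n)
{
  int i = 0;
  while (i < n && dims[i].length == 1)
    i++;
  return i;
}
```

A result of 0 means the section starts directly at the mapped object, so there is no array-reference base pointer to attach.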

There's also a little patch in gimplify_scan_omp_clauses(), to make sure
the array-ref base-pointer goes down the right path.

This case was encountered when working to make 534.hpgmgfv_t from
SPEChpc 2021 properly compile. Tested without regressions on trunk.
Okay to go in once stage1 opens?

Thanks,
Chung-Lin

2022-02-21  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-typeck.cc (handle_omp_array_sections): Add handling for
creating array-reference base-pointer attachment clause.

gcc/cp/ChangeLog:

* semantics.cc (handle_omp_array_sections): Add handling for
creating array-reference base-pointer attachment clause.

gcc/ChangeLog:

* gimplify.cc (gimplify_scan_omp_clauses): Add case for
attach/detach map kind for ARRAY_REF of POINTER_TYPE.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-enter-data-1.c: Adjust testcase.

libgomp/testsuite/ChangeLog:

* libgomp.c-c++-common/ptr-attach-2.c: New test.

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 3075c883548..4257e373557 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -13649,6 +13649,10 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
   if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
maybe_zero_len = true;
 
+  struct dim { tree low_bound, length; };
+  auto_vec<dim> dims (num);
+  dims.safe_grow (num);
+
   for (i = num, t = OMP_CLAUSE_DECL (c); i > 0;
   t = TREE_CHAIN (t))
{
@@ -13763,6 +13767,9 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  else
size = size_binop (MULT_EXPR, size, l);
}
+
+ dim d = { low_bound, length };
+ dims[i] = d;
}
   if (side_effects)
size = build2 (COMPOUND_EXPR, sizetype, side_effects, size);
@@ -13802,6 +13809,23 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  OMP_CLAUSE_DECL (c) = t;
  return false;
}
+
+  tree aref = t;
+  for (i = 0; i < dims.length (); i++)
+   {
+ if (dims[i].length && integer_onep (dims[i].length))
+   {
+ tree lb = dims[i].low_bound;
+ aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb);
+   }
+ else
+   {
+ if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE)
+   t = aref;
+ break;
+   }
+   }
+
   first = c_fully_fold (first, false, NULL);
   OMP_CLAUSE_DECL (c) = first;
   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
@@ -13836,7 +13860,8 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  break;
}
   tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
-  if (TREE_CODE (t) == COMPONENT_REF)
+  if (TREE_CODE (t) == COMPONENT_REF || TREE_CODE (t) == ARRAY_REF
+ || TREE_CODE (t) == INDIRECT_REF)
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
   else
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0cb17a6a8ab..646f4883d66 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5497,6 +5497,10 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
   if (processing_template_decl && maybe_zero_len)
return false;
 
+  struct dim { tree low_bound, length; };
+  auto_vec<dim> dims (num);
+  dims.safe_grow (num);
+
   for (i = num, t = OMP_CLAUSE_DECL (c); i > 0;
   t = TREE_CHAIN (t))
{
@@ -5604,6 +5608,9 @@ handle_omp_array_sections (tree c, enum c_omp_region_type 
ort)
  else
size = size_binop (MULT_EXPR, size, l);
}
+
+ dim d = { low_bound, length };
+ dims[i] = d;
}
   if (!processing_template_decl)
{
@@ -5647,6 +5654,24 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  OMP_CLAUSE_DECL (c) = t;
  return false;
}
+
+ tree aref = t;
+ for (i = 0; i < dims.length (); i++)
+   {
+  

Re: [PATCH, OpenMP] PR103642 - Fix omp-low ICE for indirect references based off component access

2022-01-17 Thread Chung-Lin Tang

Ping.

On 2022/1/3 10:15 PM, Chung-Lin Tang wrote:

This issue was triggered after the patch extending syntax for component access
in map clauses
(https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0ab29cf0bb68960c)

In gimplify_scan_omp_clauses, the case for handling indirect accesses (which
creates a firstprivate pointer and a zero-length array section map for such
decls) was erroneously entered for non-pointer cases (here the base struct
decl), so this patch adds the appropriate checks there.

The added testcase is a compile-only test for the ICE. The original omptests
t-partial-struct test should not actually execute correctly: for
map(t.s->a[:N]), map(t.s[:1]) is not implicitly mapped, so the offloaded
access does not work as is.
(Fixing that omptests test is out of scope here.)

Tested without regressions, okay for trunk?

Thanks,
Chung-Lin

2022-01-03  Chung-Lin Tang  

gcc/ChangeLog:

 PR middle-end/103642
 * gimplify.c (gimplify_scan_omp_clauses): Do not do indir_p handling
 for non-pointer or non-reference-to-pointer cases.

gcc/testsuite/ChangeLog:

 * c-c++-common/gomp/pr103642.c: New test.


Re: [PATCH, OpenMP, C/C++] Fix PR103705

2022-01-10 Thread Chung-Lin Tang

Forgot to attach the patch, here it is :P

On 2022/1/10 10:59 PM, Chung-Lin Tang wrote:

For cases like:
   #pragma omp target update from(s[0].a[0:1])

The handling in [c_]finish_omp_clauses was only peeling off ARRAY_REF once
before the loop handling COMPONENT_REF, and snagged when the base of the
component_ref is an array access. This adds the handling there for both C
and C++ front-ends.

(ICE started to happen after 
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6c0399378e77d029
where map/from/to clause syntax was relaxed to allow more stuff)

Tested without regressions, okay to commit?

Thanks,
Chung-Lin

     PR c++/103705

gcc/c/ChangeLog:

     * c-typeck.c (c_finish_omp_clauses): Also continue peeling off of
     outer node for ARRAY_REFs.

gcc/cp/ChangeLog:

     * semantics.c (finish_omp_clauses): Also continue peeling off of
     outer node for ARRAY_REFs.

gcc/testsuite/ChangeLog:

 * c-c++-common/gomp/pr103705.c: New test.

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 8b492cf5bed..ac6618eca5c 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -14929,7 +14929,8 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
t = TREE_OPERAND (t, 0);
}
}
- while (TREE_CODE (t) == COMPONENT_REF);
+ while (TREE_CODE (t) == COMPONENT_REF
+|| TREE_CODE (t) == ARRAY_REF);
 
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
  && OMP_CLAUSE_MAP_IMPLICIT (c)
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 645654768e3..a7435ed1266 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7931,7 +7931,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type 
ort)
t = TREE_OPERAND (t, 0);
}
}
- while (TREE_CODE (t) == COMPONENT_REF);
+ while (TREE_CODE (t) == COMPONENT_REF
+|| TREE_CODE (t) == ARRAY_REF);
 
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
  && OMP_CLAUSE_MAP_IMPLICIT (c)
diff --git a/gcc/testsuite/c-c++-common/gomp/pr103705.c 
b/gcc/testsuite/c-c++-common/gomp/pr103705.c
new file mode 100644
index 00000000000..bf4c7066d28
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pr103705.c
@@ -0,0 +1,14 @@
+/* PR c++/103705 */
+/* { dg-do compile } */
+
+struct S
+{
+  int a[2];
+};
+
+int main (void)
+{
+  struct S s[1];
+  #pragma omp target update from(s[0].a[0:1])
+  return 0;
+}


[PATCH, OpenMP, C/C++] Fix PR103705

2022-01-10 Thread Chung-Lin Tang

For cases like:
  #pragma omp target update from(s[0].a[0:1])

The handling in [c_]finish_omp_clauses was only peeling off ARRAY_REF once
before the loop handling COMPONENT_REF, and snagged when the base of the
component_ref is an array access. This adds the handling there for both C
and C++ front-ends.
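
For readers not familiar with the tree shapes involved, the fix amounts to continuing to strip outer nodes while they are either component or array references. The toy model below shows just that loop condition: struct node and peel_to_base are made up for this sketch, and the real loops in c_finish_omp_clauses/finish_omp_clauses do additional checking per node.

```c
#include <stddef.h>

/* Toy expression nodes; the real code works on GCC trees.  */
enum node_code { NODE_DECL, NODE_COMPONENT_REF, NODE_ARRAY_REF };

struct node
{
  enum node_code code;
  const struct node *operand;	/* Operand 0, i.e. the base.  */
};

/* Peel COMPONENT_REFs and ARRAY_REFs until the base is reached.
   For 's[0].a' this walks COMPONENT_REF -> ARRAY_REF -> decl 's';
   stopping at the ARRAY_REF is what tripped the old loop.  */
const struct node *
peel_to_base (const struct node *t)
{
  while (t->code == NODE_COMPONENT_REF || t->code == NODE_ARRAY_REF)
    t = t->operand;
  return t;
}
```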

(ICE started to happen after 
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6c0399378e77d029
where map/from/to clause syntax was relaxed to allow more stuff)

Tested without regressions, okay to commit?

Thanks,
Chung-Lin

PR c++/103705

gcc/c/ChangeLog:

* c-typeck.c (c_finish_omp_clauses): Also continue peeling off of
outer node for ARRAY_REFs.

gcc/cp/ChangeLog:

* semantics.c (finish_omp_clauses): Also continue peeling off of
outer node for ARRAY_REFs.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/pr103705.c: New test.


[PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error

2022-01-04 Thread Chung-Lin Tang

In the patch that implemented omp_get_device_num(), there was an error where
the stringification of GOMP_DEVICE_NUM_VAR, which is the macro expanding to
the actual symbol used, was erroneously using the STRINGX() macro in the
libgomp offload image symbol search, and expansion of the variable name
string through the additional layer of preprocessor symbol was not properly
achieved.

This patch fixes this by changing to properly use XSTRING(), also from
include/symcat.h.
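
For reference, the difference between the two macros can be seen in a small standalone example. The two macro definitions below match include/symcat.h as far as I know; DEVICE_NUM_VAR and example_device_num are placeholder names for this sketch, not the real libgomp symbol.

```c
#include <string.h>

/* One-level and two-level stringification, as in include/symcat.h.  */
#define STRINGX(s) #s
#define XSTRING(s) STRINGX(s)

/* Placeholder for a macro that expands to the actual symbol name.  */
#define DEVICE_NUM_VAR example_device_num

/* '#' is applied before the argument is macro-expanded, so this
   yields the macro's own name.  */
const char *
direct_name (void)
{
  return STRINGX (DEVICE_NUM_VAR);
}

/* The extra macro layer lets the argument expand first, so this
   yields the name of the symbol the macro stands for.  */
const char *
expanded_name (void)
{
  return XSTRING (DEVICE_NUM_VAR);
}
```

Here direct_name () returns "DEVICE_NUM_VAR" and expanded_name () returns "example_device_num", which is why a symbol lookup using STRINGX could not find the variable.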

This change was fairly obvious, so committed directly.

Thanks,
Chung-Lin

libgomp/ChangeLog:

* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
From fbb592407c9dd244b4cea086cbb90d7bd0bf60bb Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 4 Jan 2022 17:26:23 +0800
Subject: [PATCH] libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during
 offload image load

In the patch that implemented omp_get_device_num(), there was an error where
the stringification of GOMP_DEVICE_NUM_VAR, which is the macro expanding to
the actual symbol used, was erroneously using the STRINGX() macro in the
libgomp offload image symbol search, and expansion of the variable name
string through the additional layer of preprocessor symbol was not properly
achieved.

This patch fixes this by changing to properly use XSTRING(), also from
include/symcat.h.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
---
 libgomp/plugin/plugin-gcn.c   | 4 ++--
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 8ffd3d1a2cf..d0f05b28bf3 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3401,12 +3401,12 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, 
const void *target_data,
}
 }
 
-  GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
+  GCN_DEBUG ("Looking for variable %s\n", XSTRING (GOMP_DEVICE_NUM_VAR));
 
   hsa_status_t status;
   hsa_executable_symbol_t var_symbol;
   status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
-STRINGX (GOMP_DEVICE_NUM_VAR),
+XSTRING (GOMP_DEVICE_NUM_VAR),
 agent->id, 0, _symbol);
   if (status == HSA_STATUS_SUCCESS)
 {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index f32276b0a18..b4f0a84d77a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1353,7 +1353,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const 
void *target_data,
   size_t device_num_varsize;
   CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &device_num_varptr,
  &device_num_varsize, module,
- STRINGX (GOMP_DEVICE_NUM_VAR));
+ XSTRING (GOMP_DEVICE_NUM_VAR));
   if (r == CUDA_SUCCESS)
 {
   targ_tbl->start = (uintptr_t) device_num_varptr;
-- 
2.17.1



[PATCH, OpenMP, Fortran] PR103643: ICE in gimplify_omp_affinity

2022-01-03 Thread Chung-Lin Tang

After the PR90030 patch, which removes the universal casting of all Fortran
array pointers to 'c_char*', a Fortran descriptor-based array passed into an
affinity() clause now looks like:

- #pragma omp task private(i) shared(b) affinity(*(c_char *) a.data)
+ #pragma omp task private(i) shared(b) affinity(*(integer(kind=4)[0:] * restrict) a.data)

The 'integer(kind=4)[0:]' incomplete type appears to cause an ICE in
gimplify_expr() when it is called with 'is_gimple_val, fb_rvalue'. The ICE
appears to be fixed just by changing these to 'is_gimple_lvalue, fb_lvalue'.
Considering that the affinity() clause should specify the location of a
particular object in memory, this probably makes sense.

Tested without regressions, seeking approval for trunk.

Thanks,
Chung-Lin

2022-01-03  Chung-Lin Tang  

gcc/ChangeLog:

PR middle-end/103643
* gimplify.c (gimplify_omp_affinity): Adjust gimplify_expr of entire
OMP_CLAUSE_DECL to use 'is_gimple_lvalue, fb_lvalue'

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/pr103643.f90: New test.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index b118c72f62c..87cc01483dd 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8123,7 +8123,7 @@ gimplify_omp_affinity (tree *list_p, gimple_seq *pre_p)
if (error_operand_p (OMP_CLAUSE_DECL (c)))
  return;
if (gimplify_expr (&OMP_CLAUSE_DECL (c), pre_p, NULL,
-  is_gimple_val, fb_rvalue) == GS_ERROR)
+  is_gimple_lvalue, fb_lvalue) == GS_ERROR)
  return;
gimplify_and_add (OMP_CLAUSE_DECL (c), pre_p);
  }
diff --git a/gcc/testsuite/gfortran.dg/gomp/pr103643.f90 
b/gcc/testsuite/gfortran.dg/gomp/pr103643.f90
new file mode 100644
index 000..3b409f5f858
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/pr103643.f90
@@ -0,0 +1,19 @@
+! PR middle-end/103643
+! { dg-do compile }
+
+program test_task_affinity
+  implicit none
+  integer i
+  integer, allocatable :: A(:)
+
+  allocate (A(10))
+
+  !$omp target
+  !$omp task affinity(A)
+  do i = 1, 10
+ A(i) = 0
+  end do
+  !$omp end task
+  !$omp end target
+
+end program test_task_affinity


[PATCH, OpenMP] PR103642 - Fix omp-low ICE for indirect references based off component access

2022-01-03 Thread Chung-Lin Tang

This issue was triggered after the patch extending syntax for component access
in map clauses
(https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0ab29cf0bb68960c)

In gimplify_scan_omp_clauses, the case for handling indirect accesses (which
creates a firstprivate ptr and a zero-length array section map for such decls)
was erroneously entered for non-pointer cases (here the base struct decl), so
this patch adds the appropriate checks there.

The new testcase is a compile-only test for the ICE. The original omptests
t-partial-struct test should not actually execute correctly, because for
map(t.s->a[:N]), map(t.s[:1]) is not implicitly mapped, so the offloaded
access as a whole does not work as is.
(Fixing that omptests test is out of scope here.)

Tested without regressions, okay for trunk?

Thanks,
Chung-Lin

2022-01-03  Chung-Lin Tang  

gcc/ChangeLog:

PR middle-end/103642
* gimplify.c (gimplify_scan_omp_clauses): Do not do indir_p handling
for non-pointer or non-reference-to-pointer cases.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/pr103642.c: New test.




diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index b118c72f62c..bdc8189c2a7 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -9543,7 +9543,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  == REFERENCE_TYPE))
decl = TREE_OPERAND (decl, 0);
}
- if (decl != orig_decl && DECL_P (decl) && indir_p)
+ if (decl != orig_decl && DECL_P (decl) && indir_p
+ && (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE
+ || (decl_ref
+ && TREE_CODE (TREE_TYPE (decl_ref)) == POINTER_TYPE)))
{
  gomp_map_kind k
= ((code == OACC_EXIT_DATA || code == OMP_TARGET_EXIT_DATA)
diff --git a/gcc/testsuite/c-c++-common/gomp/pr103642.c 
b/gcc/testsuite/c-c++-common/gomp/pr103642.c
new file mode 100644
index 000..c5451596b69
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pr103642.c
@@ -0,0 +1,31 @@
+/* PR middle-end/103642 */
+/* { dg-do compile } */
+
+#include <stdlib.h>
+
+typedef struct
+{
+  int *a;
+} S;
+
+typedef struct
+{
+  S *s;
+  int *ptr;
+} T;
+
+#define N 10
+
+int main (void)
+{
+  T t;
+  t.s = (S *) malloc (sizeof (S));
+  t.s->a = (int *) malloc (sizeof(int) * N);
+
+  #pragma omp target map(from: t.s->a[:N])
+  {
+t.s->a[0] = 1;
+  }
+
+  return 0;
+}


Re: [PATCH, v5, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v5]

2021-12-09 Thread Chung-Lin Tang




On 2021/12/4 12:47 AM, Jakub Jelinek wrote:

On Tue, Nov 16, 2021 at 08:43:27PM +0800, Chung-Lin Tang wrote:

2021-11-16  Chung-Lin Tang  

PR middle-end/92120

gcc/cp/ChangeLog:


...

+ if (allow_zero_length_array_sections)
+   {
+ /* When allowing attachment to zero-length array sections, we
+allow attaching to NULL pointers when the target region is not
+mapped.  */
+ data = 0;
+   }


No {}s around single statement if body.

Otherwise LGTM.

Jakub



Thanks for the review and approval, Jakub.

Thomas, I pushed another 2766448c5cc3efc4 commit to fix the non-offload config 
FAILs, just FYI.

Chung-Lin






[PATCH, Fortran] Fix setting of array lower bound for named arrays

2021-11-29 Thread Chung-Lin Tang

This patch, by Tobias, fixes a case of setting array lower bounds, found
for particular uses of SOURCE=/MOLD=.

For example:
program A_M
  implicit none
  real, dimension (:), allocatable :: A, B
  allocate (A(0:5))
  call Init (A)
contains
  subroutine Init ( A )
real, dimension ( 0 : ), intent ( in ) :: A
integer, dimension ( 1 ) :: lb_B

allocate (B, mold = A)
...
lb_B = lbound (B, dim=1)   ! Error: lb_B assigned 1, instead of 0 like lower-bound of A.

Referencing the Fortran standard:

"16.9.109 LBOUND (ARRAY [, DIM, KIND])"
states:
"If DIM is present, ARRAY is a whole array, and either ARRAY is
 an assumed-size array of rank DIM or dimension DIM of ARRAY has
 nonzero extent, the result has a value equal to the lower bound
 for subscript DIM of ARRAY. Otherwise, if DIM is present, the
 result value is 1."

And on what is a "whole array":

"9.5.2 Whole arrays"
"A whole array is a named array or a structure component ..."

The attached patch adjusts the relevant part in gfc_trans_allocate() to set
e3_has_nodescriptor only for non-named arrays.

Tobias has tested this once, and I've tested this patch as well on our complete
set of testsuites (which usually serves for OpenMP-related work). Everything
appears well, with no regressions.

Is this okay for trunk?

Thanks,
Chung-Lin

2021-11-29  Tobias Burnus  

gcc/fortran/ChangeLog:

* trans-stmt.c (gfc_trans_allocate): Set e3_has_nodescriptor to true
only for non-named arrays.

gcc/testsuite/ChangeLog:

* gfortran.dg/allocate_with_source_26.f90: Adjust testcase.
* gfortran.dg/allocate_with_mold_4.f90: New testcase.

diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index bdf7957..982e1e0 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -6660,16 +6660,13 @@ gfc_trans_allocate (gfc_code * code)
   else
e3rhs = gfc_copy_expr (code->expr3);
 
-  // We need to propagate the bounds of the expr3 for source=/mold=;
-  // however, for nondescriptor arrays, we use internally a lower bound
-  // of zero instead of one, which needs to be corrected for the allocate obj
-  if (e3_is == E3_DESC)
-   {
- symbol_attribute attr = gfc_expr_attr (code->expr3);
- if (code->expr3->expr_type == EXPR_ARRAY ||
- (!attr.allocatable && !attr.pointer))
-   e3_has_nodescriptor = true;
-   }
+  // We need to propagate the bounds of the expr3 for source=/mold=.
+  // However, for non-named arrays, the lbound has to be 1 and neither the
+  // bound used inside the called function even when returning an
+  // allocatable/pointer nor the zero used internally.
+  if (e3_is == E3_DESC
+ && code->expr3->expr_type != EXPR_VARIABLE)
+   e3_has_nodescriptor = true;
 }
 
   /* Loop over all objects to allocate.  */
diff --git a/gcc/testsuite/gfortran.dg/allocate_with_mold_4.f90 
b/gcc/testsuite/gfortran.dg/allocate_with_mold_4.f90
new file mode 100644
index 000..d545fe1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/allocate_with_mold_4.f90
@@ -0,0 +1,24 @@
+program A_M
+  implicit none
+  real, parameter :: C(5:10) = 5.0
+  real, dimension (:), allocatable :: A, B
+  allocate (A(6))
+  call Init (A)
+contains
+  subroutine Init ( A )
+real, dimension ( -1 : ), intent ( in ) :: A
+integer, dimension ( 1 ) :: lb_B
+
+allocate (B, mold = A)
+if (any (lbound (B) /= lbound (A))) stop 1
+if (any (ubound (B) /= ubound (A))) stop 2
+if (any (shape (B) /= shape (A))) stop 3
+if (size (B) /= size (A)) stop 4
+deallocate (B)
+allocate (B, mold = C)
+if (any (lbound (B) /= lbound (C))) stop 5
+if (any (ubound (B) /= ubound (C))) stop 6
+if (any (shape (B) /= shape (C))) stop 7
+if (size (B) /= size (C)) stop 8
+end
+end 
diff --git a/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90 
b/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90
index 28f24fc..323c8a3 100644
--- a/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90
+++ b/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90
@@ -34,23 +34,23 @@ program p
  if (lbound(p1, 1) /= 3 .or. ubound(p1, 1) /= 4 &
  .or. lbound(p2, 1) /= 3 .or. ubound(p2, 1) /= 4 &
  .or. lbound(p3, 1) /= 1 .or. ubound(p3, 1) /= 2 &
- .or. lbound(p4, 1) /= 7 .or. ubound(p4, 1) /= 8 &
+ .or. lbound(p4, 1) /= 1 .or. ubound(p4, 1) /= 2 &
  .or. p1(3)%i /= 43 .or. p1(4)%i /= 56 &
  .or. p2(3)%i /= 43 .or. p2(4)%i /= 56 &
  .or. p3(1)%i /= 43 .or. p3(2)%i /= 56 &
- .or. p4(7)%i /= 11 .or. p4(8)%i /= 12) then
+ .or. p4(1)%i /= 11 .or. p4(2)%i /= 12) then
call abort()
  endif
 
  !write(*,*) lbound(a,1), ubound(a,1) ! prints 1 3
  !write(*,*) lbound(b,1), ubound(b,1) ! prints 1 3
- !write(*,*) lbound(c,1), ubound(c,1) ! prints 3 5
+ !write(*,*) lbound(c,1), ubound(c,1) ! prints 1 3
  !write(*,*) lbound(d,1), ubound(d,1) ! prints 1 5
  !write(*,*) lbound(e,1), ubound(e,1) ! prints 1 6
 

Re: [PATCH, PR90030] Fortran OpenMP/OpenACC array mapping alignment fix

2021-11-19 Thread Chung-Lin Tang

Ping.

On 2021/11/4 4:23 PM, Chung-Lin Tang wrote:

Hi Jakub,
As Thomas reported and submitted a patch a while ago:
https://gcc.gnu.org/pipermail/gcc-patches/2019-April/519932.html
https://gcc.gnu.org/pipermail/gcc-patches/2019-May/522738.html

There's an issue with the Fortran front-end when mapping arrays: when
creating the data MEM_REF for the map clause, there's a convention of
casting the referencing pointer to 'c_char *' by
fold_convert (build_pointer_type (char_type_node), ptr).

This causes the alignment passed to the libgomp runtime for array data
to be hardwired to '1', and causes alignment errors on the offload target
(not always showing up, but can trigger due to a slight change of clause
ordering).

This patch is not exactly Thomas' patch from 2019, but does the same
thing. The new libgomp tests are directly reused though. A lot of
scan test adjustment is also included in this patch.

Patch has been tested for no regressions for gfortran and libgomp, is
this okay for trunk?

Thanks,
Chung-Lin

Fortran: fix array alignment for OpenMP/OpenACC target mapping clauses [PR90030]

The Fortran front-end is creating maps of array data with a type of pointer to
char_type_node, which when eventually passed to libgomp during runtime, marks
the passed array with an alignment of 1, which can cause mapping alignment
errors on the offload target.

This patch removes the related fold_convert(build_pointer_type (char_type_node))
calls in fortran/trans-openmp.c, and adds gcc_asserts to ensure pointer type.

2021-11-04  Chung-Lin Tang  
     Thomas Schwinge 

 PR fortran/90030

gcc/fortran/ChangeLog:

 * trans-openmp.c (gfc_omp_finish_clause): Remove fold_convert to pointer
 to char_type_node, add gcc_assert of POINTER_TYPE_P.
 (gfc_trans_omp_array_section): Likewise.
 (gfc_trans_omp_clauses): Likewise.

gcc/testsuite/ChangeLog:

 * gfortran.dg/goacc/finalize-1.f: Adjust scan test.
 * gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
 * gfortran.dg/gomp/affinity-clause-5.f90: Likewise.
 * gfortran.dg/gomp/defaultmap-4.f90: Likewise.
 * gfortran.dg/gomp/defaultmap-5.f90: Likewise.
 * gfortran.dg/gomp/defaultmap-6.f90: Likewise.
 * gfortran.dg/gomp/map-3.f90: Likewise.
 * gfortran.dg/gomp/pr78260-2.f90: Likewise.
 * gfortran.dg/gomp/pr78260-3.f90: Likewise.

libgomp/ChangeLog:

 * testsuite/libgomp.oacc-fortran/pr90030.f90: New test.
 * testsuite/libgomp.fortran/pr90030.f90: New test.


[PATCH, v2, OpenMP 5.0] Remove array section base-pointer mapping semantics, and other front-end adjustments (mainline trunk)

2021-11-19 Thread Chung-Lin Tang

Hi Jakub,
attached is a rebased version of this "OpenMP fixes/adjustments" patch.

This version removes some of the (ort == C_ORT_OMP || ort == C_ORT_ACC) tests
that are not needed in handle_omp_array_sections_1 and [c_]finish_omp_clauses.

Note that this is meant to be applied on top of the recently posted C++
PR92120 v5 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584602.html

Again, tested without regressions (together with the PR92120 patch), awaiting 
review.

Thanks,
Chung-Lin

(ChangeLog updated below)

On 2021/5/25 9:36 PM, Chung-Lin Tang wrote:


This patch largely implements three pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list,
standards-conforming behavior for mapping array sections should *NOT* also map
the base-pointer, i.e., for this code:

 struct S { int *ptr; ... };
 struct S s;
 #pragma omp target enter data map(to: s.ptr[:100])

Currently we generate after gimplify:
#pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
    map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify results are now
adjusted to:
#pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
(the attach operation is still generated, and if s.ptr has already been mapped,
attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to 
use:
#pragma omp target enter data map(to: s.ptr, s.ptr[:100])
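
As a rough illustration of that explicit form, the sketch below maps both the pointer member and the array section, then reads the data in a target region. The function name and the summing loop are made up for this example; when built without OpenMP support the pragmas are ignored and the code simply runs on the host.

```c
#include <stdlib.h>

struct S { int *ptr; };

/* Hypothetical helper: sums 100 elements reached through s.ptr inside a
   target region.  Returns -1 if the allocation fails.  */
long
sum_through_mapped_pointer (void)
{
  struct S s;
  long sum = 0;

  s.ptr = (int *) malloc (100 * sizeof (int));
  if (s.ptr == NULL)
    return -1;
  for (int i = 0; i < 100; i++)
    s.ptr[i] = i;

  /* Map the base pointer explicitly in addition to the array section.  */
  #pragma omp target enter data map(to: s.ptr, s.ptr[:100])

  #pragma omp target map(tofrom: sum)
  for (int i = 0; i < 100; i++)
    sum += s.ptr[i];

  #pragma omp target exit data map(release: s.ptr, s.ptr[:100])

  free (s.ptr);
  return sum;
}
```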

This adjustment in behavior required a number of small adjustments here and
there in gimplify, including to accommodate map sequences for C++ references.

There is also a small Fortran front-end patch involved (hence CCing Tobias and
fortran@). The new gimplify processing changed behavior in handling
GOMP_MAP_ALWAYS_POINTER maps such that libgomp.fortran/struct-elem-map-1.f90
regressed. It appeared that the Fortran FE was generating a
GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and
the pre-patch behavior was removing this map anyway. I have a small change in
trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this
case, and so far there are no bad test results.

(2) The second part (though related to the first above) consists of fixes in
libgomp/target.c to not overwrite attached pointers when handling
device<->host copies, mainly for the "always" case. This behavior is also
noted in the 5.0 spec, but was not properly implemented before.

(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. This is mainly an effort to allow
SPEC HPC to compile, so although in the long term the entire map clause syntax
parsing is probably going to be revamped, we're still adding this in for now.
These changes are enabled for both OpenACC and OpenMP.



2021-11-19  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-parser.c (struct omp_dim): New struct type for use inside
c_parser_omp_variable_list.
(c_parser_omp_variable_list): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(c_finish_omp_clauses): Extend allowed range of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:

* parser.c (struct omp_dim): New struct type for use inside
cp_parser_omp_var_list_no_open.
(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(handle_omp_array_sections): Adjust pointer map generation of
references.
(finish_omp_clauses): Extend allowed range of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/fortran/ChangeLog:

* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.

gcc/ChangeLog:

* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
accommodate case where 'offset' return of get_inner_reference is
non-NULL.
(is_or_contains_p): Further robustify conditions.
(omp_target_reorder_clauses): In alloc/to/from sortin

[PATCH, v5, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v5]

2021-11-16 Thread Chung-Lin Tang

Hi Jakub,

On 2021/6/24 9:15 PM, Jakub Jelinek wrote:

On Fri, Jun 18, 2021 at 10:25:16PM +0800, Chung-Lin Tang wrote:

Note, you'll need to rebase your patch, it clashes with
r12-1768-g7619d33471c10fe3d149dcbb701d99ed3dd23528.
Sorry for that.  And sorry for patch review delay.


--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13104,6 +13104,12 @@ handle_omp_array_sections_1 (tree c, tree t, vec 
,
  return error_mark_node;
}
  t = TREE_OPERAND (t, 0);
+ if ((ort == C_ORT_ACC || ort == C_ORT_OMP)


Map clauses never appear on declare simd, so
(ort == C_ORT_ACC || ort == C_ORT_OMP)
previously meant always and since the in_reduction change is incorrect
(as C_ORT_OMP_TARGET is used for target construct but not for
e.g. target data* or target update).


+ && TREE_CODE (t) == MEM_REF)


Upon reviewing, it appears that most of these C_ORT_* tests are no longer 
needed, removed in new patch.


So please just use if (TREE_CODE (t) == MEM_REF)
or explain when it shouldn't trigger.


@@ -14736,6 +14743,11 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
{
  while (TREE_CODE (t) == COMPONENT_REF)
t = TREE_OPERAND (t, 0);
+ if (TREE_CODE (t) == MEM_REF)
+   {
+ t = TREE_OPERAND (t, 0);
+ STRIP_NOPS (t);
+   }


This doesn't look correct.  At least the parsing (and the spec AFAIK)
doesn't ensure that if there is ->, it must come before all the dots.
So, if one uses map (s->x.y) the above would work, but if map (s->x.y->z) or
map (s.a->b->c->d->e) is used, it wouldn't.  I'd expect a single
while loop that looks through COMPONENT_REFs and MEM_REFs as they appear.
Maybe the handle_omp_array_sections_1 MEM_REF case too?
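
The single-loop idea can be sketched on a toy expression tree. This is not GCC's tree API: demo_node, the DEMO_* codes, and the helper names are all invented for illustration, and details such as STRIP_NOPS are left out.

```c
#include <stddef.h>

/* Toy expression kinds standing in for tree codes.  */
enum demo_code { DEMO_DECL, DEMO_COMPONENT_REF, DEMO_MEM_REF };

struct demo_node
{
  enum demo_code code;
  struct demo_node *operand0;	/* Base expression; NULL for a decl.  */
};

/* Look through component and memory references in whatever order they
   appear, and return the innermost base.  */
struct demo_node *
demo_strip_to_base (struct demo_node *t)
{
  while (t != NULL
	 && (t->code == DEMO_COMPONENT_REF || t->code == DEMO_MEM_REF))
    t = t->operand0;
  return t;
}

/* Build the shape of s->x.y->z and check that the walk reaches 's'.
   Returns 1 on success.  */
int
demo_walk_reaches_decl (void)
{
  struct demo_node s = { DEMO_DECL, NULL };
  struct demo_node deref_s = { DEMO_MEM_REF, &s };	 /* *s */
  struct demo_node x = { DEMO_COMPONENT_REF, &deref_s }; /* s->x */
  struct demo_node y = { DEMO_COMPONENT_REF, &x };	 /* s->x.y */
  struct demo_node deref_y = { DEMO_MEM_REF, &y };	 /* *(s->x.y) */
  struct demo_node z = { DEMO_COMPONENT_REF, &deref_y }; /* s->x.y->z */

  return demo_strip_to_base (&z) == &s;
}
```

A loop that strips all component references first and then at most one memory reference would stop at s->x.y in this shape, which is the limitation described above.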

Or do you want to have it done incrementally, start with supporting only
a single -> first before all the dots and later on add support for the rest?

I think the 5.0 and especially 5.1 wording basically says that map clause
operand is arbitrary lvalue expression that includes array section support
too, so eventually we should just have somewhere in parsing scope a bool
whether OpenMP array sections are allowed or not, add OMP_ARRAY_REF or
similar tree code for those and after parsing the expression, ensure
array sections appear only where they can appear and for a subset of the
lvalue expressions where we have decl plus series of -> field or . field
or [ index ] or [ array section stuff ] handle those specially.
That arbitrary lvalue can certainly be done incrementally.
map (foo(123)->a.b[3]->c.d[:7]) and the like.


Indeed this kind of modification is sort of "as encountered", so there are
probably many cases that are not completely handled yet; it's not just
the front-end, but also changes in gimplify_scan_omp_clauses().

However, I had another patch that should've plowed a bit further on this:
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html
as well as those patch sets that Julian is working on.
(our current plan is to have my sets go in first, and Julian's on top,
to minimize clashing)


  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
  && OMP_CLAUSE_MAP_IMPLICIT (c)
  && (bitmap_bit_p (_head, DECL_UID (t))
@@ -14802,6 +14814,15 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
   bias) to zero here, so it is not set erroneously to the pointer
   size later on in gimplify.c.  */
OMP_CLAUSE_SIZE (c) = size_zero_node;
+ indir_component_ref_p = false;
+ if ((ort == C_ORT_ACC || ort == C_ORT_OMP)


Same comment about ort tests.


+ && TREE_CODE (t) == COMPONENT_REF
+ && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF)
+   {
+ t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+ indir_component_ref_p = true;
+ STRIP_NOPS (t);
+   }


Again, this can handle only a single ->


@@ -42330,16 +42328,10 @@ cp_parser_omp_target (cp_parser *parser, cp_token 
*pragma_tok,
cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = tc;
  }
}
- tree stmt = make_node (OMP_TARGET);
- TREE_TYPE (stmt) = void_type_node;
- OMP_TARGET_CLAUSES (stmt) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET];
- c_omp_adjust_map_clauses (OMP_TARGET_CLAUSES (stmt), true);
- OMP_TARGET_BODY (stmt) = body;
- OMP_TARGET_COMBINED (stmt) = 1;
- SET_EXPR_LOCATION (stmt, pragma_tok->location);
- add_stmt (stmt);
- pc = &OMP_TARGET_CLAUSES (stmt);
- goto check_clauses;
+ c_omp_adjust_map_clauses (cclauses[C_OMP_CLAUSE_SPLIT_TARGET], true);
+ finish_omp_target (pragma_tok->

[PATCH, v2, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)

2021-11-05 Thread Chung-Lin Tang

Hi Jakub,

On 2021/6/24 11:55 PM, Jakub Jelinek wrote:

On Fri, May 14, 2021 at 09:20:25PM +0800, Chung-Lin Tang wrote:

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e790f08b23f..69c4a8e0a0a 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -10374,6 +10374,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void 
*data)
  gcc_unreachable ();
}
OMP_CLAUSE_SET_MAP_KIND (clause, kind);
+  OMP_CLAUSE_MAP_IMPLICIT_P (clause) = 1;
if (DECL_SIZE (decl)
  && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
{


As Thomas mentioned, there is now also OMP_CLAUSE_MAP_IMPLICIT that means
something different:
/* Nonzero on map clauses added implicitly for reduction clauses on combined
or composite constructs.  They shall be removed if there is an explicit
map clause.  */
Having OMP_CLAUSE_MAP_IMPLICIT and OMP_CLAUSE_MAP_IMPLICIT_P would be too
confusing.  So either we need to use just one flag for both purposes or
have two different flags and find a better name for one of them.
The former would be possible if no OMP_CLAUSE_MAP clauses added by the FEs
are implicit - then you could clear OMP_CLAUSE_MAP_IMPLICIT in
gimplify_scan_omp_clauses.  I wonder if it is the case though, e.g. doesn't
your "Improve OpenMP target support for C++ [PR92120 v4]" patch add a lot of
such implicit map clauses (e.g. the this[:1] and various others)?


I have changed the name to OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P, to signal that
this bit is to be passed to the runtime. Right now it's intended to be used by
clauses created by the middle-end, but front-end uses like that for C++ could
be clarified later.


Also, gimplify_adjust_omp_clauses_1 sometimes doesn't add just one map
clause, but several, shouldn't those be marked implicit too?  And similarly
it calls lang_hooks.decls.omp_finish_clause which can add even further map
clauses implicitly, shouldn't those be implicit too (in that case copy
the flag from the clause it is called on to the extra clauses it adds)?

Also as Thomas mentioned, it should be restricted to non-OpenACC,
it can check gimplify_omp_ctxp->region_type if it is OpenMP or OpenACC.


Agreed, I've adjusted the patch to only do this implicit setting for OpenMP.
This reduces a lot of the originally needed scan test adjustments for existing
OpenACC testcases.


@@ -10971,9 +10972,15 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, 
gimple_seq body, tree *list_p,
list_p = &OMP_CLAUSE_CHAIN (c);
  }
  
-  /* Add in any implicit data sharing.  */

+  /* Add in any implicit data sharing. Implicit clauses are added at the start


Two spaces after dot in comments.


Done.


+ of the clause list, but after any non-map clauses.  */
struct gimplify_adjust_omp_clauses_data data;
-  data.list_p = list_p;
+  tree *implicit_add_list_p = orig_list_p;
+  while (*implicit_add_list_p
+&& OMP_CLAUSE_CODE (*implicit_add_list_p) != OMP_CLAUSE_MAP)
+implicit_add_list_p = &OMP_CLAUSE_CHAIN (*implicit_add_list_p);


Why are the implicit map clauses added first and not last?


As I also explained in the first submission email, due to the processing order,
if implicit clauses are added last (and processed last), for example:

  #pragma omp target map(tofrom: var.ptr[:N]) map(tofrom: var[implicit])
  {
 // access of var.ptr[]
  }

The explicit var.ptr[:N] will not find anything to map, because the (implicit)
map(var) has not been seen yet, and the assumed array section attachment
behavior will fail.

Only with an order like map(tofrom: var[implicit]) map(tofrom: var.ptr[:N])
does the usual assumed behavior show.
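
The placement rule (after any leading non-map clauses, before the first map clause) can be sketched with a toy singly linked list. The demo_clause type and the helper names are hypothetical and only imitate the shape of the clause chain; they are not the GCC data structures.

```c
#include <stddef.h>

enum demo_clause_code { DEMO_CLAUSE_FIRSTPRIVATE, DEMO_CLAUSE_MAP };

struct demo_clause
{
  enum demo_clause_code code;
  int implicit;			/* Nonzero for implicitly added maps.  */
  struct demo_clause *chain;	/* Next clause in the list.  */
};

/* Return the address of the link at which an implicit map clause should be
   inserted: past the leading non-map clauses, ahead of the first map.  */
struct demo_clause **
demo_implicit_insert_point (struct demo_clause **list_p)
{
  while (*list_p != NULL && (*list_p)->code != DEMO_CLAUSE_MAP)
    list_p = &(*list_p)->chain;
  return list_p;
}

/* Splice C into the list as an implicit map clause.  */
void
demo_add_implicit_map (struct demo_clause **list_p, struct demo_clause *c)
{
  struct demo_clause **p = demo_implicit_insert_point (list_p);

  c->code = DEMO_CLAUSE_MAP;
  c->implicit = 1;
  c->chain = *p;
  *p = c;
}

/* Start from "firstprivate, explicit map" and check that the result is
   "firstprivate, implicit map, explicit map".  Returns 1 on success.  */
int
demo_check_order (void)
{
  struct demo_clause explicit_map = { DEMO_CLAUSE_MAP, 0, NULL };
  struct demo_clause fp = { DEMO_CLAUSE_FIRSTPRIVATE, 0, &explicit_map };
  struct demo_clause implicit_map = { DEMO_CLAUSE_MAP, 0, NULL };
  struct demo_clause *list = &fp;

  demo_add_implicit_map (&list, &implicit_map);
  return (list == &fp
	  && fp.chain == &implicit_map
	  && implicit_map.chain == &explicit_map
	  && implicit_map.implicit == 1);
}
```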

And yes, this depends on the new behavior implemented by patch [1], which I
still need you to review.
e.g. for map(var.ptr[:N]), the proper behavior should *only* map the array
section but NOT the base-pointer.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571195.html


There is also the OpenMP 5.1 [352:17-22] case which basically says that the
implicit mappings should be ignored if there are explicit ones on the same
construct (though, do we really create implicit clauses in that case?).


Implicit clauses do not appear to be created if there's an explicit clause 
already existing.


+#define GOMP_MAP_IMPLICIT  (GOMP_MAP_FLAG_SPECIAL_3 \
+| GOMP_MAP_FLAG_SPECIAL_4)
+/* Mask for entire set of special map kind bits.  */
+#define GOMP_MAP_FLAG_SPECIAL_BITS (GOMP_MAP_FLAG_SPECIAL_0 \
+| GOMP_MAP_FLAG_SPECIAL_1 \
+| GOMP_MAP_FLAG_SPECIAL_2 \
+| GOMP_MAP_FLAG_SPECIAL_3 \
+| GOMP_MAP_FLAG_SPECIAL_4)

...

+#define GOMP_MAP_IMPLICIT_P(X) \
+  (((X) & GOMP_MAP_FLAG_SPECIAL_BITS) == GOMP_MAP_IMPLICIT)
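
A small sketch of how such a predicate behaves: it matches only when the special bits are exactly the implicit pattern, regardless of the low direction bits. The DEMO_* names and the bit positions are assumptions chosen for illustration, not values taken from gomp-constants.h.

```c
/* Hypothetical bit positions for the special flags.  */
#define DEMO_FLAG_SPECIAL_0 (1u << 2)
#define DEMO_FLAG_SPECIAL_1 (1u << 3)
#define DEMO_FLAG_SPECIAL_2 (1u << 4)
#define DEMO_FLAG_SPECIAL_3 (1u << 5)
#define DEMO_FLAG_SPECIAL_4 (1u << 6)

/* The implicit marker is one specific combination of special bits.  */
#define DEMO_MAP_IMPLICIT (DEMO_FLAG_SPECIAL_3 | DEMO_FLAG_SPECIAL_4)

/* Mask for the entire set of special bits.  */
#define DEMO_FLAG_SPECIAL_BITS (DEMO_FLAG_SPECIAL_0 \
				| DEMO_FLAG_SPECIAL_1 \
				| DEMO_FLAG_SPECIAL_2 \
				| DEMO_FLAG_SPECIAL_3 \
				| DEMO_FLAG_SPECIAL_4)

/* True only if the special bits of X are exactly the implicit pattern.  */
#define DEMO_MAP_IMPLICIT_P(X) \
  (((X) & DEMO_FLAG_SPECIAL_BITS) == DEMO_MAP_IMPLICIT)

/* Function wrapper so the predicate can be called like ordinary code.  */
int
demo_is_implicit (unsigned int kind)
{
  return DEMO_MAP_IMPLICIT_P (kind);
}
```

Because the test compares against the whole mask rather than testing one bit, a kind that sets any additional special bit is not treated as implicit.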


I think here we need to decide with which GOMP_MAP* kinds the implicit
bit will need to be combined wit

[PATCH, PR90030] Fortran OpenMP/OpenACC array mapping alignment fix

2021-11-04 Thread Chung-Lin Tang

Hi Jakub,
As Thomas reported and submitted a patch a while ago:
https://gcc.gnu.org/pipermail/gcc-patches/2019-April/519932.html
https://gcc.gnu.org/pipermail/gcc-patches/2019-May/522738.html

There's an issue with the Fortran front-end when mapping arrays: when
creating the data MEM_REF for the map clause, there's a convention of
casting the referencing pointer to 'c_char *' by
fold_convert (build_pointer_type (char_type_node), ptr).

This causes the alignment passed to the libgomp runtime for array data
to be hardwired to '1', and causes alignment errors on the offload target
(not always showing up, but can trigger due to a slight change of clause
ordering).
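
As a loose analogy in plain C (assuming, as described above, that the alignment is derived from the pointed-to type), the sketch below shows why going through a char pointer loses alignment information. The function names are made up for this example.

```c
#include <stddef.h>

/* Alignment a consumer would infer if it only knew the data is 'char'.  */
size_t
demo_align_via_char_pointer (void)
{
  return _Alignof (char);
}

/* Alignment a consumer would infer if it knew the data is 'int'.  */
size_t
demo_align_via_int_pointer (void)
{
  return _Alignof (int);
}
```

The alignment of char is 1 by definition, so any layout decision based on it cannot assume more than byte alignment, whatever the real element type is.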

This patch is not exactly Thomas' patch from 2019, but does the same
thing. The new libgomp tests are directly reused though. A lot of
scan test adjustment is also included in this patch.

Patch has been tested for no regressions for gfortran and libgomp, is
this okay for trunk?

Thanks,
Chung-Lin

Fortran: fix array alignment for OpenMP/OpenACC target mapping clauses [PR90030]

The Fortran front-end is creating maps of array data with a type of pointer to
char_type_node, which when eventually passed to libgomp during runtime, marks
the passed array with an alignment of 1, which can cause mapping alignment
errors on the offload target.

This patch removes the related fold_convert(build_pointer_type (char_type_node))
calls in fortran/trans-openmp.c, and adds gcc_asserts to ensure pointer type.

2021-11-04  Chung-Lin Tang  
Thomas Schwinge 

PR fortran/90030

gcc/fortran/ChangeLog:

* trans-openmp.c (gfc_omp_finish_clause): Remove fold_convert to pointer
to char_type_node, add gcc_assert of POINTER_TYPE_P.
(gfc_trans_omp_array_section): Likewise.
(gfc_trans_omp_clauses): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/goacc/finalize-1.f: Adjust scan test.
* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
* gfortran.dg/gomp/affinity-clause-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
* gfortran.dg/gomp/map-3.f90: Likewise.
* gfortran.dg/gomp/pr78260-2.f90: Likewise.
* gfortran.dg/gomp/pr78260-3.f90: Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-fortran/pr90030.f90: New test.
* testsuite/libgomp.fortran/pr90030.f90: New test.

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e81c558..0ff90b7 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1564,7 +1564,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   if (present)
ptr = gfc_build_cond_assign_expr (, present, ptr,
  null_pointer_node);
-  ptr = fold_convert (build_pointer_type (char_type_node), ptr);
+  gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr)));
   ptr = build_fold_indirect_ref (ptr);
   OMP_CLAUSE_DECL (c) = ptr;
   c2 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
@@ -2381,7 +2381,7 @@ gfc_trans_omp_array_section (stmtblock_t *block, 
gfc_omp_namelist *n,
OMP_CLAUSE_SIZE (node), elemsz);
 }
   gcc_assert (se.post.head == NULL_TREE);
-  ptr = fold_convert (build_pointer_type (char_type_node), ptr);
+  gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr)));
   OMP_CLAUSE_DECL (node) = build_fold_indirect_ref (ptr);
   ptr = fold_convert (ptrdiff_type_node, ptr);
 
@@ -2849,8 +2849,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
{
  decl = gfc_conv_descriptor_data_get (decl);
- decl = fold_convert (build_pointer_type (char_type_node),
-  decl);
+ gcc_assert (POINTER_TYPE_P (TREE_TYPE (decl)));
  decl = build_fold_indirect_ref (decl);
}
  else if (DECL_P (decl))
@@ -2873,8 +2872,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
}
  gfc_add_block_to_block (_block, );
  gfc_add_block_to_block (_block, );
- ptr = fold_convert (build_pointer_type (char_type_node),
- ptr);
+ gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr)));
  OMP_CLAUSE_DECL (node) = build_fold_indirect_ref (ptr);
}
  if (list == OMP_LIST_DEPEND)
@@ -3117,8 +3115,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  if (present)
ptr = gfc_build_cond_assign_expr (block, present, ptr,
  null_pointer_node);
- ptr

Re: [PATCH, v2, OpenMP 5.2, Fortran] Strictly-structured block support for OpenMP directives

2021-10-21 Thread Chung-Lin Tang



On 2021/10/21 12:15 AM, Jakub Jelinek wrote:

+program main
+  integer :: x, i, n
+
+  !$omp parallel
+  block
+x = x + 1
+  end block

I'd prefer not to use those x = j or x = x + 1 etc.
as statements that do random work here whenever possible.
While those are dg-do compile testcases, especially if
it is without dg-errors I think it is preferable not to show
bad coding examples.
E.g. the x = x + 1 above is wrong for 2 reasons, x is uninitialized
before the parallel, and there is a data race, the threads, teams etc.
can write to x concurrently.
I think better would be to use something like
 call do_work
which doesn't have to be defined anywhere and will just stand there
as a black box for unspecified work.


+  !$omp workshare
+  block
+x = x + 1
+  end block

There are exceptions though, e.g. workshare is such a case, because
e.g. call do_work is not valid in workshare.
So, it is ok to keep using x = x + 1 here if you initialize it
first at the start of the program.


+  !$omp workshare
+  block
+x = 1
+!$omp critical
+block
+  x = 3
+end block
+  end block

And then there are cases like the above, please
just use different variables there (all initialized) or
say an array and access different elements in the different spots.

Jakub



Thanks, attached is what I finally committed.

Chung-Lin



From 2e4659199e814b7ee0f6bd925fd2c0a7610da856 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 21 Oct 2021 14:56:20 +0800
Subject: [PATCH] openmp: Fortran strictly-structured blocks support

This implements strictly-structured blocks support for Fortran, as specified in
OpenMP 5.2. This now allows using a Fortran BLOCK construct as the body of most
OpenMP constructs, with a "!$omp end ..." ending directive optional for that
form.

gcc/fortran/ChangeLog:

* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
together with COMP_BLOCK.
* parse.c (parse_omp_structured_block): Change return type to
'gfc_statement', add handling for strictly-structured block case, adjust
recursive calls to parse_omp_structured_block.
(parse_executable): Adjust calls to parse_omp_structured_block.
* parse.h (enum gfc_compile_state): Add
COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
handling.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/cancel-1.f90: Adjust testcase.
* gfortran.dg/gomp/nesting-3.f90: Adjust testcase.
* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-2.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-3.f90: New test.

libgomp/ChangeLog:

* libgomp.texi (Support of strictly structured blocks in Fortran):
Adjust to 'Y'.
* testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase.
---
 gcc/fortran/decl.c|   1 +
 gcc/fortran/parse.c   |  69 +-
 gcc/fortran/parse.h   |   2 +-
 gcc/fortran/trans-openmp.c|   6 +-
 gcc/testsuite/gfortran.dg/gomp/cancel-1.f90   |   3 +
 gcc/testsuite/gfortran.dg/gomp/nesting-3.f90  |  20 +-
 .../gomp/strictly-structured-block-1.f90  | 214 ++
 .../gomp/strictly-structured-block-2.f90  | 139 
 .../gomp/strictly-structured-block-3.f90  |  52 +
 libgomp/libgomp.texi  |   2 +-
 .../libgomp.fortran/task-reduction-16.f90 |   1 +
 11 files changed, 484 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-3.f90

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 6784b07ae9e..6043e100fbb 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8429,6 +8429,7 @@ gfc_match_end (gfc_statement *st)
   break;
 
 case COMP_BLOCK:
+case COMP_OMP_STRICTLY_STRUCTURED_BLOCK:
   *st = ST_END_BLOCK;
   target = " block";
   eos_ok = 0;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 2a454be79b0..b1e73ee6801 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -5459,7 +5459,7 @@ parse_oacc_loop (gfc_statement acc_st)
 
 /* Parse the statements of an OpenMP structured block.  */
 
-static void
+static gfc_statement
 parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
 {
   gfc_statement st, omp_end_st;
@@ -5546,6 +5546,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
   gcc_unreachable ();
 }
 
+  bool block_construct = false;
+  gfc_namespace *my_ns = NULL;
+  gfc_namespace *my_parent = NULL;
+
+  st = next_statement ();
+
+  if (st == ST_BLOCK)
+  

[PATCH, v2, OpenMP 5.2, Fortran] Strictly-structured block support for OpenMP directives

2021-10-20 Thread Chung-Lin Tang

Hi Jakub,
this version adjusts the patch to let sections/parallel sections also use
strictly-structured blocks, bringing it closer to 5.2.

Because of this change, some of the testcases using the sections-construct need
a bit of adjustment too, since "block; end block" at the start of the construct
now means something different than before.

There are now three new testcases: the non-dg-error and dg-error cases are
separated, and a third testcase contains a few cases listed in prior emails.
I hope this is enough.

The implementation status entry in libgomp/libgomp.texi for strictly-structured
blocks has also been changed to "Y" in this patch.

Tested without regressions, is this now okay for trunk?

Thanks,
Chung-Lin

2021-10-20  Chung-Lin Tang  

gcc/fortran/ChangeLog:

* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
together with COMP_BLOCK.
* parse.c (parse_omp_structured_block): Change return type to
'gfc_statement', add handling for strictly-structured block case, adjust
recursive calls to parse_omp_structured_block.
(parse_executable): Adjust calls to parse_omp_structured_block.
* parse.h (enum gfc_compile_state): Add
COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
handling.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/cancel-1.f90: Adjust testcase.
* gfortran.dg/gomp/nesting-3.f90: Adjust testcase.
* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-2.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-3.f90: New test.

libgomp/ChangeLog:

* libgomp.texi (Support of strictly structured blocks in Fortran):
Adjust to 'Y'.
* testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase.
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index d6a22d13451..66489da12be 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8449,6 +8449,7 @@ gfc_match_end (gfc_statement *st)
   break;
 
 case COMP_BLOCK:
+case COMP_OMP_STRICTLY_STRUCTURED_BLOCK:
   *st = ST_END_BLOCK;
   target = " block";
   eos_ok = 0;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 7d765a0866d..2fb98844356 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -5451,7 +5451,7 @@ parse_oacc_loop (gfc_statement acc_st)
 
 /* Parse the statements of an OpenMP structured block.  */
 
-static void
+static gfc_statement
 parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
 {
   gfc_statement st, omp_end_st;
@@ -5538,6 +5538,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
   gcc_unreachable ();
 }
 
+  bool block_construct = false;
+  gfc_namespace *my_ns = NULL;
+  gfc_namespace *my_parent = NULL;
+
+  st = next_statement ();
+
+  if (st == ST_BLOCK)
+{
+  /* Adjust state to a strictly-structured block, now that we found that
+the body starts with a BLOCK construct.  */
+  s.state = COMP_OMP_STRICTLY_STRUCTURED_BLOCK;
+
+  block_construct = true;
+  gfc_notify_std (GFC_STD_F2008, "BLOCK construct at %C");
+
+  my_ns = gfc_build_block_ns (gfc_current_ns);
+  gfc_current_ns = my_ns;
+  my_parent = my_ns->parent;
+
+  new_st.op = EXEC_BLOCK;
+  new_st.ext.block.ns = my_ns;
+  new_st.ext.block.assoc = NULL;
+  accept_statement (ST_BLOCK);
+  st = parse_spec (ST_NONE);
+}
+
   do
 {
   if (workshare_stmts_only)
@@ -5554,7 +5580,6 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
 restrictions apply recursively.  */
  bool cycle = true;
 
- st = next_statement ();
  for (;;)
{
  switch (st)
@@ -5580,13 +5605,13 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
case ST_OMP_PARALLEL_MASKED:
case ST_OMP_PARALLEL_MASTER:
case ST_OMP_PARALLEL_SECTIONS:
- parse_omp_structured_block (st, false);
- break;
+ st = parse_omp_structured_block (st, false);
+ continue;
 
case ST_OMP_PARALLEL_WORKSHARE:
case ST_OMP_CRITICAL:
- parse_omp_structured_block (st, true);
- break;
+ st = parse_omp_structured_block (st, true);
+ continue;
 
case ST_OMP_PARALLEL_DO:
case ST_OMP_PARALLEL_DO_SIMD:
@@ -5609,7 +5634,7 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
}
}
   else
-   st = parse_executable (ST_NONE);
+   st = parse_executable (st);
   if (st == ST_NONE)
unexpected_eof ();
   else if (st == ST_OMP_SECTION
@@ -56

[PATCH, v2, OpenMP, Fortran] Support in_reduction for Fortran

2021-10-19 Thread Chung-Lin Tang
 dg-do run }
+
+subroutine foo (x, y)

...

+  if (x .ne. 11) stop 1
+  if (y .ne. 21) stop 2
+
+end program main


Again, something that can be dealt with incrementally, but the
testsuite coverage of
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573600.html
was larger than this.  It would be nice e.g. to cover both scalar vars
and array sections/arrays, parameters passed by reference as in the
above testcase, but also something that isn't a reference (either a local
variable or a dummy parameter with VALUE, etc.).

Jakub


I have expanded target-in-reduction-1.f90 to cover local variables and
VALUE-passed parameters. Array sections in reductions still appear to be
unsupported by the Fortran FE in general (Tobias plans to work on that later).

I also added another testcase, target-in-reduction-2.f90, that tests the
"orphaned" case in Fortran, where the task/target in_reduction is in a
separate subroutine.

Tested without regressions on trunk, is this okay to commit?

Thanks,
Chung-Lin

2021-10-19  Chung-Lin Tang  

gcc/fortran/ChangeLog:

* openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default
false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
(gfc_match_omp_clauses): Add 'openmp_target' default false parameter,
adjust call to gfc_match_omp_clause_reduction.
(match_omp): Adjust call to gfc_match_omp_clauses.
* trans-openmp.c (gfc_trans_omp_taskgroup): Add call to
gfc_trans_omp_clauses, create and return block.

gcc/ChangeLog:

* omp-low.c (omp_copy_decl_2): For !ctx, use record_vars to add new copy
as local variable.
(scan_sharing_clauses): Place copy of OMP_CLAUSE_IN_REDUCTION decl in
ctx->outer instead of ctx.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/reduction4.f90: Adjust 'omp target in_reduction' scan
pattern.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.
* testsuite/libgomp.fortran/target-in-reduction-2.f90: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 6a4ca2868f8..210fb06dbec 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1138,7 +1138,7 @@ failed:
 
 static match
 gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc,
-   bool allow_derived)
+   bool allow_derived, bool openmp_target = false)
 {
   if (pc == 'r' && gfc_match ("reduction ( ") != MATCH_YES)
 return MATCH_NO;
@@ -1285,6 +1285,19 @@ gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses 
*c, bool openacc,
n->u2.udr = gfc_get_omp_namelist_udr ();
n->u2.udr->udr = udr;
  }
+   if (openmp_target && list_idx == OMP_LIST_IN_REDUCTION)
+ {
+   gfc_omp_namelist *p = gfc_get_omp_namelist (), **tl;
+   p->sym = n->sym;
+   p->where = p->where;
+   p->u.map_op = OMP_MAP_ALWAYS_TOFROM;
+
+   tl = &c->lists[OMP_LIST_MAP];
+   while (*tl)
+ tl = &((*tl)->next);
+   *tl = p;
+   p->next = NULL;
+ }
  }
   return MATCH_YES;
 }
@@ -1353,7 +1366,7 @@ gfc_match_dupl_atomic (bool not_dupl, const char *name)
 static match
 gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
   bool first = true, bool needs_space = true,
-  bool openacc = false)
+  bool openacc = false, bool openmp_target = false)
 {
   bool error = false;
   gfc_omp_clauses *c = gfc_get_omp_clauses ();
@@ -2057,8 +2070,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const 
omp_mask mask,
  goto error;
}
  if ((mask & OMP_CLAUSE_IN_REDUCTION)
- && gfc_match_omp_clause_reduction (pc, c, openacc,
-allow_derived) == MATCH_YES)
+ && gfc_match_omp_clause_reduction (pc, c, openacc, allow_derived,
+openmp_target) == MATCH_YES)
continue;
  if ((mask & OMP_CLAUSE_INBRANCH)
  && (m = gfc_match_dupl_check (!c->inbranch && !c->notinbranch,
@@ -3512,7 +3525,8 @@ static match
 match_omp (gfc_exec_op op, const omp_mask mask)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, mask) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, mask, true, true, false,
+op == EXEC_OMP_TARGET) != MATCH_YES)
 return MATCH_ERROR;
   new_st.op = op;
   new_st.ext.omp_clauses = c;
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index d234d1b070f..56efe195257 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -6405,12 +6405,17 @@ gfc_trans_omp_task (gfc_code *code)
 static tree
 gfc_trans_omp_taskgroup (gfc

Re: [PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives

2021-10-15 Thread Chung-Lin Tang

On 2021/10/14 7:19 PM, Jakub Jelinek wrote:

On Thu, Oct 14, 2021 at 12:20:51PM +0200, Jakub Jelinek via Gcc-patches wrote:

Thinking more about the Fortran case for !$omp sections, there is an
ambiguity.
!$omp sections
block
   !$omp section
end block
is clear and !$omp end sections is optional, but
!$omp sections
block
end block
is ambiguous during parsing, it could be either followed by !$omp section
and then the BLOCK would be first section, or by !$omp end sections and then
it would be clearly the whole sections, with first section being empty
inside of the block, or if it is followed by something else, it is
ambiguous whether the block ... end block is part of the first section,
followed by something and then we should be looking later for either
!$omp section or !$omp end sections to prove that, or if
!$omp sections
block
end block
was the whole sections construct and we shouldn't await anything further.
I'm afraid back to the drawing board.


And I have to correct myself, there is no ambiguity in 5.2 here,
the important fact is hidden in sections/parallel sections being
block-associated constructs.  That means the body of the whole construct
has to be a structured-block, and by the 5.1+ definition of Fortran
structured block, it is either block ... end block or something that
doesn't start with block.
So,
!$omp sections
block
end block
a = 1
is only ambiguous in whether it is actually
!$omp sections
block
   !$omp section
end block
a = 1
or
!$omp sections
!$omp section
block
end block
!$omp end sections
a = 1
but both actually do the same thing, work roughly as !$omp single.
If one wants block statement as first in structured-block-sequence
of the first section, followed by either some further statements
or by other sections, then one needs to write
!$omp sections
!$omp section
block
end block
a = 1
...
!$omp end sections
or
!$omp sections
block
   block
   end block
   a = 1
...
end block

Your patch probably already handles it that way, but we again need
testsuite coverage to prove it is handled the way it should in all these
cases (and that we diagnose what is invalid).


The patch currently does not allow a strictly-structured BLOCK for
sections/parallel sections, since I was referencing the 5.1 spec while writing
it, although that is trivially fixable. (I did find it a bit odd that those two
constructs had to be treated specially in 5.1.)

The bigger issue is that, the way the patch is currently written, the statements
inside a [parallel] sections construct are parsed automatically by
parse_executable(), so enforcing the specified meaning of
"structured-block-sequence" (i.e. a BLOCK, or a sequence of stmts that does not
start with BLOCK) will probably be a bit harder to implement:

!$omp sections
block
   !$omp section
   block
 x=0
   end block
   x=1   !! This is allowed now, though should be wrong spec-wise
   !$omp section
   x=2
end block

Currently "!$omp section" acts essentially as a top-level separator within a
sections construct, rather than as a structured directive. Though I would argue
this is actually nicer for the user (why prohibit what looks like the very
apparent meaning of the program?).

So Jakub, my question is: is the current state okay, or must we implement the
spec pedantically?

As for the other issues:
(1) BLOCK/END BLOCK is not generally handled in parse_omp_structured_block, so
for workshare it is only handled for the top-level construct, not within
workshare. I think this is what you meant in the last mail.

(2) As for the dangling "!$omp end" issue Tobias raised: because we are
basically using one-statement lookahead, any "!$omp end <*>" is naturally bound
to the adjacent BLOCK/END BLOCK, so we should be okay there.

Thanks,
Chung-Lin


[PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives

2021-10-07 Thread Chung-Lin Tang

Hi all,
this patch adds support for "strictly-structured blocks", introduced in
OpenMP 5.1, which basically allow BLOCK constructs to serve as the body of
directives:

!$omp target
block
  ...
end block
[!$omp end target]  !! end directive is optional

!$omp parallel
block
  ...
end block
...
!$omp end parallel  !! error, not considered a match for the above parallel directive

The parsing loop in parse_omp_structured_block() has been modified to allow
a BLOCK construct once the first statement has been detected to be ST_BLOCK.
This is done by forcibly changing the state to (the new)
COMP_OMP_STRICTLY_STRUCTURED_BLOCK after the statement is known. (I'm not sure
whether there is a way to 'peek' at the next statement/token in the Fortran FE;
I'm open to suggestions on how to write this better.)
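To illustrate the lookahead approach described above, here is a minimal,
self-contained C sketch. It is not the gfortran code: the statement kinds,
parse_directive_body() and struct parse_result are all hypothetical names
invented for this illustration. It only shows the idea of fetching the first
statement, committing to the strictly-structured form if that statement is a
BLOCK, and binding an end directive only when it directly follows END BLOCK.

```c
#include <stddef.h>

/* Hypothetical statement kinds; gfortran's real enum is gfc_statement.  */
enum stmt { ST_BLOCK, ST_END_BLOCK, ST_OMP_END, ST_OTHER, ST_EOF };

enum body_kind { BODY_LOOSE, BODY_STRICT, BODY_ERROR };

struct parse_result
{
  enum body_kind kind;  /* How the directive body was classified.  */
  size_t consumed;      /* Statements consumed, incl. a bound end directive.  */
};

/* Parse the body of one directive from STMTS, which must end with ST_EOF.  */
struct parse_result
parse_directive_body (const enum stmt *stmts)
{
  struct parse_result r = { BODY_LOOSE, 0 };
  size_t i = 0;

  /* Look at the first statement before deciding on the body form.  */
  if (stmts[0] == ST_BLOCK)
    {
      size_t depth = 0;
      r.kind = BODY_STRICT;

      /* Consume up to the END BLOCK matching the opening BLOCK.  */
      do
        {
          if (stmts[i] == ST_EOF)
            {
              r.kind = BODY_ERROR;
              r.consumed = i;
              return r;
            }
          if (stmts[i] == ST_BLOCK)
            depth++;
          else if (stmts[i] == ST_END_BLOCK)
            depth--;
          i++;
        }
      while (depth > 0);

      /* The end directive is optional and binds only if directly adjacent.  */
      if (stmts[i] == ST_OMP_END)
        i++;
      r.consumed = i;
      return r;
    }

  /* Loosely structured body: the end directive is required.  */
  while (stmts[i] != ST_OMP_END)
    {
      if (stmts[i] == ST_EOF)
        {
          r.kind = BODY_ERROR;
          r.consumed = i;
          return r;
        }
      i++;
    }
  r.consumed = i + 1;
  return r;
}
```

In this sketch, an end directive that appears after further statements is left
unconsumed, so the caller would see it as a dangling directive, which
corresponds to the error case in the second example above.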

Tested with no regressions on trunk, is this okay to commit?

Thanks,
Chung-Lin

2021-10-07  Chung-Lin Tang  

gcc/fortran/ChangeLog:

* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
together with COMP_BLOCK.
* parse.c (parse_omp_structured_block): Adjust declaration, add
'bool strictly_structured_block' default true parameter, add handling
for strictly-structured block case, adjust recursive calls to
parse_omp_structured_block.
(parse_executable): Adjust calls to parse_omp_structured_block.
* parse.h (enum gfc_compile_state): Add
COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
handling.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index b3c65b7175b..ff66d1f9475 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8445,6 +8445,7 @@ gfc_match_end (gfc_statement *st)
   break;
 
 case COMP_BLOCK:
+case COMP_OMP_STRICTLY_STRUCTURED_BLOCK:
   *st = ST_END_BLOCK;
   target = " block";
   eos_ok = 0;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 7d765a0866d..d78bf9b8fa5 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -5451,8 +5451,9 @@ parse_oacc_loop (gfc_statement acc_st)
 
 /* Parse the statements of an OpenMP structured block.  */
 
-static void
-parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
+static gfc_statement
+parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only,
+   bool strictly_structured_block = true)
 {
   gfc_statement st, omp_end_st;
   gfc_code *cp, *np;
@@ -5538,6 +5539,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
   gcc_unreachable ();
 }
 
+  bool block_construct = false;
+  gfc_namespace* my_ns = NULL;
+  gfc_namespace* my_parent = NULL;
+
+  st = next_statement ();
+
+  if (strictly_structured_block && st == ST_BLOCK)
+{
+  /* Adjust state to a strictly-structured block, now that we found that
+the body starts with a BLOCK construct.  */
+  s.state = COMP_OMP_STRICTLY_STRUCTURED_BLOCK;
+
+  block_construct = true;
+  gfc_notify_std (GFC_STD_F2008, "BLOCK construct at %C");
+
+  my_ns = gfc_build_block_ns (gfc_current_ns);
+  gfc_current_ns = my_ns;
+  my_parent = my_ns->parent;
+
+  new_st.op = EXEC_BLOCK;
+  new_st.ext.block.ns = my_ns;
+  new_st.ext.block.assoc = NULL;
+  accept_statement (ST_BLOCK);
+  st = parse_spec (ST_NONE);
+}
+
   do
 {
   if (workshare_stmts_only)
@@ -5554,7 +5581,6 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
 restrictions apply recursively.  */
  bool cycle = true;
 
- st = next_statement ();
  for (;;)
{
  switch (st)
@@ -5576,17 +5602,20 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
  parse_forall_block ();
  break;
 
+   case ST_OMP_PARALLEL_SECTIONS:
+ st = parse_omp_structured_block (st, false, false);
+ continue;
+
case ST_OMP_PARALLEL:
case ST_OMP_PARALLEL_MASKED:
case ST_OMP_PARALLEL_MASTER:
-   case ST_OMP_PARALLEL_SECTIONS:
- parse_omp_structured_block (st, false);
- break;
+ st = parse_omp_structured_block (st, false);
+ continue;
 
case ST_OMP_PARALLEL_WORKSHARE:
case ST_OMP_CRITICAL:
- parse_omp_structured_block (st, true);
- break;
+ st = parse_omp_structured_block (st, true);
+ continue;
 
case ST_OMP_PARALLEL_DO:
case ST_OMP_PARALLEL_DO_SIMD:
@@ -5609,7 +5638,7 @@ parse_omp_structured_block (gfc_statement omp_st, bool 
workshare_stmts_only)
 

[PATCH, OpenMP, Fortran] Support in_reduction for Fortran

2021-09-17 Thread Chung-Lin Tang

Hi Jakub, and Fortran folks,
this patch does the required adjustments to let 'in_reduction' work for Fortran.
It is not just for the target directive, actually: the task directive also
works after this patch.

There is a little bit of adjustment in omp-low.c:scan_sharing_clauses:
RTL expand of the copy of the OMP_CLAUSE_IN_REDUCTION decl was failing
for Fortran by-reference arguments, which seems to work after placing them
under the outer ctx (when it exists). This also now needs checking the field_map
for existence of the field before inserting.
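On the front-end side, the ChangeLog below notes that an 'always,tofrom' map
is added for the OMP_LIST_IN_REDUCTION case, and the openmp.c hunk appends it
at the tail of the map list through a pointer-to-pointer walk. The following
standalone C sketch shows that idiom in isolation; struct namelist, enum map_op
and append_always_tofrom() are hypothetical stand-ins for illustration, not the
gfortran types.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for gfc_omp_namelist and its map operation.  */
enum map_op { MAP_NONE, MAP_ALWAYS_TOFROM };

struct namelist
{
  const char *sym;        /* Name of the mapped variable.  */
  enum map_op map_op;     /* Kind of mapping requested.  */
  struct namelist *next;  /* Next entry, or NULL at the tail.  */
};

/* Append an always,tofrom entry for SYM at the tail of the list *HEAD.
   Returns the new node, or NULL if allocation fails.  */
struct namelist *
append_always_tofrom (struct namelist **head, const char *sym)
{
  struct namelist *p = calloc (1, sizeof *p);
  if (p == NULL)
    return NULL;
  p->sym = sym;
  p->map_op = MAP_ALWAYS_TOFROM;

  /* Walk the "next" links through a pointer-to-pointer, so that the empty
     list and the non-empty list need no separate handling.  */
  struct namelist **tl = head;
  while (*tl)
    tl = &(*tl)->next;
  *tl = p;
  return p;
}
```

Appending at the tail keeps any map clauses the user wrote ahead of the
implicitly added one; whether that ordering matters to later passes is not
stated in this thread.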

Tested without regressions on mainline trunk, is this okay?

(testing for devel/omp/gcc-11 is in progress)

Thanks,
Chung-Lin

2021-09-17  Chung-Lin Tang  

gcc/fortran/ChangeLog:

* openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default
false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
(gfc_match_omp_clauses): Add 'openmp_target' default false parameter,
adjust call to gfc_match_omp_clause_reduction.
(match_omp): Adjust call to gfc_match_omp_clauses.
* trans-openmp.c (gfc_trans_omp_taskgroup): Add call to
gfc_trans_omp_clauses, create and return block.

gcc/ChangeLog:

* omp-low.c (scan_sharing_clauses): Place in_reduction copy of variable
in outer ctx if it exists. Check if non-existent in field_map before
installing OMP_CLAUSE_IN_REDUCTION decl.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/reduction4.f90: Adjust 'omp target in_reduction' scan
pattern.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index a64b7f5aa10..8179b5aa8bc 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1138,7 +1138,7 @@ failed:
 
 static match
 gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc,
-   bool allow_derived)
+   bool allow_derived, bool openmp_target = false)
 {
   if (pc == 'r' && gfc_match ("reduction ( ") != MATCH_YES)
 return MATCH_NO;
@@ -1285,6 +1285,19 @@ gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses 
*c, bool openacc,
n->u2.udr = gfc_get_omp_namelist_udr ();
n->u2.udr->udr = udr;
  }
+   if (openmp_target && list_idx == OMP_LIST_IN_REDUCTION)
+ {
+   gfc_omp_namelist *p = gfc_get_omp_namelist (), **tl;
+   p->sym = n->sym;
+   p->where = p->where;
+   p->u.map_op = OMP_MAP_ALWAYS_TOFROM;
+
+   tl = &c->lists[OMP_LIST_MAP];
+   while (*tl)
+ tl = &((*tl)->next);
+   *tl = p;
+   p->next = NULL;
+ }
  }
   return MATCH_YES;
 }
@@ -1353,7 +1366,7 @@ gfc_match_dupl_atomic (bool not_dupl, const char *name)
 static match
 gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
   bool first = true, bool needs_space = true,
-  bool openacc = false)
+  bool openacc = false, bool openmp_target = false)
 {
   bool error = false;
   gfc_omp_clauses *c = gfc_get_omp_clauses ();
@@ -2057,8 +2070,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const 
omp_mask mask,
  goto error;
}
  if ((mask & OMP_CLAUSE_IN_REDUCTION)
- && gfc_match_omp_clause_reduction (pc, c, openacc,
-allow_derived) == MATCH_YES)
+ && gfc_match_omp_clause_reduction (pc, c, openacc, allow_derived,
+openmp_target) == MATCH_YES)
continue;
  if ((mask & OMP_CLAUSE_INBRANCH)
  && (m = gfc_match_dupl_check (!c->inbranch && !c->notinbranch,
@@ -3496,7 +3509,8 @@ static match
 match_omp (gfc_exec_op op, const omp_mask mask)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, mask) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, mask, true, true, false,
+(op == EXEC_OMP_TARGET)) != MATCH_YES)
 return MATCH_ERROR;
   new_st.op = op;
   new_st.ext.omp_clauses = c;
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e55e0c81868..08483951066 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -6391,12 +6391,17 @@ gfc_trans_omp_task (gfc_code *code)
 static tree
 gfc_trans_omp_taskgroup (gfc_code *code)
 {
+  stmtblock_t block;
+  gfc_start_block (&block);
   tree body = gfc_trans_code (code->block->next);
   tree stmt = make_node (OMP_TASKGROUP);
   TREE_TYPE (stmt) = void_type_node;
   OMP_TASKGROUP_BODY (stmt) = body;
-  OMP_TASKGROUP_CLAUSES (stmt) = NULL_TREE;
-  return stmt;
+  OMP_TASKGROUP_CLAUSES (stmt) = gfc_trans_omp_clauses (&block,
+   code->ext.omp_cla

[PATCH, OG11, OpenACC, committed] Fix ICE for non-contiguous arrays

2021-08-19 Thread Chung-Lin Tang

Currently we ICE when non-decl base-pointers (like struct members) are
used in OpenACC non-contiguous array sections.

This patch is a stopgap that rejects such cases for now. We'll deal
with the more elaborate middle-end work needed to fully support them later.
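As a rough illustration of the check this patch introduces, the C sketch below
models the decision in plain data. It is a simplification under stated
assumptions: the real code walks a TREE_LIST chain and tests DECL_P on the
base, whereas here classify_section(), enum section_status and
LENGTH_UNSPECIFIED are hypothetical names, and the prior dimensions are
reduced to an array of lengths.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch.  */
enum section_status
{
  SECTION_CONTIGUOUS,         /* Every prior dimension has length one.  */
  SECTION_NON_CONTIGUOUS_OK,  /* OpenACC, base pointer is a declaration.  */
  SECTION_UNSUPPORTED_BASE,   /* OpenACC, base is e.g. a struct member.  */
  SECTION_NOT_CONTIGUOUS      /* Non-OpenACC: reported as an error.  */
};

/* Marks a dimension whose length was not written in the clause.  */
#define LENGTH_UNSPECIFIED (-1L)

/* Classify an array section from the N lengths of its prior dimensions.  */
enum section_status
classify_section (const long *prior_lengths, size_t n,
                  bool is_openacc, bool base_is_decl)
{
  for (size_t i = 0; i < n; i++)
    if (prior_lengths[i] == LENGTH_UNSPECIFIED || prior_lengths[i] != 1)
      {
        /* A prior dimension with a missing or non-one length makes the
           section non-contiguous.  */
        if (!is_openacc)
          return SECTION_NOT_CONTIGUOUS;
        return base_is_decl ? SECTION_NON_CONTIGUOUS_OK
                            : SECTION_UNSUPPORTED_BASE;
      }
  return SECTION_CONTIGUOUS;
}
```

The point of the change is visible in the third outcome: for OpenACC, a
non-contiguous section whose base is not a declaration now gets a diagnostic
instead of reaching code that would ICE.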

Committed to devel/omp/gcc-11 after testing. This is not for mainline.

Chung-Lin

From 4e34710679ac084d7ca15ccf387c1b6f1e64c2d1 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 19 Aug 2021 16:17:02 +0800
Subject: [PATCH] openacc: fix ICE for non-decl expression in non-contiguous
 array base-pointer

Currently, we do not support cases like struct-members as the base-pointer
for an OpenACC non-contiguous array. Mark such cases as unsupported in the
C/C++ front-ends, instead of ICEing on them.

gcc/c/ChangeLog:

* c-typeck.c (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.

gcc/cp/ChangeLog:

* semantics.c (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.
---
 gcc/c/c-typeck.c   | 35 +++
 gcc/cp/semantics.c | 39 ---
 2 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 9c4822bbf27..a8b54c676c0 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13431,25 +13431,36 @@ handle_omp_array_sections_1 (tree c, tree t, 
vec ,
  && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_AFFINITY
  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
{
- if (ort == C_ORT_ACC)
-   /* Note that OpenACC does accept these kinds of non-contiguous
-  pointer based arrays.  */
-   non_contiguous = true;
- else
+ /* If any prior dimension has a non-one length, then deem this
+array section as non-contiguous.  */
+ for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST;
+  d = TREE_CHAIN (d))
{
- /* If any prior dimension has a non-one length, then deem this
-array section as non-contiguous.  */
- for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST;
-  d = TREE_CHAIN (d))
+ tree d_length = TREE_VALUE (d);
+ if (d_length == NULL_TREE || !integer_onep (d_length))
{
- tree d_length = TREE_VALUE (d);
- if (d_length == NULL_TREE || !integer_onep (d_length))
+ if (ort == C_ORT_ACC)
{
+ while (TREE_CODE (d) == TREE_LIST)
+   d = TREE_CHAIN (d);
+ if (DECL_P (d))
+   {
+ /* Note that OpenACC does accept these kinds of
+non-contiguous pointer based arrays.  */
+ non_contiguous = true;
+ break;
+   }
  error_at (OMP_CLAUSE_LOCATION (c),
-   "array section is not contiguous in %qs clause",
+   "base-pointer expression in %qs clause not "
+   "supported for non-contiguous arrays",
omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
  return error_mark_node;
}
+
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "array section is not contiguous in %qs clause",
+   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+ return error_mark_node;
}
}
}
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index e56ad8aa1e1..ad62ad76ff9 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5292,32 +5292,41 @@ handle_omp_array_sections_1 (tree c, tree t, vec 
,
  return error_mark_node;
}
   /* If there is a pointer type anywhere but in the very first
-array-section-subscript, the array section could be non-contiguous.
-Note that OpenACC does accept these kinds of non-contiguous pointer
-based arrays.  */
+array-section-subscript, the array section could be non-contiguous.  */
   if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_AFFINITY
  && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
{
- if (ort == C_ORT_ACC)
-   /* Note that OpenACC does accept these kinds of non-contiguous
-  pointer based arrays.  */
-   non_contiguous = true;
- else
+ /* If any prior dimension has a non-one length, then deem this
+array section as non-contiguous.  */
+ for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST;
+  d

[PATCH, libgomp, OpenMP 5.0, OG11, committed] Implement omp_get_device_num

2021-08-09 Thread Chung-Lin Tang

The omp_get_device_num patch was merged to devel/omp/gcc-11 (OG11) after
testing. The commit was 83177ca9f262b230c892e667ebf685f96a718ec8.

This commit also effectively reverts the one-liner patch by Cesar:
https://gcc.gnu.org/pipermail/gcc-patches/2017-October/484844.html

(which was still kept in OG11 at 59ef9fea377db72f198b2bd5a95d5aef58b3f9c4)

That small patch is not on mainline and conflicts with the current merge, and
upon review and testing it no longer appears to be needed. I thus took the
liberty of overwriting it with the merge of this omp_get_device_num patch.

Chung-Lin

From 83177ca9f262b230c892e667ebf685f96a718ec8 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Mon, 9 Aug 2021 08:58:07 +0200
Subject: [PATCH] openmp: Implement omp_get_device_num routine

This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol which defines the name of a "device
number" variable. The variable is defined in the device-side libgomp and has
its address returned to the host-side libgomp during device initialization; the
host libgomp then sets its value to the designated device number.
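The handshake described above can be sketched in a single-process C model.
This is only an illustration under loud assumptions: device_num_var,
plugin_load_image(), host_load_image() and struct addr_pair here are
hypothetical stand-ins, and the direct pointer store replaces what would be a
host-to-device copy in a real offloading setup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8

/* Hypothetical address-range entry, as returned by an image loader.  */
struct addr_pair { uintptr_t start; uintptr_t end; };

/* "Device side": stand-in for the GOMP_DEVICE_NUM_VAR variable.  */
static int device_num_var = -1;

int
device_get_device_num (void)
{
  return device_num_var;
}

/* "Plugin": fill TABLE with N_ENTRIES placeholder entries, then add one
   extra trailing entry holding the address of the device-number variable.
   TABLE must have room for N_ENTRIES + 1 entries.  */
size_t
plugin_load_image (struct addr_pair *table, size_t n_entries)
{
  for (size_t i = 0; i < n_entries; i++)
    table[i].start = table[i].end = 0;
  table[n_entries].start = (uintptr_t) &device_num_var;
  table[n_entries].end = table[n_entries].start + sizeof device_num_var;
  return n_entries + 1;
}

/* "Host": load the image and, if the extra entry is present at the end,
   write the assigned device number through it.  */
bool
host_load_image (int device_num, size_t expected_entries)
{
  struct addr_pair table[MAX_ENTRIES + 1];
  if (expected_entries > MAX_ENTRIES)
    return false;

  size_t got = plugin_load_image (table, expected_entries);
  if (got != expected_entries + 1)
    return false;

  /* In this one-process model a plain store stands in for the copy.  */
  int *dev_var = (int *) table[got - 1].start;
  *dev_var = device_num;
  return true;
}
```

Passing the variable's address as an additional trailing table entry lets the
host detect its presence purely from the entry count, which matches the "if
additional entry for device number exists at end of returned entries" wording
in the ChangeLog below.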

libgomp/ChangeLog:

* icv-device.c (omp_get_device_num): New API function, host side.
* fortran.c (omp_get_device_num_): New interface function.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
omp_get_device_num_.
* libgomp.texi (omp_get_device_num): Add documentation for new API
function.
* omp.h.in (omp_get_device_num): Add declaration.
* omp_lib.f90.in (omp_get_device_num): Likewise.
* omp_lib.h.in (omp_get_device_num): Likewise.
* target.c (gomp_load_image_to_device): If additional entry for device
number exists at end of returned entries from 'load_image_func' hook,
copy the assigned device number over to the device variable.

* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-gcn.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.

* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-nvptx.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.

* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_intelmic): New function for
testing for intelmic offloading.
* testsuite/libgomp.c-c++-common/target-45.c: New test.
* testsuite/libgomp.fortran/target10.f90: New test.

(cherry picked from commit 0bac793ed6bad2c0c13cd1e93a1aa5808467afc8)
---
 libgomp/ChangeLog.omp  | 42 +++---
 libgomp/config/gcn/icv-device.c| 11 ++
 libgomp/config/nvptx/icv-device.c  | 11 ++
 libgomp/fortran.c  |  7 
 libgomp/icv-device.c   |  9 +
 libgomp/libgomp-plugin.h   |  6 
 libgomp/libgomp.map|  8 -
 libgomp/libgomp.texi   | 29 +++
 libgomp/omp.h.in   |  1 +
 libgomp/omp_lib.f90.in |  6 
 libgomp/omp_lib.h.in   |  3 ++
 libgomp/plugin/plugin-gcn.c| 38 ++--
 libgomp/plugin/plugin-nvptx.c  | 25 +++--
 libgomp/target.c   | 36 ++-
 libgomp/testsuite/lib/libgomp.exp  |  5 +++
 libgomp/testsuite/libgomp.c-c++-common/target-45.c | 30 
 libgomp/testsuite/libgomp.fortran/target10.f90 | 20 +++
 17 files changed, 276 insertions(+), 11 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-45.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/target10.f90

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 9467e90..3a3299b 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,15 +1,49 @@
-2021-06-30  Tobias Burnus  
+2021-08-09  Tobias Burnus  
 
Backported from master:
-   2021-06-29  Thomas Schwinge  
+   2021-08-05  Chung-Lin Tang  
+
+   * icv-device.c (omp_get_device_num): New API function, host side.
+   * fortran.c (omp_get_device_num_): New interface function.
+   * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
+   * libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
+   omp_get_devic

[PATCH, v3, libgomp, OpenMP 5.0, committed] Implement omp_get_device_num

2021-08-05 Thread Chung-Lin Tang
 a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@ omp_is_initial_device (void)
return 0;
  }
  
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
  ialias (omp_set_default_device)
  ialias (omp_get_default_device)
  ialias (omp_get_initial_device)

I suppose also add 'ialias (omp_get_device_num)' here, like...


Done, thanks for catching.


--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
+# Return 1 if compiling for offload target intelmic
+proc check_effective_target_offload_target_intelmic { } {
+return [libgomp_check_effective_target_offload_target "*-intelmic"]
+}
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { ! offload_target_intelmic } } } */

This means that the test case is skipped as soon as the compiler is
configured for Intel MIC offloading -- even if that's not used during
execution.

 From some older experiment of mine, I do have a
'check_effective_target_offload_device_intel_mic', which I'll propose as
a follow-up, once this is in.


Great.


+  if (initial_device .and. host_device_num .ne. device_num) stop 2

That one matches 'libgomp.c-c++-common/target-45.c':

 if (initial_device && host_device_num != device_num)
   abort ();

..., but here:


+  if (initial_device .and. host_device_num .eq. device_num) stop 3

... shouldn't that be '.not.initial_device', like in:

 if (!initial_device && host_device_num == device_num)
   abort ();


Yeah, Tobias caught this as well :)



(Also, I'm not familiar with Fortran operator precedence rules, so
probably would put the individual expressions into braces.;-)  -- But I
trust you know better than I do, of course.)


Done.

Attached is the final "v3" patch that I committed.

Thanks,
Chung-Lin


From 0bac793ed6bad2c0c13cd1e93a1aa5808467afc8 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 5 Aug 2021 23:29:03 +0800
Subject: [PATCH] openmp: Implement omp_get_device_num routine

This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol which defines the name of a "device number"
variable. The variable is defined in the device-side libgomp and has its address
returned to the host-side libgomp during device initialization; the host libgomp
then sets its value to the designated device number.

libgomp/ChangeLog:

* icv-device.c (omp_get_device_num): New API function, host side.
* fortran.c (omp_get_device_num_): New interface function.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
omp_get_device_num_.
* libgomp.texi (omp_get_device_num): Add documentation for new API
function.
* omp.h.in (omp_get_device_num): Add declaration.
* omp_lib.f90.in (omp_get_device_num): Likewise.
* omp_lib.h.in (omp_get_device_num): Likewise.
* target.c (gomp_load_image_to_device): If additional entry for device
number exists at end of returned entries from 'load_image_func' hook,
copy the assigned device number over to the device variable.

* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-gcn.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.

* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-nvptx.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.

* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_intelmic): New function for
testing for intelmic offloading.
* testsuite/libgomp.c-c++-common/target-45.c: New test.
* testsuite/libgomp.fortran/target10.f90: New test.
---
 libgomp/config/gcn/icv-device.c   | 11 ++
 libgomp/config/nvptx/icv-device.c | 11 ++
 libgomp/fortran.c |  7 
 libgomp/icv-device.c  |  9 +
 libgomp/libgomp-plugin.h  |  6 +++
 libgomp/libgomp.map   |  8 +++-
 libgomp/libgomp.texi  | 29 ++
 libgomp/omp.h.in  |  1 +
 libgomp/omp_lib.f90.in|  6 +++

Re: [PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num

2021-08-05 Thread Chung-Lin Tang




On 2021/8/3 8:22 PM, Thomas Schwinge wrote:

Hi Chung-Lin!

On 2021-08-02T21:10:57+0800, Chung-Lin Tang  wrote:

--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c



+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}


Missing 'ialias_redirect (omp_get_device_num)'?


Grüße
  Thomas



Thanks, will fix before committing.

Chung-Lin


[PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num

2021-08-02 Thread Chung-Lin Tang

On 2021/7/23 6:39 PM, Jakub Jelinek wrote:

On Fri, Jul 23, 2021 at 06:21:41PM +0800, Chung-Lin Tang wrote:

--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@ omp_is_initial_device (void)
return 1;
  }
  
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+ on the host.  */
+  return omp_get_initial_device ();
+}
+


I think this won't work properly with the intel micoffload, where the host
libgomp is used in the offloaded code.
For omp_is_initial_device, the plugin solves it by:
liboffloadmic/plugin/offload_target_main.cpp
overriding it:
/* Override the corresponding functions from libgomp.  */
extern "C" int
omp_is_initial_device (void) __GOMP_NOTHROW
{
   return 0;
}

extern "C" int32_t
omp_is_initial_device_ (void)
{
   return omp_is_initial_device ();
}
but guess it will need slightly more work because we need to copy the value
to the offloading device too.
It can be done incrementally though.


I guess this part of the intelmic functionality will just have to wait until later.
There seem to be other parts of liboffloadmic that need re-work, e.g.
omp_get_num_devices() returning mic_engines_total, where it should actually
return the number of all devices (not just intelmic), omp_get_initial_device()
returning -1 (which I don't quite understand), etc.

I'd really suggest having intelmic support re-worked as an offload plugin inside
libgomp, rather than floating outside by itself.


--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -102,6 +102,12 @@ struct addr_pair
uintptr_t end;
  };
  
+/* This symbol is to name a target side variable that holds the designated
+   'device number' of the target device. The symbol needs to be available to
+   libgomp code and the  offload plugin (which in the latter case must be
+   stringified).  */
+#define GOMP_DEVICE_NUM_VAR __gomp_device_num


For a single var it is acceptable (though, please avoid the double space
before offload plugin in the comment), but once we have more than one
variable, I think we should simply have a struct which will contain all the
parameters that need to be copied from the host to the offloading device at
image load time (and have eventually another struct that holds parameters
that we'll need to copy to the device on each kernel launch, I bet some ICVs
will be one category, other ICVs another one).


Actually, if you look at the 5.[01] specifications, omp_get_device_num() is not
defined in terms of an ICV. Maybe it conceptually ought to be, but the current
description of "the device number of the device on which the calling thread is
executing" is not one of the defined ICVs.

It looks like there will eventually be some kind of ICV block handled in a similar
way, but I think that the modifications will be straightforward then. For now,
I think it's okay for GOMP_DEVICE_NUM_VAR to just be a normal global variable.
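
To make the "ICV block" idea discussed above concrete, here is a minimal sketch of what a single struct of load-time parameters could look like. This is not what the patch does (the patch uses a plain global variable); `struct load_time_params`, `copy_params_to_device`, `device_get_device_num` and `example_load_params` are names made up for this illustration, and the "device" copy is simulated with ordinary host memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block of values that the host would copy to the device
   once, at image load time.  Only the device number exists today.  */
struct load_time_params
{
  int device_num;
  /* Further load-time values would be added here.  */
};

/* Stand-in for the device-side copy of the block (assumption: in this
   sketch device memory is simply host memory).  */
static struct load_time_params device_params;

/* Host side: copy the whole block in one go.  */
void
copy_params_to_device (const struct load_time_params *host)
{
  assert (host != NULL);
  memcpy (&device_params, host, sizeof (device_params));
}

/* Device side: the routine just reads its field out of the block.  */
int
device_get_device_num (void)
{
  return device_params.device_num;
}

/* Small driver: load the block with device number N and read it back.  */
int
example_load_params (int n)
{
  struct load_time_params host = { n };
  copy_params_to_device (&host);
  return device_get_device_num ();
}
```

With such a layout, adding another load-time value would mean adding a field rather than another named symbol for the plugin to look up.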


diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 8ea27b5565f..ffcb98ae99e 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -197,6 +197,8 @@ OMP_5.0.1 {
omp_get_supported_active_levels_;
omp_fulfill_event;
omp_fulfill_event_;
+   omp_get_device_num;
+   omp_get_device_num_;
  } OMP_5.0;


This is wrong.  We've already released GCC 11.1 with the OMP_5.0.1
symbol version, so we must not add any further symbols into that symbol
version.  OpenMP 5.0 routines added in GCC 12 should be OMP_5.0.2 symbol
version.


I've adjusted this into 5.0.2, in between 5.0.1 and the new 5.1 added by the recent
omp_display_env[_] routines. omp_get_device_num is an OpenMP 5.0 introduced
API function, so I think this is the correct handling (instead of stashing into 5.1).

There is a new function check_effective_target_offload_target_intelmic() in
testsuite/lib/libgomp.exp, used to test for non-intelmic offloading situations.

Re-tested with no regressions, seeking approval for trunk.

Thanks,
Chung-Lin

2021-08-02  Chung-Lin Tang  

libgomp/ChangeLog

* icv-device.c (omp_get_device_num): New API function, host side.
* fortran.c (omp_get_device_num_): New interface function.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
omp_get_device_num_.
* libgomp.texi (omp_get_device_num): Add documentation for new API
function.
* omp.h.in (omp_get_device_num): Add declaration.
* omp_lib.f90.in (omp_get_device_num): Likewise.
* omp_lib.h.in (omp_get_device_num): Likewise.
* target.c (gomp_load_image_to_device): If additional entry for device
number exists at end of returned entries from 'load_image_func' hook,
copy the assigned device number over to the device variable.

* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num):

Re: [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num

2021-08-02 Thread Chung-Lin Tang




On 2021/7/23 7:01 PM, Tobias Burnus wrote:

I personally prefer having:
    int initial_device;
and inside 'omp target' (with 'map(from:initial_dev)'):
    initial_device = omp_is_initial_device();

Then the check would be:
   if (initial_device && host_device_num != device_num)
 abort();
   if (!initial_device && host_device_num == device_num)
 abort();

(Likewise for Fortran.)


Thanks, I've adjusted the new testcases to use this style.
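
For reference, the two checks in this style can be written as one small pure function. This is only a sketch of the logic, not the committed target-45.c; `check_device_num` and `example_checks` are hypothetical names, and the return codes 2 and 3 merely mirror the 'stop 2' / 'stop 3' codes used in the Fortran test.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns 0 if the device numbers are consistent, otherwise the number
   of the failing check (2 or 3).  */
int
check_device_num (bool initial_device, int host_device_num, int device_num)
{
  /* The target region ran on the host: both numbers must agree.  */
  if (initial_device && host_device_num != device_num)
    return 2;
  /* The target region ran on an offload device: the numbers must differ.  */
  if (!initial_device && host_device_num == device_num)
    return 3;
  return 0;
}

/* A few representative cases, with made-up device numbers.  */
void
example_checks (void)
{
  assert (check_device_num (true, 1, 1) == 0);   /* host fallback, consistent */
  assert (check_device_num (false, 1, 0) == 0);  /* offloaded, consistent */
  assert (check_device_num (true, 1, 0) == 2);   /* host fallback, mismatch */
  assert (check_device_num (false, 1, 1) == 3);  /* offloaded, same number */
}
```

In the real test, `initial_device` and `device_num` would be obtained inside the 'omp target' region and mapped back to the host.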


And instead of restricting the target to nvptx/gcn, we could just add
dg-xfail-run-if for *-intelmic-* and *-intelmicemul-*.


I've added a 'offload_target_intelmic' to use on the new testcases.


Additionally, offload_target_nvptx/...amdgcn only check whether
compilation support is available not whether a device exists
at run time.
(The device availability is checked by target_offload_device,
using omp_is_initial_device().)


I guess there is value in testing compilation as long as the compiler
is properly configured, and leaving the execution as an independent test.
OTOH, I think the OpenMP execution tests are not properly forcing offload
(or not) using the environment variables, unlike what we have for OpenACC.

Thanks,
Chung-Lin


[PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num

2021-07-23 Thread Chung-Lin Tang

Hi all,
this patch implements the omp_get_device_num API function, which appears
to be a missing piece in the library routines implementation.

The host-side implementation is simple, which by specification is equivalent
to omp_get_initial_device.

Inside offloaded regions, the preferred way should be that the device
already has this information initialized (once) when the device is initialized,
and the function merely returns the stored value.

This implementation adds a convention for an additional entry (dubbed 'others'
in the code) returned by the 'load_image' plugin hook. Basically we define
a variable name in libgomp-plugin.h, which the device libgomp defines, and which
the offload plugin searches for, returning the variable's device location
(start/end) for gomp_load_image_to_device to initialize. The device-side
omp_get_device_num then just returns that value.
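
As a rough illustration of that convention (not the actual plugin or target.c code), the following sketch simulates the three steps in one file: the device side defines the variable, a lookup by stringified name returns its start/end addresses, and the host side writes the device number into it. `lookup_symbol`, `init_device_num`, `device_omp_get_device_num` and the `STR` macros are assumptions of this sketch; "device memory" is just host memory here, and a real plugin would query the loaded offload image instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Name of the device-side variable, as defined in libgomp-plugin.h.  */
#define GOMP_DEVICE_NUM_VAR __gomp_device_num

/* Two-level stringification, in the spirit of XSTRING from symcat.h.  */
#define STR_1(x) #x
#define STR(x) STR_1 (x)

/* Same shape as struct addr_pair in libgomp-plugin.h.  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

/* Stand-in for the variable defined by the device-side libgomp.  */
static int GOMP_DEVICE_NUM_VAR = -1;

/* Hypothetical lookup by name; returns 1 and fills OUT if found.  */
int
lookup_symbol (const char *name, struct addr_pair *out)
{
  if (strcmp (name, STR (GOMP_DEVICE_NUM_VAR)) != 0)
    return 0;
  out->start = (uintptr_t) &GOMP_DEVICE_NUM_VAR;
  out->end = out->start + sizeof (GOMP_DEVICE_NUM_VAR);
  return 1;
}

/* Hypothetical host-side step: if the extra entry exists, copy the
   designated device number over to the device variable.  */
int
init_device_num (int device_num)
{
  struct addr_pair var;
  if (!lookup_symbol (STR (GOMP_DEVICE_NUM_VAR), &var))
    return 0;
  assert (var.end - var.start == sizeof (int));
  memcpy ((void *) var.start, &device_num, sizeof (int));
  return 1;
}

/* Device-side routine: merely returns the stored value.  */
int
device_omp_get_device_num (void)
{
  return GOMP_DEVICE_NUM_VAR;
}
```

The stringification is needed because the plugin has to look the symbol up by its name as a string, while the device libgomp uses the same macro as an identifier.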

This patch implements it for the gcn and nvptx offload targets. The icv-device.c
file is starting to look like a file ready to consolidate away the target-specific
versions, but that's for later.

Basic libgomp tests were added for C/C++ and Fortran. Tested without regressions
with offloading for amdgcn and nvptx on x86_64-linux host. Okay for trunk?

Thanks,
Chung-Lin

2021-07-23  Chung-Lin Tang  

libgomp/ChangeLog

* icv-device.c (omp_get_device_num): New API function, host side.
* fortran.c (omp_get_device_num_): New interface function.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
* libgomp.map (OMP_5.0.1): Add omp_get_device_num, omp_get_device_num_.
* libgomp.texi (omp_get_device_num): Add documentation for new API
function.
* omp.h.in (omp_get_device_num): Add declaration.
* omp_lib.f90.in (omp_get_device_num): Likewise.
* omp_lib.h.in (omp_get_device_num): Likewise.
* target.c (gomp_load_image_to_device): If additional entry for device
number exists at end of returned entries from 'load_image_func' hook,
copy the assigned device number over to the device variable.

* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* config/plugin/plugin-gcn.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.

* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* config/plugin/plugin-nvptx.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.

* testsuite/libgomp.c-c++-common/target-45.c: New test.
* testsuite/libgomp.fortran/target10.f90: New test.


diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index 72d4f7cff74..8f72028a6c8 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 3b96890f338..e586da1d3a8 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -58,8 +58,19 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 4ec39c4e61b..2360582e32e 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -598,6 +598,12 @@ omp_get_initial_device_ (void)
   return omp_get_initial_device ();
 }
 
+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}
+
 int32_t
 omp_get_max_task_priority_ (void)
 {
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index c1bedf46647..f11bdfa85c4 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@ omp_is_initial_device (void)
   return 1;
 }
 
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+ on the host.  */
+  return omp_get_initial_dev

[PATCH, libgomp, PR101114, committed] Fix struct-elem-5.c regression

2021-06-25 Thread Chung-Lin Tang

The libgomp.c-c++-common/struct-elem-5.c test, which I added for the structure
element mapping patch, does not properly "fail" in cases without a non-shared
address space (i.e. unified ones, like host-fallback).

This was handled inside the testcase for struct-elem-[14].c, but was missed for
this one due to its dg-shouldfail nature.

Fixed by adding "target offload_device_nonshared_as" to the dg-do run directive.
This is quite small and obvious, so directly committed after testing.

Chung-Lin

libgomp/ChangeLog:

PR testsuite/101114
* testsuite/libgomp.c-c++-common/struct-elem-5.c:
Add "target offload_device_nonshared_as" condition for enabling test.

From e0672017370b9a9362fda52ecffe33d1c9c41829 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Sat, 26 Jun 2021 00:42:58 +0800
Subject: [PATCH] testsuite/101114: Adjust libgomp.c-c++-common/struct-elem-5.c
 testcase

The dg-shouldfail testcase libgomp.c-c++-common/struct-elem-5.c does not
properly fail unless offloading uses a non-shared address space. Adjust testcase
to limit testing to "target offload_device_nonshared_as" only.

libgomp/ChangeLog:

PR testsuite/101114
* testsuite/libgomp.c-c++-common/struct-elem-5.c:
Add "target offload_device_nonshared_as" condition for enabling test.
---
 libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c
index 814c30120e5..31a2fa5e8cf 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target offload_device_nonshared_as } } */
 
 struct S
 {
-- 
2.17.1



[PATCH, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v4]

2021-06-18 Thread Chung-Lin Tang

Hi Jakub,
this patch is the "v4" version of my PR92120 patch, v3 was here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570886.html

(there I listed the various patches from the devel/omp/gcc-10 branch that were
combined, which I won't repeat here).

Basically this v4 adds fixes for lambda capture, which were already pushed to
devel/omp/gcc-11 yesterday:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572988.html

I have attached both the combined v4 version, and the v3-to-v4 diff.

Tested on x86_64-linux with nvptx offloading, seeking approval for trunk.

Thanks,
Chung-Lin

gcc/cp/
* cp-tree.h (finish_omp_target): New declaration.
(finish_omp_target_clauses): Likewise.
* parser.c (cp_parser_omp_clause_map): Adjust call to
cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
(cp_parser_omp_target): Factor out code, adjust into calls to new
function finish_omp_target.
* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
OMP_TARGET case.
* semantics.c (handle_omp_array_sections_1): Add handling to create
'this->member' from 'member' FIELD_DECL.
(handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
map clauses. Handle 'A->member' case in map clauses.
(struct omp_target_walk_data): New struct for walking over
target-directive tree body.
(finish_omp_target_clauses_r): New function for tree walk.
(finish_omp_target_clauses): New function.
(finish_omp_target): New function.

gcc/c/
* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
call to c_parser_omp_variable_list to 'true'.
* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
array base handling.
(c_finish_omp_clauses): Handle 'A->member' case in map clauses.

gcc/
* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
hash_map *. Adjust struct map handling to handle
cases of *A and A->B expressions. Under !DECL_P case of
GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when
handling component_ref_p case. Add unshare_expr and gimplification
when created GOMP_MAP_STRUCT is not a DECL. Add code to add
firstprivate pointer for *pointer-to-struct case.
(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
exit data directives code to earlier position.
* omp-low.c (lower_omp_target):
Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
* tree-pretty-print.c (dump_omp_clause): Likewise.

gcc/testsuite/
* gcc.dg/gomp/target-3.c: New testcase.
* g++.dg/gomp/target-3.C: New testcase.
* g++.dg/gomp/target-lambda-1.C: New testcase.
* g++.dg/gomp/target-lambda-2.C: New testcase.
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.
* g++.dg/gomp/target-this-3.C: New testcase.
* g++.dg/gomp/target-this-4.C: New testcase.
* g++.dg/gomp/target-this-5.C: New testcase.
* g++.dg/gomp/this-2.C: Adjust testcase.

include/
* gomp-constants.h (enum gomp_map_kind):
Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
(GOMP_MAP_POINTER_P):
Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.

libgomp/
* libgomp.h (gomp_attach_pointer): Add bool parameter.
* oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_existing): Update assert condition to
include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
(gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for mapping a pointer with NULL target.
(gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for attaching a pointer with NULL target.
(gomp_map_vars_internal): Update calls to gomp_map_pointer and
gomp_attach_pointer, add handling for
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.
* testsuite/libgomp.c++/target-23.C: New testcase.
* testsuite/libgomp.c++/target-lambda-1.C: New testcase.
* testsuite/libgomp.c++/target-lambda-2.C: New testcase.
* testsuite/libgomp.c++/target-this-1.C: New testcase.
* testsuite/libgomp.c++/target-this-2.C: New testcase.
* testsuite/libgomp.c++/target-this-3.C: New 

[PATCH, C++, OpenMP 5.0, OG11] Fixes for lambda in offload regions

2021-06-17 Thread Chung-Lin Tang

This patch contains:

(1) Some fixes for lambda capture by-reference to work inside
offload regions.

(2) Fixes for cases where lambda objects declared inside an offload region
were mistakenly target-mapped on the enclosing target construct,
causing a gimplify ICE (because the object isn't bound at that position).
Checks were added to avoid this.

Added another testcase to test if lambda works in these cases.
Tested without regressions on devel/omp/gcc-11, pushed there.

Jakub, this technically is a further bug fix for the PR92120 v3 patch.
I'll submit a v4 for mainline trunk later, or this patch independently
in case the v3 patch is already reviewed by then.

Thanks,
Chung-Lin

gcc/cp/ChangeLog:

* semantics.c (struct omp_target_walk_data):
Add 'hash_set<tree> local_decls' member.
(finish_omp_target_clauses_r): Handle BIND_EXPR case, fill in
local_decls there. Adjust case to not add locally declared lambda
objects to data->lambda_objects_accessed.
(finish_omp_target_clauses): Peel away TARGET_EXPR for lambda objects.
Adjust map kind to _TOFROM for reference fields in closures.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/target-lambda-2.C: New test.

libgomp/ChangeLog:

* testsuite/libgomp.c++/target-lambda-2.C: New test.
From dbf5d72f4c077215330e5b06fbb9b3311b807c2a Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 17 Jun 2021 21:53:10 +0800
Subject: [PATCH] Fixes for lambda in offload regions

This patch contains:

(1) Some fixes for lambda capture by-reference to work inside
offload regions.

(2) Fixes for cases where lambda objects declared inside an offload region
were mistakenly target-mapped on the enclosing target construct,
causing a gimplify ICE (because the object isn't bound at that position).
Checks were added to avoid this.

gcc/cp/ChangeLog:

* semantics.c (struct omp_target_walk_data):
Add 'hash_set<tree> local_decls' member.
(finish_omp_target_clauses_r): Handle BIND_EXPR case, fill in
local_decls there. Adjust case to not add locally declared lambda
objects to data->lambda_objects_accessed.
(finish_omp_target_clauses): Peel away TARGET_EXPR for lambda objects.
Adjust map kind to _TOFROM for reference fields in closures.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/target-lambda-2.C: New test.

libgomp/ChangeLog:

* testsuite/libgomp.c++/target-lambda-2.C: New test.
---
 gcc/cp/semantics.c| 22 ++--
 gcc/testsuite/g++.dg/gomp/target-lambda-2.C   | 35 +++
 .../testsuite/libgomp.c++/target-lambda-2.C   | 30 
 3 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/target-lambda-2.C
 create mode 100644 libgomp/testsuite/libgomp.c++/target-lambda-2.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 25fa6cb5305..1f7eacfe701 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -9145,6 +9145,8 @@ struct omp_target_walk_data
 
   tree current_closure;
   hash_set<tree> closure_vars_accessed;
+
+  hash_set<tree> local_decls;
 };
 
 static tree
@@ -9203,12 +9205,25 @@ finish_omp_target_clauses_r (tree *tp, int *walk_subtrees, void *ptr)
   return NULL_TREE;
 }
 
+  if (TREE_CODE (t) == BIND_EXPR)
+{
+  tree block = BIND_EXPR_BLOCK (t);
+  for (tree var = BLOCK_VARS (block); var; var = DECL_CHAIN (var))
+   if (!data->local_decls.contains (var))
+ data->local_decls.add (var);
+  return NULL_TREE;
+}
+
   if (TREE_TYPE(t) && LAMBDA_TYPE_P (TREE_TYPE (t)))
 {
   tree lt = TREE_TYPE (t);
   gcc_assert (CLASS_TYPE_P (lt));
 
-  if (!data->lambda_objects_accessed.contains (t))
+  if (!data->lambda_objects_accessed.contains (t)
+ /* Do not prepare to create target maps for locally declared
+lambdas or anonymous ones.  */
+ && !data->local_decls.contains (t)
+ && TREE_CODE (t) != TARGET_EXPR)
data->lambda_objects_accessed.add (t);
   *walk_subtrees = 0;
   return NULL_TREE;
@@ -9494,6 +9509,9 @@ finish_omp_target_clauses (location_t loc, tree body, tree *clauses_ptr)
   i != data.lambda_objects_accessed.end (); ++i)
{
  tree lobj = *i;
+ if (TREE_CODE (lobj) == TARGET_EXPR)
+   lobj = TREE_OPERAND (lobj, 0);
+
  tree lt = TREE_TYPE (lobj);
  gcc_assert (LAMBDA_TYPE_P (lt) && CLASS_TYPE_P (lt));
 
@@ -9530,7 +9548,7 @@ finish_omp_target_clauses (location_t loc, tree body, tree *clauses_ptr)
  tree exp = build3 (COMPONENT_REF, TREE_TYPE (fld),
 lobj, fld, NULL_TREE);
  tree c = build_omp_clause (loc, OMP_CLAUSE_MAP);
- OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TO);
+ OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
  OMP_CLAUSE_DECL (c)

Re: [PATCH, v3, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2021-06-17 Thread Chung-Lin Tang



On 2021/6/10 9:04 PM, Jakub Jelinek wrote:

I know you had performance concerns in the last round, compared with your sorting
approach. I'll try to research on that later. Getting the v3 patch posted before
backporting to devel/omp/gcc-11.

But please have a look at this incrementally.
I think the common case is just a couple of mappings (say < 10 or < 20 in
90%+ of cases) and a htab might be too expensive for that.


Thanks, I'll do that later.


+  if (!omp_target_is_present (, d))
+abort ();
+  if (!omp_target_is_present ([0], d))
+abort ();
+  if (!omp_target_is_present ([0], d))
+abort ();
+
+  #pragma omp target exit data map (from:q[:1])
+
+  if (omp_target_is_present (, d))
+abort ();

Has this been tested with offloading not configured?
omp_target_is_present will return 1 for the initial device
for all the pointers (everything is present).
So I wonder if these 3 if (omp_target_is_present (..., d))
shouldn't be
   if (d != id && omp_target_is_present (..., d))


Yeah, you're right. Host fallback mode aborts. I've modified the testcases as
you suggested.
Attached is the final patch I pushed.
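
To spell out why the 'd != id' guard matters, here is a small sketch with a stand-in for omp_target_is_present. It is not the testcase itself: `stub_target_is_present`, `still_present_is_error`, `example_guard` and the `mapped_on_device` flag are hypothetical, and the stub only models the behaviour described above (everything counts as present on the initial device).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for omp_target_is_present: on the initial device every host
   pointer counts as present; otherwise a caller-supplied flag decides.  */
int
stub_target_is_present (const void *ptr, int device_num, int initial_device,
                        int mapped_on_device)
{
  (void) ptr;
  if (device_num == initial_device)
    return 1;
  return mapped_on_device;
}

/* Returns nonzero when the test should abort after 'target exit data',
   i.e. the data is still present on a real offload device.  */
int
still_present_is_error (const void *ptr, int d, int id, int mapped_on_device)
{
  return d != id && stub_target_is_present (ptr, d, id, mapped_on_device);
}

/* Representative cases with made-up device numbers (0 = initial device).  */
void
example_guard (void)
{
  int x = 0;
  /* Host fallback: always "present", but the guard suppresses the abort.  */
  assert (still_present_is_error (&x, 0, 0, 0) == 0);
  /* Offload device, data unmapped as expected: no abort.  */
  assert (still_present_is_error (&x, 1, 0, 0) == 0);
  /* Offload device, data unexpectedly still mapped: abort.  */
  assert (still_present_is_error (&x, 1, 0, 1) == 1);
}
```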

Thanks,
Chung-Lin
From 275c736e732d29934e4d22e8f030d5aae8c12a52 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 17 Jun 2021 21:33:32 +0800
Subject: [PATCH] libgomp: Structure element mapping for OpenMP 5.0

This patch implements the OpenMP 5.0 requirement of incrementing/decrementing
the reference count of a mapped structure at most once (across all elements)
on a construct.

This is implemented by pulling in libgomp/hashtab.h and using htab_t as a
pointer set. Structure element list siblings also have pointers-to-refcounts
linked together, to naturally achieve uniform increment/decrement without
repeating.

There are still some questions on whether using such a htab_t based set is
faster/slower than using a sorted pointer array based implementation. This
is to be researched later.
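
The "at most once per construct" rule can be illustrated with a tiny pointer set. Note that this sketch deliberately uses a fixed-size array with linear search rather than the htab_t used by the patch, so it is an illustration of the idea and not of the libgomp implementation; `struct refcount_set`, `refcount_set_add`, `increment_once`, `example_shared_refcount` and `MAX_SEEN` are names made up here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEEN 16

/* Small set of refcount pointers already adjusted on this construct.  */
struct refcount_set
{
  size_t n;
  unsigned *seen[MAX_SEEN];
};

/* Returns 1 if PTR was newly added, 0 if it was already in the set.  */
int
refcount_set_add (struct refcount_set *set, unsigned *ptr)
{
  for (size_t i = 0; i < set->n; i++)
    if (set->seen[i] == ptr)
      return 0;
  assert (set->n < MAX_SEEN);
  set->seen[set->n++] = ptr;
  return 1;
}

/* Increment *REFCOUNT only the first time it is seen in SET.  */
void
increment_once (struct refcount_set *set, unsigned *refcount)
{
  if (refcount_set_add (set, refcount))
    ++*refcount;
}

/* Three elements of one structure share a single refcount; mapping all
   three on one construct must raise it by one, not by three.  */
unsigned
example_shared_refcount (void)
{
  unsigned struct_refcount = 0;
  struct refcount_set set = { 0 };
  for (int elem = 0; elem < 3; elem++)
    increment_once (&set, &struct_refcount);
  return struct_refcount;
}
```

A fresh set would be used for each construct, so a later construct mapping the same structure increments the count again, once.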

libgomp/ChangeLog:

* hashtab.h (htab_clear): New function with initialization code
factored out from...
(htab_create): ...here, adjust to use htab_clear function.

* libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
special refcount values, add comments.
(REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
(REFCOUNT_LINK): Likewise.
(REFCOUNT_STRUCTELEM): New special refcount range for structure
element siblings.
(REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
sibling maps.
(REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
(REFCOUNT_STRUCTELEM_FLAG_LAST):  Flag to indicate last sibling.
(REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
(REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
(struct splay_tree_key_s): Add structelem_refcount and
structelem_refcount_ptr fields into a union with dynamic_refcount.
Add comments.
(gomp_map_vars): Delete declaration.
(gomp_map_vars_async): Likewise.
(gomp_unmap_vars): Likewise.
(gomp_unmap_vars_async): Likewise.
(goacc_map_vars): New declaration.
(goacc_unmap_vars): Likewise.

* oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
(goacc_enter_datum): Likewise.
(goacc_enter_data_internal): Likewise.
* oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
and goacc_unmap_vars.
(GOACC_data_start): Adjust to use goacc_map_vars.
(GOACC_data_end): Adjust to use goacc_unmap_vars.

* target.c (hash_entry_type): New typedef.
(htab_alloc): New function hook for hashtab.h.
(htab_free): Likewise.
(htab_hash): Likewise.
(htab_eq): Likewise.
(hashtab.h): Add file include.
(gomp_increment_refcount): New function.
(gomp_decrement_refcount): Likewise.
(gomp_map_vars_existing): Add refcount_set parameter, adjust to use
gomp_increment_refcount.
(gomp_map_fields_existing): Add refcount_set parameter, adjust calls
to gomp_map_vars_existing.

(gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
variable to guard OpenMP specific paths, adjust calls to
gomp_map_vars_existing, add structure element sibling splay_tree_key
sequence creation code, adjust Fortran map case to avoid increment
under OpenMP.
(gomp_map_vars): Adjust to static, add refcount_set parameter, manage
local refcount_set if caller passed in NULL, adjust call to
gomp_map_vars_internal.
(gomp_map_vars_async): Adjust and rename into...
(goacc_map_vars): ...this new function, adjust call to
gomp_map_vars_internal.

(gomp_remove_splay_tree_key): New function with code factored out from
gomp_remove_var_internal.
(gomp_remove_var_internal): Add code to h

[PATCH, v3, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2021-05-31 Thread Chung-Lin Tang

Hi Jakub,
this is a v3 version of my OpenMP 5.0 structure element mapping patch,
v2 was here: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561139.html

This v3 adds a small bug fix: the initialization of the refcount didn't
handle all cases, which is fixed by using gomp_increment_refcount here (more
consistent).

I know you had performance concerns in the last round, compared with your
sorting approach. I'll try to research that later. I am getting the v3 patch
posted before backporting to devel/omp/gcc-11.

Thanks,
Chung-Lin

libgomp/
* hashtab.h (htab_clear): New function with initialization code
factored out from...
(htab_create): ...here, adjust to use htab_clear function.

* libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
special refcount values, add comments.
(REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
(REFCOUNT_LINK): Likewise.
(REFCOUNT_STRUCTELEM): New special refcount range for structure
element siblings.
(REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
sibling maps.
(REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
(REFCOUNT_STRUCTELEM_FLAG_LAST):  Flag to indicate last sibling.
(REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
(REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
(struct splay_tree_key_s): Add structelem_refcount and
structelem_refcount_ptr fields into a union with dynamic_refcount.
Add comments.
(gomp_map_vars): Delete declaration.
(gomp_map_vars_async): Likewise.
(gomp_unmap_vars): Likewise.
(gomp_unmap_vars_async): Likewise.
(goacc_map_vars): New declaration.
(goacc_unmap_vars): Likewise.

* oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
(goacc_enter_datum): Likewise.
(goacc_enter_data_internal): Likewise.
* oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
and goacc_unmap_vars.
(GOACC_data_start): Adjust to use goacc_map_vars.
(GOACC_data_end): Adjust to use goacc_unmap_vars.

* target.c (hash_entry_type): New typedef.
(htab_alloc): New function hook for hashtab.h.
(htab_free): Likewise.
(htab_hash): Likewise.
(htab_eq): Likewise.
(hashtab.h): Add file include.
(gomp_increment_refcount): New function.
(gomp_decrement_refcount): Likewise.
(gomp_map_vars_existing): Add refcount_set parameter, adjust to use
gomp_increment_refcount.
(gomp_map_fields_existing): Add refcount_set parameter, adjust calls
to gomp_map_vars_existing.

(gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
variable to guard OpenMP specific paths, adjust calls to
gomp_map_vars_existing, add structure element sibling splay_tree_key
sequence creation code, adjust Fortran map case to avoid increment
under OpenMP.
(gomp_map_vars): Adjust to static, add refcount_set parameter, manage
local refcount_set if caller passed in NULL, adjust call to
gomp_map_vars_internal.
(gomp_map_vars_async): Adjust and rename into...
(goacc_map_vars): ...this new function, adjust call to
gomp_map_vars_internal.

(gomp_remove_splay_tree_key): New function with code factored out from
gomp_remove_var_internal.
(gomp_remove_var_internal): Add code to handle removing multiple
splay_tree_key sequence for structure elements, adjust code to use
gomp_remove_splay_tree_key for splay-tree key removal.
(gomp_unmap_vars_internal): Add refcount_set parameter, adjust to use
gomp_decrement_refcount.
(gomp_unmap_vars): Adjust to static, add refcount_set parameter, manage
local refcount_set if caller passed in NULL, adjust call to
gomp_unmap_vars_internal.
(gomp_unmap_vars_async): Adjust and rename into...
(goacc_unmap_vars): ...this new function, adjust call to
gomp_unmap_vars_internal.
(GOMP_target): Manage refcount_set and adjust calls to gomp_map_vars and
gomp_unmap_vars.
(GOMP_target_ext): Likewise.
(gomp_target_data_fallback): Adjust call to gomp_map_vars.
(GOMP_target_data): Likewise.
(GOMP_target_data_ext): Likewise.
(GOMP_target_end_data): Adjust call to gomp_unmap_vars.
(gomp_exit_data): Add refcount_set parameter, adjust to use
gomp_decrement_refcount, adjust to queue splay-tree keys for removal
after main loop.
(GOMP_target_enter_exit_data): Manage refcount_set and adjust calls to
gomp_map_vars and gomp_exit_data.
(gomp_target_task_fn): Likewise.

* testsuite/libgomp.c-c++-common/refcount-1.c: New testcase.
* 

[PATCH, OpenMP 5.0] Remove array section base-pointer mapping semantics, and other front-end adjustments (mainline trunk)

2021-05-25 Thread Chung-Lin Tang

Hi Jakub,
this is a version of this patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html
for mainline trunk.

This patch largely implements three pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list,
standards-conforming behavior for mapping array sections should *NOT* also map
the base-pointer, i.e. for this code:

struct S { int *ptr; ... };
struct S s;
#pragma omp target enter data map(to: s.ptr[:100])

Currently we generate after gimplify:
#pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
   map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify results are now
adjusted to:
#pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
(the attach operation is still generated, and if s.ptr has already been mapped,
attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to
use:
#pragma omp target enter data map(to: s.ptr, s.ptr[:100])
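
As a fuller illustration, here is a minimal sketch of a complete function that
uses this form. The names demo_map_pointer_and_section and DEMO_N are made up
for the example, and the exit data clauses are an assumption about how one
might unmap again. The result was only reasoned about for host execution
(pragmas ignored, or host fallback); behavior on an actual offload device
depends on the compiler version and was not verified here.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_N 100

struct S { int *ptr; };

/* Map both the base pointer s.ptr and the section s.ptr[:DEMO_N], update the
   section inside a target region, copy it back, and return the sum of the
   updated elements.  */
static int
demo_map_pointer_and_section (void)
{
  struct S s;
  int sum = 0;

  s.ptr = malloc (DEMO_N * sizeof (int));
  assert (s.ptr != NULL);  /* Sketch only: treat allocation failure as fatal.  */
  for (int i = 0; i < DEMO_N; i++)
    s.ptr[i] = i;

  /* Listing s.ptr explicitly maps the base pointer along with the section.  */
  #pragma omp target enter data map(to: s.ptr, s.ptr[:DEMO_N])

  #pragma omp target
  for (int i = 0; i < DEMO_N; i++)
    s.ptr[i] += 1;

  /* Copy the section back and drop the base-pointer mapping.  */
  #pragma omp target exit data map(from: s.ptr[:DEMO_N]) map(release: s.ptr)

  for (int i = 0; i < DEMO_N; i++)
    sum += s.ptr[i];
  free (s.ptr);
  return sum;
}
```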

This change in behavior required a number of small adjustments here and there
in gimplify, including to accommodate map sequences for C++ references.

There is also a small Fortran front-end patch involved (hence CCing Tobias and
fortran@). The new gimplify processing changed behavior in handling
GOMP_MAP_ALWAYS_POINTER maps such that libgomp.fortran/struct-elem-map-1.f90
regressed. It appeared that the Fortran FE was generating a
GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and
the pre-patch behavior was removing this map anyway. I have a small change in
trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this
case, and so far no bad test results.

(2) The second part (though somewhat related to the first above) is a set of
fixes in libgomp/target.c to not overwrite attached pointers when handling
device<->host copies, mainly for the "always" case. This behavior is also
noted in the 5.0 spec, but was not properly implemented before.
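
A minimal sketch of the idea, assuming a single attached pointer at a known
byte offset inside the copied object: the copy skips the pointer-sized slot so
that the device-side pointer set by an earlier attach survives. The helper
demo_copy_skip_attached and struct demo_rec are hypothetical and are not taken
from libgomp/target.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record with one pointer member that gets attached.  */
struct demo_rec
{
  int tag;
  int *ptr;
  int len;
};

/* Copy SIZE bytes from SRC to DST, but leave the pointer-sized slot at byte
   offset ATTACHED_OFFSET in DST untouched.  The slot models a device pointer
   that was stored by an earlier attach operation.  */
static void
demo_copy_skip_attached (void *dst, const void *src, size_t size,
                         size_t attached_offset)
{
  char *d = dst;
  const char *s = src;
  size_t slot_end = attached_offset + sizeof (void *);

  assert (slot_end <= size);
  /* Bytes before the attached pointer.  */
  memcpy (d, s, attached_offset);
  /* Bytes after the attached pointer.  */
  memcpy (d + slot_end, s + slot_end, size - slot_end);
}
```

A plain memcpy of the whole struct would instead replace the device pointer
with the host address, which is the overwrite this part of the patch avoids.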

(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. This is actually mainly an effort to
allow SPEC HPC to compile, so even though in the long term the entire map
clause syntax parsing is probably going to be revamped, we're still adding
this in for now. These changes are enabled for both OpenACC and OpenMP.

Tested on x86_64-linux with nvptx offloading with no regressions. This patch
was merged and tested atop the previously submitted patches:
 (a) https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570886.html
 "[PATCH, OpenMP 5.0] Improve OpenMP target support for C++ (includes PR92120 v3)"
 (b) https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570365.html
 "[PATCH, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)"
so you might want to queue this one after those for review.

Thanks,
Chung-Lin

2021-05-25  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-parser.c (struct omp_dim): New struct type for use inside
c_parser_omp_variable_list.
(c_parser_omp_variable_list): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(c_finish_omp_clauses): Extend allowed range of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:

* parser.c (struct omp_dim): New struct type for use inside
cp_parser_omp_var_list_no_open.
(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(handle_omp_array_sections): Adjust pointer map generation of
references.
(finish_omp_clauses): Extend allowed range of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/fortran/ChangeLog:

* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.

gcc/ChangeLog:

* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
accommodate case where 'offset' return of get_inner_reference is
non-NULL.
(is_or_conta

[PATCH, OpenMP 5.0] Improve OpenMP target support for C++ (includes PR92120 v3)

2021-05-20 Thread Chung-Lin Tang

Hi Jakub,
the attached patch is a combination of the patches below, already pushed to
devel/omp/gcc-10. Some are somewhat transient bug fixes, but I am listing all
of them for completeness:

aadfc984: [PATCH] Target mapping C++ members inside member functions
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562467.html

36a1ebdb: [PATCH] OpenMP 5.0: map this[:1] in C++ non-static member functions (PR 92120)
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558975.html

bf8605f1: [PATCH] Enable gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF (INDIRECT_REF ...)) map clauses.
https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564976.html

da047f63: [PATCH] Fix regression of array members in OpenMP map clauses.
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566086.html

4e714eaa: [PATCH] Fix template case of non-static member access inside member functions
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566592.html

2ed80263: [PATCH] Lambda capturing of pointers and references in target directives
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566935.html

08caada8: Arrow operator handling for C front-end in OpenMP map clauses
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566419.html

To summarize, this patch set is an improvement for OpenMP target support for
C++, including inside non-static member functions, lambda objects, and struct
member deref access expressions.
The corresponding modifications for the C front-end are also included.

This patch supersedes the prior versions of my PR92120 patch (implicit C++
map(this[:1])), so I am dubbing this "v3" of the patch for that PR.

Prior versions of the PR92120 patch were implemented by recording uses of
'this' in the parser, and then using the recorded uses during "finish" to
create the implicit maps.

Supporting lambda objects required tree-walk style processing of the
OMP_TARGET body, so it only made sense to merge the entire 'this' processing
together with it. A large part of the parser changes were therefore dropped,
with the main processing now in semantics.c.

Other parser changes to support '->' in map clauses are also included in this
patch.

Tested without regressions on x86_64-linux with nvptx offloading, okay for
trunk?

Thanks,
Chung-Lin

2021-05-20  Chung-Lin Tang  

gcc/cp/
* cp-tree.h (finish_omp_target): New declaration.
(finish_omp_target_clauses): Likewise.
* parser.c (cp_parser_omp_clause_map): Adjust call to
cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
(cp_parser_omp_target): Factor out code, adjust into calls to new
function finish_omp_target.
* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
OMP_TARGET case.
* semantics.c (handle_omp_array_sections_1): Add handling to create
'this->member' from 'member' FIELD_DECL.
(handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
map clauses. Handle 'A->member' case in map clauses.
(struct omp_target_walk_data): New struct for walking over
target-directive tree body.
(finish_omp_target_clauses_r): New function for tree walk.
(finish_omp_target_clauses): New function.
(finish_omp_target): New function.

gcc/c/
* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
call to c_parser_omp_variable_list to 'true'.
* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
array base handling.
(c_finish_omp_clauses): Handle 'A->member' case in map clauses.

gcc/
* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
hash_map *. Adjust struct map handling to handle
cases of *A and A->B expressions. Under !DECL_P case of
GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when
handling component_ref_p case. Add unshare_expr and gimplification
when created GOMP_MAP_STRUCT is not a DECL. Add code to add
firstprivate pointer for *pointer-to-struct case.
(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
exit data directives code to earlier position.
* omp-low.c (lower_omp_target):
Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
* tree-pretty-print.c (dump_omp_clause): Likewise.

gcc/testsuite/
* gcc.dg/gomp/target-3.c: New testcase.
* g++.dg/gomp/target-3.C: New testcase.
* g++.dg/gomp/target-lambda-1.C: New testcase.
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.
* g++.dg/go

Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes

2021-05-18 Thread Chung-Lin Tang

On 2021/5/17 10:26 PM, Julian Brown wrote:

OK, understood. But, I'm a bit concerned that we're ignoring some
"hidden rules" with regards to OMP pointer clause ordering/grouping that
certain code (at least the bit that creates GOMP_MAP_STRUCT node
groups, and parts of omp-low.c) relies on. I believe those rules are as
follows:

  - an array slice is mapped using two or three pointers -- two for a
normal (non-reference) base pointer, and three if we have a
reference to a pointer (i.e. in C++) or an array descriptor (i.e. in
Fortran). So we can have e.g.

GOMP_MAP_TO
GOMP_MAP_ALWAYS_POINTER

GOMP_MAP_TO
GOMP_MAP_.*_POINTER
GOMP_MAP_ALWAYS_POINTER

GOMP_MAP_TO
GOMP_MAP_TO_PSET
GOMP_MAP_ALWAYS_POINTER

  - for OpenACC, we extend this to allow (up to and including
gimplify.c) the GOMP_MAP_ATTACH_DETACH mapping. So we can have (for
component refs):

GOMP_MAP_TO
GOMP_MAP_ATTACH_DETACH

GOMP_MAP_TO
GOMP_MAP_TO_PSET
GOMP_MAP_ATTACH_DETACH

GOMP_MAP_TO
GOMP_MAP_.*_POINTER
GOMP_MAP_ATTACH_DETACH

For the scanning in insert_struct_comp_map (as it is at present) to
work right, these groups must stay intact.  I think the current
behaviour of omp_target_reorder_clauses on the og10 branch can break
those groups apart though!
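
The grouping rules listed above can be sketched as a scan that remembers the
first node of each group, similar in spirit to tracking a "first node" while
walking the clause chain. The enum values and the function demo_find_groups
are hypothetical stand-ins, not the GOMP_MAP_* constants or any gimplify.c
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the map kinds; the values are arbitrary.  */
enum demo_kind
{
  DEMO_TO,
  DEMO_TO_PSET,
  DEMO_FIRSTPRIVATE_POINTER,
  DEMO_ALWAYS_POINTER,
  DEMO_ATTACH_DETACH
};

/* True for kinds that may only appear after a group's leading node.  */
static bool
demo_is_follower (enum demo_kind k)
{
  return k != DEMO_TO;
}

/* For each clause in KINDS[0..N), store in GROUP_START the index of the
   first node of its group.  A follower joins the group of the nearest
   preceding leader.  Return false if a follower has no leader, or if a
   group would grow beyond three nodes.  */
static bool
demo_find_groups (const enum demo_kind *kinds, size_t n, size_t *group_start)
{
  size_t start = 0;
  bool have_leader = false;

  assert (kinds != NULL && group_start != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (!demo_is_follower (kinds[i]))
        {
          start = i;
          have_leader = true;
        }
      else if (!have_leader || i - start >= 3)
        return false;
      group_start[i] = start;
    }
  return true;
}
```

Note that the scan can only reject an orphaned follower. If a reordering moves
a follower behind a different leader, the sketch silently associates it with
the wrong group, which is the kind of breakage the paragraph above worries
about.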


Originally this sorting was intended to enforce OpenMP 5.0 map ordering
rules, although I did add some ATTACH_DETACH ordering code in the latest
round of patching. May not be the best practice.


(The "prev_list_p" stuff in the loop in question in gimplify.c just
keeps track of the first node in these groups.)


Such a brittle way of doing this; even the variable name is not that
obvious in what it intends to do.


For OpenACC, the GOMP_MAP_ATTACH_DETACH code does *not* depend on the
previous clause when lowering in omp-low.c. But GOMP_MAP_ALWAYS_POINTER
does! And in one case ("update" directive), GOMP_MAP_ATTACH_DETACH is
rewritten to GOMP_MAP_ALWAYS_POINTER, so for that case at least, the
dependency on the preceding mapping node must stay intact.


Yes, I think there are some weird conventions here, stemming from the
front-ends. I would think that _ALWAYS_POINTER should exist at a similar level
to _ATTACH_DETACH: both are pointer operations, just with different details in
runtime behavior, though its intended purpose for C++ references seems to skew
some things here and there.


OpenACC also allows "bare" GOMP_MAP_ATTACH and GOMP_MAP_DETACH nodes
(corresponding to the "attach" and "detach" clauses). Those are handled
a bit differently to GOMP_MAP_ATTACH_DETACH in gimplify.c -- but
GOMP_MAP_ATTACH_Z_L_A_S doesn't quite behave like that either, I don't
think?


IIRC, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION was handled that way (just a
single line in gimplify.c) due to idiosyncrasies with the surrounding generated
maps from the C++ front-end (which ATM is the only user of this map-kind).
So yeah, inside the compiler, it's not entirely the same as GOMP_MAP_ATTACH,
but it is intended to live through for the runtime to see.


Anyway: I've not entirely understood what omp_target_reorder_clauses is
doing, but I think it may need to try harder to keep the groups
mentioned above together.  What do you think?


As you know, attach operations don't really need to be glued to the prior
operations; they just have to be ordered after the mapping of the pointer and
the pointee.

There's already some book-keeping to move clauses together, but as you say,
it might need more.

Overall, I think this re-organizing of the struct-group creation is a good
thing, but as you probably also observed, this insistence on "in-flight"
tree chain manipulation is just hard to work with and modify.

Maybe instead of directly working on clause expression chains at this point, we
should be stashing all this information into a single clause tree node,
e.g. starting from the front-end, we can set
'OMP_CLAUSE_MAP_POINTER_KIND(c) = ALWAYS/ATTACH_DETACH/FIRSTPRIVATE/etc.',
(instead of actually creating new, must-follow-in-order maps that's causing all
these conventions).

For struct-groups, during the start of gimplify_scan_omp_clauses(), we could
work with map clause tree nodes with OMP_CLAUSE_MAP_STRUCT_LIST(c), which
contains the entire TREE_LIST or VEC of elements. Then later, after scanning is
complete, expand the list into the current form. Ordering is only created at
this stage.

Just an idea, not sure if it will help understandability in general, but it
should definitely help to simplify when we're reordering due to other rules.
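
To make the idea a bit more concrete, here is a rough sketch, with entirely
hypothetical types: each compact clause carries its pointer kind as a field,
so reordering moves the pair as one unit, and the ordered node sequence is
only produced by a late expansion step. Nothing here corresponds to existing
GCC data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compact representation: one clause, pointer kind attached.  */
enum demo_data_kind { DEMO_DATA_TO, DEMO_DATA_FROM };
enum demo_pointer_kind
{
  DEMO_PTR_NONE,
  DEMO_PTR_ALWAYS,
  DEMO_PTR_ATTACH_DETACH
};

struct demo_clause
{
  enum demo_data_kind data;
  enum demo_pointer_kind pointer;
};

/* Hypothetical expanded form: the must-follow-in-order node sequence.  */
enum demo_node
{
  DEMO_NODE_TO,
  DEMO_NODE_FROM,
  DEMO_NODE_ALWAYS_POINTER,
  DEMO_NODE_ATTACH_DETACH
};

/* Expand IN[0..N) into OUT, emitting each data map immediately followed by
   its pointer operation, if any.  Return the number of nodes written.  */
static size_t
demo_expand (const struct demo_clause *in, size_t n,
             enum demo_node *out, size_t out_cap)
{
  size_t w = 0;

  for (size_t i = 0; i < n; i++)
    {
      assert (w + 2 <= out_cap);
      out[w++] = in[i].data == DEMO_DATA_TO ? DEMO_NODE_TO : DEMO_NODE_FROM;
      if (in[i].pointer == DEMO_PTR_ALWAYS)
        out[w++] = DEMO_NODE_ALWAYS_POINTER;
      else if (in[i].pointer == DEMO_PTR_ATTACH_DETACH)
        out[w++] = DEMO_NODE_ATTACH_DETACH;
    }
  return w;
}
```

Because the pointer operation is a field rather than a separate list node,
any sorting applied to the compact clauses cannot separate it from its data
map.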

Chung-Lin


Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes

2021-05-17 Thread Chung-Lin Tang

On 2021/5/11 4:57 PM, Julian Brown wrote:

This work-in-progress patch tries to get
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION to behave more like
GOMP_MAP_ATTACH_DETACH -- in that the mapping is made to form groups
to be processed by build_struct_group/build_struct_comp_map.  I think
that's important to integrate with how groups of mappings for array
sections are handled in other cases.

This patch isn't sufficient by itself to fix a couple of broken test cases
at present (libgomp.c++/target-lambda-1.C, libgomp.c++/target-this-4.C),
though.


No, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION is supposed to be just a variant
of GOMP_MAP_ATTACH with slightly different behavior: it tolerates an unmapped
pointer-target and assigns NULL on the device, instead of just calling
gomp_fatal() (see its handling in libgomp/target.c).

In case OpenACC can have the same such zero-length array section behavior,
we can just share one GOMP_MAP_ATTACH map. For now it is treated as separate
cases.
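
The difference between the two attach flavors can be sketched as follows. The
lookup table, demo_lookup and demo_attach are hypothetical; the sketch returns
false where the runtime would report a fatal error, and it is not a copy of
the libgomp/target.c logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-to-device address mapping entry.  */
struct demo_mapping
{
  const void *host;
  void *device;
};

/* Return the device address mapped for HOST, or NULL if it is unmapped.  */
static void *
demo_lookup (const struct demo_mapping *table, size_t n, const void *host)
{
  for (size_t i = 0; i < n; i++)
    if (table[i].host == host)
      return table[i].device;
  return NULL;
}

/* Store into *DEVICE_SLOT the device address for HOST_TARGET.  When the
   target is unmapped, a zero-length-tolerant attach stores NULL and
   succeeds, while a plain attach fails and leaves the slot unchanged.  */
static bool
demo_attach (const struct demo_mapping *table, size_t n,
             const void *host_target, void **device_slot,
             bool zero_length_ok)
{
  void *dev = demo_lookup (table, n, host_target);

  assert (device_slot != NULL);
  if (dev == NULL && !zero_length_ok)
    return false;
  *device_slot = dev;
  return true;
}
```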

Chung-Lin


2021-05-11  Julian Brown  

gcc/
* gimplify.c (build_struct_comp_nodes): Add
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION handling.
(build_struct_group): Process GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION
as part of pointer group.
(gimplify_scan_omp_clauses): Update prev_list_p such that
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION will form part of pointer
group.
---
  gcc/gimplify.c | 16 
  1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 6d204908c82..c5cb486aa23 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8298,7 +8298,9 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
if (grp_mid
&& OMP_CLAUSE_CODE (grp_mid) == OMP_CLAUSE_MAP
&& (OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ALWAYS_POINTER
- || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH))
+ || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH
+ || (OMP_CLAUSE_MAP_KIND (grp_mid)
+ == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
  {
tree c3
= build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP);
@@ -8774,12 +8776,14 @@ build_struct_group (struct gimplify_omp_ctx *ctx,
 ? splay_tree_lookup (ctx->variables, (splay_tree_key) decl)
 : NULL);
bool ptr = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_POINTER);
-  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH);
+  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
+   || (OMP_CLAUSE_MAP_KIND (c)
+   == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION));
bool attach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
 || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH);
bool has_attachments = false;
/* For OpenACC, pointers in structs should trigger an attach action.  */
-  if (attach_detach
+  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
&& ((region_type & (ORT_ACC | ORT_TARGET | ORT_TARGET_DATA))
  || code == OMP_TARGET_ENTER_DATA
  || code == OMP_TARGET_EXIT_DATA))
@@ -9784,6 +9788,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  if (!remove
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_POINTER
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ATTACH_DETACH
+ && (OMP_CLAUSE_MAP_KIND (c)
+ != GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET
  && OMP_CLAUSE_CHAIN (c)
  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (c)) == OMP_CLAUSE_MAP
@@ -9792,7 +9798,9 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
  == GOMP_MAP_ATTACH_DETACH)
  || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
- == GOMP_MAP_TO_PSET)))
+ == GOMP_MAP_TO_PSET)
+ || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
+ == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
prev_list_p = list_p;
  
  	  break;




Re: [PATCH 5/5] Mapping of components of references to pointers to structs for OpenMP/OpenACC

2021-05-17 Thread Chung-Lin Tang

Hi Julian,

On 2021/5/15 5:27 AM, Julian Brown wrote:

GCC currently raises a parse error for indirect accesses to struct
members, where the base of the access is a reference to a pointer.
This patch fixes that case.



gcc/cp/
* semantics.c (finish_omp_clauses): Handle components of references to
pointers to structs.

libgomp/
* testsuite/libgomp.oacc-c++/deep-copy-17.C: Update test.



--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7670,7 +7670,12 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
  if ((ort == C_ORT_ACC || ort == C_ORT_OMP)
  && TREE_CODE (t) == COMPONENT_REF
  && TREE_CODE (TREE_OPERAND (t, 0)) == INDIRECT_REF)
-   t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+   {
+ t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+ /* References to pointers have a double indirection here.  */
+ if (TREE_CODE (t) == INDIRECT_REF)
+   t = TREE_OPERAND (t, 0);
+   }
  if (TREE_CODE (t) == COMPONENT_REF
  && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
  || ort == C_ORT_ACC)


There is already a large number of such modifications in this patch:
"[PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments."
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html

I am in the process of taking that patch to mainline, so are you sure this is
not already handled there?


diff --git a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
index dacbb520f3d..e038e9e3802 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
+++ b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
@@ -83,7 +83,7 @@ void strrp (void)
a[0] = 8;
c[0] = 10;
e[0] = 12;
-  #pragma acc parallel copy(n->a[0:10], n->c[0:10], n->e[0:10])
+  #pragma acc parallel copy(n->a[0:10], n->b, n->c[0:10], n->d, n->e[0:10])
{
  n->a[0] = n->c[0] + n->e[0];
}


This testcase can be added.

Chung-Lin






[PATCH, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)

2021-05-14 Thread Chung-Lin Tang

Hi Jakub,
This is a version of patch 
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569665.html
for mainline trunk.

This patch implements relaxing the requirements when a map with the implicit
attribute encounters an overlapping existing map. As the OpenMP 5.0 spec
describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an
 implicit data-mapping attribute has corresponding storage in the device data
 environment prior to a task encountering the construct that is associated
 with the map clause, only that part of the original storage will have
 corresponding storage in the device data environment as a result of the map
 clause."

Also tracked in the OpenMP spec context as issue #1463:
https://github.com/OpenMP/spec/issues/1463

The implementation inside the compiler is, of course, to tag the implicitly
created maps with some indication of "implicit". I've done this with an
OMP_CLAUSE_MAP_IMPLICIT_P macro, using 'base.deprecated_flag' underneath.

There is an encoding of this as
GOMP_MAP_IMPLICIT == GOMP_MAP_FLAG_SPECIAL_3|GOMP_MAP_FLAG_SPECIAL_4
in include/gomp-constants.h for the runtime, but I've intentionally avoided
exploding the entire gimplify/omp-low with a new set of
GOMP_MAP_IMPLICIT_TO/FROM/etc. symbols, instead adding in the new flag bits
only at the final runtime call generation during omp-lowering.
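
A small sketch of that encoding scheme, under assumed bit positions: the
DEMO_* macros mirror the shape of the names mentioned above, but the concrete
values and the helper demo_lower_map_kind are assumptions for illustration;
the authoritative definitions are in include/gomp-constants.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, for illustration only.  The real definitions are in
   include/gomp-constants.h and may differ.  */
#define DEMO_MAP_TO             (1u << 0)
#define DEMO_MAP_FROM           (1u << 1)
#define DEMO_FLAG_SPECIAL_0     (1u << 2)
#define DEMO_FLAG_SPECIAL_1     (1u << 3)
#define DEMO_FLAG_SPECIAL_2     (1u << 4)
#define DEMO_FLAG_SPECIAL_3     (1u << 5)
#define DEMO_FLAG_SPECIAL_4     (1u << 6)

/* "Implicit" is encoded as a combination of two special bits.  */
#define DEMO_MAP_IMPLICIT (DEMO_FLAG_SPECIAL_3 | DEMO_FLAG_SPECIAL_4)

/* Mask covering the whole set of special bits.  */
#define DEMO_FLAG_SPECIAL_BITS \
  (DEMO_FLAG_SPECIAL_0 | DEMO_FLAG_SPECIAL_1 | DEMO_FLAG_SPECIAL_2 \
   | DEMO_FLAG_SPECIAL_3 | DEMO_FLAG_SPECIAL_4)

/* A kind is implicit only if its special bits are exactly the implicit
   combination, so a kind with only one of the two bits set is not.  */
#define DEMO_MAP_IMPLICIT_P(kind) \
  (((kind) & DEMO_FLAG_SPECIAL_BITS) == DEMO_MAP_IMPLICIT)

/* Add the implicit bits as late as possible, when the final kind value
   passed to the runtime is produced.  */
static unsigned
demo_lower_map_kind (unsigned kind, bool implicit)
{
  /* The sketch only handles plain kinds without special bits.  */
  assert ((kind & DEMO_FLAG_SPECIAL_BITS) == 0);
  return implicit ? (kind | DEMO_MAP_IMPLICIT) : kind;
}
```

Keeping the flag out of the compiler-internal kinds and OR-ing it in at the
end is what avoids a parallel family of IMPLICIT_TO/FROM symbols.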

The rest is libgomp mapping taking care of the implicit case: allowing the map
to succeed if an existing map is a proper subset of the new map, provided the
new map is implicit. Straightforward enough, I think.
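
The relaxed check can be sketched with half-open address ranges. The types
and the function demo_can_reuse are hypothetical; the first test reflects the
usual rule that a request fully covered by an existing mapping is accepted,
and the second is the relaxation described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open host address range [start, end).  */
struct demo_range
{
  uintptr_t start;
  uintptr_t end;
};

/* True if INNER lies entirely within OUTER.  */
static bool
demo_contains (struct demo_range outer, struct demo_range inner)
{
  return outer.start <= inner.start && inner.end <= outer.end;
}

/* Decide whether a REQUESTED map that overlaps an EXISTING mapping can
   succeed by reusing it.  */
static bool
demo_can_reuse (struct demo_range existing, struct demo_range requested,
                bool implicit)
{
  assert (existing.start <= existing.end);
  assert (requested.start <= requested.end);
  /* Usual case: the request is fully covered by the existing mapping.  */
  if (demo_contains (existing, requested))
    return true;
  /* Relaxation: an implicit request may be larger than the existing
     mapping, as long as the existing mapping lies within it.  */
  return implicit && demo_contains (requested, existing);
}
```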

There are also some additions to print the implicit attribute during tree
pretty-printing; for that reason some scan tests were updated.

Another adjustment in this patch is how implicitly created clauses are added
to the current clause list in gimplify_adjust_omp_clauses(). Instead of simply
appending the new clauses to the end, this patch adds them at the position
"after initial non-map clauses, but right before any existing map clauses".

The reason for this is: when combined with other map clauses, for example:

  #pragma omp target map(rec.ptr[:N])
  for (int i = 0; i < N; i++)
    rec.ptr[i] += 1;

There will be an implicit map created for map(rec), because of the access
inside the target region. The expectation is that 'rec' is implicitly mapped,
then the array section pointed to by 'rec.ptr' is mapped, and then attachment
to the 'rec.ptr' field of the mapped 'rec' happens (in that order).

If the implicit 'map(rec)' is appended to the end, instead of placed before
other maps, the attachment operation will not find anything to attach to, and
the entire region will fail.
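
The insertion position can be sketched on a singly linked clause list. The
node type and demo_insert_implicit_map are hypothetical and much simpler than
the tree chains gimplify_adjust_omp_clauses() actually works on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_clause_code { DEMO_CLAUSE_OTHER, DEMO_CLAUSE_MAP };

/* Hypothetical clause list node.  */
struct demo_clause_node
{
  enum demo_clause_code code;
  bool implicit;
  struct demo_clause_node *next;
};

/* Insert NEW_MAP into *LIST after the initial run of non-map clauses, that
   is, right before the first existing map clause, or at the end if the list
   has no map clause.  */
static void
demo_insert_implicit_map (struct demo_clause_node **list,
                          struct demo_clause_node *new_map)
{
  struct demo_clause_node **p = list;

  assert (list != NULL && new_map != NULL);
  while (*p != NULL && (*p)->code != DEMO_CLAUSE_MAP)
    p = &(*p)->next;
  new_map->next = *p;
  *p = new_map;
}
```

With this placement, an implicit map for 'rec' is processed before the
explicit array-section map and its attach, so the attach has a mapped 'rec'
to work on.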

Note: this touches a bit on another issue which I will be sending a patch for
later: per the discussion on omp-lang, an array section list item should *not*
map its base-pointer (although an attachment attempt should exist), while in
current GCC behavior, for struct member pointers like 'rec.ptr' above, we do
map it (which should be deemed incorrect).

This means that as of right now, this modification of map order doesn't really
exhibit the above-mentioned behavior yet. I have included it as part of this
patch because the "[implicit]" tree printing requires modifying many gimple
scan tests already, so including the test modifications together seems more
manageable patch-wise.

Tested with no regressions on x86_64-linux with nvptx offloading.
This was already pushed to devel/omp/gcc-10 a while ago; asking for approval
for mainline trunk.

Chung-Lin

2021-05-14  Chung-Lin Tang  

include/ChangeLog:

* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro.
(GOMP_MAP_IMPLICIT): New special map kind bits value.
(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
special map kind bits.
(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.

gcc/ChangeLog:

* tree.h (OMP_CLAUSE_MAP_IMPLICIT_P): New access macro for 'implicit'
bit, using 'base.deprecated_flag' field of tree_node.
* tree-pretty-print.c (dump_omp_clause): Add support for printing
implicit attribute in tree dumping.
* gimplify.c (gimplify_adjust_omp_clauses_1):
Set OMP_CLAUSE_MAP_IMPLICIT_P to 1 if map clause is implicitly created.
(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
clauses, from simple append, to starting of list, after non-map clauses.
* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
values passed to libgomp for implicit maps.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-implicit-map-1.c: New test.
* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
* c-c++-common/goac

Re: [PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments.

2021-05-14 Thread Chung-Lin Tang



On 2021/5/11 11:15 , Thomas Schwinge wrote:

Hi Chung-Lin!

On 2021-05-11T19:28:04+0800, Chung-Lin Tang  wrote:

This patch largely implements three pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list,
standards-conforming behavior for mapping array sections should *NOT* also map
the base-pointer, i.e. for this code:

 struct S { int *ptr; ... };
 struct S s;
 #pragma omp target enter data map(to: s.ptr[:100])

Currently we generate after gimplify:
#pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify results are now
adjusted to:
#pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
(the attach operation is still generated, and if s.ptr has already been mapped,
attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to
use:
#pragma omp target enter data map(to: s.ptr, s.ptr[:100])

This change in behavior required a number of small adjustments here and there
in gimplify, including to accommodate map sequences for C++ references.


I'm a bit confused by that -- this mandates the bulk of the testsuite
changes that you've included, and these seem a step backwards in terms of
user experience, but then, I have no state on the exact OpenMP
specification requirements, so you certainly may be right on that.  (And
also, as Julian mentioned, how this relates to OpenACC semantics, which I
also haven't considered in detail -- but I note you didn't adjust any
OpenACC testcases for that, so I suppose that's really conditionalized to
OpenMP only.)


It is indeed a bit awkward to use, but that's what the omp-lang list seemed to 
decide.

This change is OpenMP only. I took care to only handle OpenMP constructs like
this in the middle-end, though of course this does not preclude some mistake
in adjusting the shared code paths...




There is also a small Fortran front-end patch involved (hence CCing Tobias).
The new gimplify processing changed behavior in handling 
GOMP_MAP_ALWAYS_POINTER maps such that
the libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the 
Fortran FE was generating
a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and 
the pre-patch behavior
was removing this map anyways. I have a small change in 
trans-openmp.c:gfc_trans_omp_array_section
to not generate the map in this case, and so far no bad test results.


Makes sense to argue that one separately, with testcases, for the master
branch submission?


Maybe, although this part was needed to solve a regression caused by the above 
changes.


(2) The second part (though kind of related to the first above) are fixes in 
libgomp/target.c
to not overwrite attached pointers when handling device<->host copies, mainly for the 
"always" case.
This behavior is also noted in the 5.0 spec, but was not properly implemented before.


Likewise, if that makes sense?


Some of the separation of base-pointer/array-section in map clauses seemed to 
step on this bug
(e.g. if one mechanically updates "s.ptr[:N]" into "s.ptr, s.ptr[:N]", and a 
target-update overwrites the
base-pointer)  So it's arguably separate, but also can cause some testsuite 
chaos if not included together.




(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. This is mainly an effort to allow
SPEC HPC to compile, so even though the entire map clause syntax parsing is
probably going to be revamped in the long term, we're still adding this in
for now. These changes are enabled for both OpenACC and OpenMP.


Likewise, if that makes sense?  ;-)


Yeah, this might be separated :P


Tested on x86_64-linux with nvptx offloading with no regressions.


I'm seeing a regression with
'libgomp.oacc-c-c++-common/noncontig_array-1.c' execution testing, both C
and C++, for '-O2' (but not '-O0'), and only for about half of the
invocations.  But it seems to reliably reproduce in GDB:

 Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
 gomp_decrement_refcount (do_remove=, do_copy=, delete_p=false, refcount_set=0x0, k=0xc4d450) at [...]/source-gcc/libgomp/target.c:468
 468   uintptr_t orig_refcount = *refcount_ptr;
 (gdb) bt
 #0  gomp_decrement_refcount (do_remove=, do_copy=, delete_p=false, refcount_set=0x0, k=0xc4d450) at [...]/source-gcc/libgomp/target.c:468
 #1  gomp_unmap_vars_internal (aq=0x0, aq@entry=0x8223c0, refcount_set=0x0, do_copyfrom=, do_copyfrom@entry=true, tgt=tgt@entry=0xc696a0) at [...]/source-gcc/libgomp/target.c:2065
 #2  goacc_unmap_vars (tgt=tgt@entry=0xc696a0, do_copyfrom=do_copyfrom@entry=true, aq=aq@entry=0x0) at [...]/source-gcc/libgomp/target.c:2118
 #3  0x77daa41c in GOACC_pa

[PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments.

2021-05-11 Thread Chung-Lin Tang

This patch largely implements three pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list,
standards conforming behavior for mapping array sections should *NOT* also map 
the base-pointer,
i.e. for this code:

   struct S { int *ptr; ... };
   struct S s;
   #pragma omp target enter data map(to: s.ptr[:100])

Currently we generate after gimplify:
#pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
  map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify results are now 
adjusted to:
#pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
(the attach operation is still generated, and if s.ptr is already mapped prior, 
attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to 
use:
#pragma omp target enter data map(to: s.ptr, s.ptr[:100])

This adjustment in behavior required a number of small adjustments here and 
there in gimplify, including
to accommodate map sequences for C++ references.

There is also a small Fortran front-end patch involved (hence CCing Tobias).
The new gimplify processing changed behavior in handling 
GOMP_MAP_ALWAYS_POINTER maps such that
the libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the 
Fortran FE was generating
a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and 
the pre-patch behavior
was removing this map anyways. I have a small change in 
trans-openmp.c:gfc_trans_omp_array_section
to not generate the map in this case, and so far no bad test results.

(2) The second part (though kind of related to the first above) are fixes in 
libgomp/target.c
to not overwrite attached pointers when handling device<->host copies, mainly for the 
"always" case.
This behavior is also noted in the 5.0 spec, but was not properly implemented before.

(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. This is mainly an effort to allow
SPEC HPC to compile, so even though the entire map clause syntax parsing is
probably going to be revamped in the long term, we're still adding this in
for now. These changes are enabled for both OpenACC and OpenMP.
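
As a minimal sketch of the mapping style described in item (1), the base
pointer and the array section can both be listed explicitly in one map clause.
The struct layout and the helper name sum_section are assumptions made for
illustration and are not part of the patch. When compiled without OpenMP
support the pragma is ignored and the loop simply runs on the host:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration; mirrors 'struct S' in the text.  */
struct S { int *ptr; size_t n; };

/* Hypothetical helper: sums s.ptr[0..n-1] inside a target region.
   Both the base pointer (s.ptr) and the array section (s.ptr[:n]) are
   listed explicitly, which is the form the message recommends when the
   base pointer should be mapped as well.  */
long
sum_section (struct S s)
{
  assert (s.ptr != NULL || s.n == 0);
  long total = 0;
  size_t n = s.n;
  #pragma omp target map(to: s.ptr, s.ptr[:n]) map(tofrom: total)
  for (size_t i = 0; i < n; i++)
    total += s.ptr[i];
  return total;
}
```

Writing only map(to: s.ptr[:n]) would, under the semantics described above,
map the section and attempt the attach, but would not map s.ptr itself.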

Tested on x86_64-linux with nvptx offloading with no regressions. Pushed to 
devel/omp/gcc-10, will
send mainline version of patch later.

Chung-Lin

2021-05-11  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-parser.c (struct omp_dim): New struct type for use inside
c_parser_omp_variable_list.
(c_parser_omp_variable_list): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(c_finish_omp_clauses): Extend allowed range of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:

* parser.c (struct omp_dim): New struct type for use inside
cp_parser_omp_var_list_no_open.
(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(handle_omp_array_sections): Adjust pointer map generation of
references.
(finish_omp_clauses): Extend allowed range of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/fortran/ChangeLog:

* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.


gcc/ChangeLog:

* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
accommodate case where 'offset' return of get_inner_reference is
non-NULL.
(is_or_contains_p): Further robustify conditions.
(omp_target_reorder_clauses): In alloc/to/from sorting phase, also
move following GOMP_MAP_ALWAYS_POINTER maps along.  Add new sorting
phase where we make sure pointers with an attach/detach map are ordered
correctly.
(gimplify_scan_omp_clauses): Add modifications to avoid creating
GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
* c-c++-common/gomp/targe

Re: [PATCH, OG10, OpenMP 5.0, committed] Implement relaxation of implicit map vs. existing device mappings

2021-05-10 Thread Chung-Lin Tang

On 2021/5/7 8:35 PM, Thomas Schwinge wrote:

On 2021-05-05T23:17:25+0800, Chung-Lin Tang via 
Gcc-patches  wrote:

This patch implements relaxing the requirements when a map with the implicit 
attribute encounters
an overlapping existing map.  [...]

Oh, oh, these data mapping interfaces/semantics are getting more and
more "convoluted"...  %-\ (Not your fault, of course.)

Haven't looked in too much detail in the patch/implementation (I'm not
very well-versed in the exact OpenMP semantics anyway), but I suppose we
should do similar things for OpenACC, too.  I think we even currently do
have a gimplification-level "hack" to replicate data clauses' array
bounds for implicit data clauses on compute constructs, if the default
"complete" mapping is going to clash with a "limited" mapping that's
specified in an outer OpenACC 'data' directive.  (That, of course,
doesn't work for the general case of non-lexical scoping, or dynamic
OpenACC 'enter data', etc., I suppose) I suppose your method could easily
replace and improve that; we shall look into that later.

That said, in your patch, is this current implementation (explicitly)
meant or not meant to be active for OpenACC, too, or just OpenMP (I
couldn't quickly tell), and/or is it (implicitly?) a no-op for OpenACC?


It appears that I have inadvertently enabled it for OpenACC as well!
But everything was tested together, so I assume it works okay for that mode as 
well.

The entire set of implicit-specific actions are enabled by the setting of
'OMP_CLAUSE_MAP_IMPLICIT_P (clause) = 1' inside 
gimplify.c:gimplify_adjust_omp_clauses_1,
so in case you want to disable it for OpenACC again, that's where you need to 
add the guard condition.


Also, another adjustment in this patch is how implicitly created clauses are 
added to the current
clause list in gimplify_adjust_omp_clauses(). Instead of simply appending the 
new clauses to the end,
this patch adds them at the position "after initial non-map clauses, but right 
before any existing
map clauses".

Probably you haven't been testing such a configuration; I've just pushed
"Fix up 'c-c++-common/goacc/firstprivate-mappings-1.c' for C, non-LP64"
to devel/omp/gcc-10 branch in commit
c51cc3b96f0b562deaffcfbcc51043aed216801a, see attached.


Thanks, I was relying on eyeballing to know where to fix testcases like this;
I did fix another similar case, but missed this one.




The reason for this is: when combined with other map clauses, for example:

#pragma omp target map(rec.ptr[:N])
for (int i = 0; i < N; i++)
  rec.ptr[i] += 1;

There will be an implicit map created for map(rec), because of the access 
inside the target region.
The expectation is that 'rec' is implicitly mapped, and then the pointed 
array-section part by 'rec.ptr'
will be mapped, and then attachment to the 'rec.ptr' field of the mapped 'rec' 
(in that order).

If the implicit 'map(rec)' is appended to the end, instead of placed before 
other maps, the attachment
operation will not find anything to attach to, and the entire region will fail.

But that doesn't (negatively) affect user-visible semantics (OpenMP, and
also OpenACC, if applicable), in that more/bigger objects then get mapped
than were before?  (I suppose not?)


It probably won't affect user level semantics, although we should look out if 
this change in convention
exposes some other bugs.

Chung-Lin


[PATCH, OG10, OpenMP 5.0, committed] Implement relaxation of implicit map vs. existing device mappings

2021-05-05 Thread Chung-Lin Tang via Gcc-patches

This patch implements relaxing the requirements when a map with the implicit 
attribute encounters
an overlapping existing map. As the OpenMP 5.0 spec describes on page 320, 
lines 18-27 (and 5.1 spec,
page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an 
implicit data-mapping
 attribute has corresponding storage in the device data environment prior to a 
task encountering the
 construct that is associated with the map clause, only that part of the 
original storage will have
 corresponding storage in the device data environment as a result of the map 
clause."

Also tracked in the OpenMP spec context as issue #1463:
https://github.com/OpenMP/spec/issues/1463

The implementation inside the compiler is, of course, to tag the implicitly
created maps with some indication of "implicit". I've done this with an
OMP_CLAUSE_MAP_IMPLICIT_P macro, using 'base.deprecated_flag' underneath.

There is an encoding of this as GOMP_MAP_IMPLICIT == 
GOMP_MAP_FLAG_SPECIAL_3|GOMP_MAP_FLAG_SPECIAL_4
in include/gomp-constants.h for the runtime, but I've intentionally avoided 
exploding the entire
gimplify/omp-low with a new set of GOMP_MAP_IMPLICIT_TO/FROM/etc. symbols, 
instead adding in the new
flag bits only at the final runtime call generation during omp-lowering.
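
To illustrate the encoding idea, here is a small sketch of OR-ing "implicit"
bits into an ordinary map kind at the last moment. All SKETCH_ names and the
bit positions are assumptions chosen for the example; the real definitions
live in include/gomp-constants.h and may differ, and the real
GOMP_MAP_IMPLICIT_P predicate is more specific than this one:

```c
#include <assert.h>

/* Illustrative values only, not copied from gomp-constants.h.  */
enum
{
  SKETCH_MAP_TO             = 1 << 0,
  SKETCH_MAP_FROM           = 1 << 1,
  SKETCH_MAP_FLAG_SPECIAL_3 = 1 << 5,
  SKETCH_MAP_FLAG_SPECIAL_4 = 1 << 6,
  SKETCH_MAP_IMPLICIT = SKETCH_MAP_FLAG_SPECIAL_3 | SKETCH_MAP_FLAG_SPECIAL_4
};

/* Add the implicit bits to an ordinary map kind, as would be done only
   when the final runtime call is generated.  */
unsigned
sketch_make_implicit (unsigned kind)
{
  assert ((kind & SKETCH_MAP_IMPLICIT) == 0);
  return kind | SKETCH_MAP_IMPLICIT;
}

/* Simplified predicate: true only if both implicit bits are set.  */
int
sketch_implicit_p (unsigned kind)
{
  return (kind & SKETCH_MAP_IMPLICIT) == SKETCH_MAP_IMPLICIT;
}

/* Strip the implicit bits to recover the underlying map kind.  */
unsigned
sketch_base_kind (unsigned kind)
{
  return kind & ~(unsigned) SKETCH_MAP_IMPLICIT;
}
```

Keeping the flag as extra bits means the rest of the compiler can keep using
the existing map kinds unchanged, which is the design choice described above.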

The rest is libgomp mapping taking care of the implicit case: allowing map 
success if an existing
map is a proper subset of the new map, if the new map is implicit. 
Straightforward enough I think.
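
The runtime rule can be modelled roughly as follows. This is a simplified
sketch, not libgomp's code: struct sketch_range, sketch_contains and
sketch_can_reuse are hypothetical names, and address ranges are treated as
half-open intervals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open host address range [start, end).  */
struct sketch_range { uintptr_t start, end; };

/* True if 'inner' lies entirely within 'outer'.  */
bool
sketch_contains (struct sketch_range outer, struct sketch_range inner)
{
  return outer.start <= inner.start && inner.end <= outer.end;
}

/* Decide whether a new map request can succeed against an existing mapping.
   - A request fully covered by the existing mapping is always fine.
   - An implicit request that fully covers the existing mapping is also
     accepted; only the already-mapped part stays mapped.
   - Anything else (an explicit request that is not covered, or a partial
     overlap) is rejected.  */
bool
sketch_can_reuse (struct sketch_range existing, struct sketch_range request,
                  bool implicit)
{
  assert (existing.start <= existing.end && request.start <= request.end);
  if (sketch_contains (existing, request))
    return true;
  return implicit && sketch_contains (request, existing);
}
```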

There are also some additions to print the implicit attribute during tree 
pretty-printing, for that
reason some scan tests were updated.

Also, another adjustment in this patch is how implicitly created clauses are 
added to the current
clause list in gimplify_adjust_omp_clauses(). Instead of simply appending the 
new clauses to the end,
this patch adds them at the position "after initial non-map clauses, but right 
before any existing
map clauses".

The reason for this is: when combined with other map clauses, for example:

  #pragma omp target map(rec.ptr[:N])
  for (int i = 0; i < N; i++)
    rec.ptr[i] += 1;

There will be an implicit map created for map(rec), because of the access 
inside the target region.
The expectation is that 'rec' is implicitly mapped, and then the pointed 
array-section part by 'rec.ptr'
will be mapped, and then attachment to the 'rec.ptr' field of the mapped 'rec' 
(in that order).

If the implicit 'map(rec)' is appended to the end, instead of placed before 
other maps, the attachment
operation will not find anything to attach to, and the entire region will fail.
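
The insertion position described above can be sketched on a plain linked
list. struct sketch_clause and sketch_insert_implicit are hypothetical
stand-ins for illustration; GCC's real clause chains are trees walked with
OMP_CLAUSE_CHAIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an OMP clause chain; not GCC's tree type.  */
struct sketch_clause
{
  const char *name;
  bool is_map;
  struct sketch_clause *next;
};

/* Insert 'implicit' after the leading run of non-map clauses and before
   the first existing map clause.  Returns the (possibly new) list head.  */
struct sketch_clause *
sketch_insert_implicit (struct sketch_clause *head,
                        struct sketch_clause *implicit)
{
  assert (implicit != NULL && implicit->is_map);
  struct sketch_clause **slot = &head;
  /* Skip the initial non-map clauses.  */
  while (*slot != NULL && !(*slot)->is_map)
    slot = &(*slot)->next;
  /* Splice the implicit clause in front of whatever follows.  */
  implicit->next = *slot;
  *slot = implicit;
  return head;
}
```

With this ordering the implicit map(rec) is processed before the explicit
array-section map and its attach, so the attach has something to attach to.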

Note: this touches a bit on another issue which I will be sending a patch for 
later:
per the discussion on omp-lang, an array section list item should *not* be 
mapping its base-pointer
(although an attachment attempt should exist), while in current GCC behavior, 
for struct member pointers
like 'rec.ptr' above, we do map it (which should be deemed incorrect).

This means that as of right now, this modification of map order doesn't really 
exhibit the above mentioned
behavior yet. I have included it as part of this patch because the "[implicit]" 
tree printing requires
modifying many gimple scan tests already, so including the test modifications 
together seems more
manageable patch-wise.

Tested with no regressions, and pushed to devel/omp/gcc-10. Will be submitting 
a mainline trunk version later.

Chung-Lin

2021-05-05  Chung-Lin Tang  

include/ChangeLog:

* gomp-constants.h (GOMP_MAP_IMPLICIT): New special map kind bits value.
(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
special map kind bits.
(GOMP_MAP_NONCONTIG_ARRAY_P): Adjust test for non-contiguous array map
kind bits to be more specific.
(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.

gcc/ChangeLog:

* tree.h (OMP_CLAUSE_MAP_IMPLICIT_P): New access macro for 'implicit'
bit, using 'base.deprecated_flag' field of tree_node.
* tree-pretty-print.c (dump_omp_clause): Add support for printing
implicit attribute in tree dumping.
* gimplify.c (gimplify_adjust_omp_clauses_1):
Set OMP_CLAUSE_MAP_IMPLICIT_P to 1 if map clause is implicitly created.
(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
clauses, from simple append, to starting of list, after non-map clauses.
* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
values passed to libgomp for implicit maps.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-implicit-map-1.c: New test.
* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
* c-c++-common/goacc/mdc-1.c: Likewise.
* c-c++-common/goacc/reduction-1.c: Likewise.
* c-c++-common/goacc/redu

[PATCH, OG10, C++, OpenMP 5.0] Support lambda capturing of pointers and references in target directives

2021-03-18 Thread Chung-Lin Tang

This patch adds proper lambda capturing of pointer and reference variables
as specified in OpenMP 5.0. We map the entire closure object as a to-map,
attach pointers to zero-length array sections, and perform mapping of
references.

The main way of implementation is by tree-walk when finishing processing
of target directives. Due to this nature, it seemed only complete to
combine the processing with all of the this[:1] map creation handling.
This makes this patch also a partial rewrite of PR92120, though things
seem to look better in the new form.
(and yes, the submitted PR92120 patch for mainline is in need of a "v3" re-work)

Now this tree walk is applied in the non-template case and after/during
template instantiation, so a prior patch that relaxed finish_omp_clauses()
cases to force the this[:1] changes to work is no longer needed, and is thus
reverted in this patch.

Tested without regressions on x86_64-linux with nvptx offloading,
and pushed to devel/omp/gcc-10.

2021-03-18  Chung-Lin Tang  

gcc/cp/ChangeLog:

* cp-tree.h (set_omp_target_this_expr): Delete.
(finish_omp_target_clauses): New prototype.
* lambda.c (lambda_expr_this_capture): Remove call to
set_omp_target_this_expr.
* parser.c (cp_parser_omp_target): Likewise.
* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for target
directives.
* semantics.c (omp_target_this_expr): Delete.
(omp_target_ptr_members_accessed): Delete.
(finish_non_static_data_member): Remove call to
set_omp_target_this_expr. Remove use of omp_target_ptr_members_accessed.
(finish_this_expr): Remove call to set_omp_target_this_expr.
(struct omp_target_walk_data): New struct for walking over
target-directive tree body.
(finish_omp_target_clauses_r): New function for tree walk.
(finish_omp_target_clauses): New function, with code factored out from
finish_omp_target. Add lambda object handling case.
(finish_omp_target): Factor code out and adjust to use
finish_omp_target_clauses.
(finish_omp_clauses): Revert prior "Adjustments to allow '*ptr' and
'ptr->member' cases in map clausess.", since not needed with new
organization of target-directive clause processing.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/target-lambda-1.C: New test.

libgomp/testsuite/ChangeLog:

* libgomp.c++/target-lambda-1.C: New test.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b77bdc380a0..247a3bb1ec3 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7316,7 +7316,7 @@ extern void finish_lambda_scope   (void);
 extern tree start_lambda_function  (tree fn, tree lambda_expr);
 extern void finish_lambda_function (tree body);
 extern tree finish_omp_target  (location_t, tree, tree, bool);
-extern void set_omp_target_this_expr   (tree);
+extern void finish_omp_target_clauses  (location_t, tree, tree *);
 
 /* in tree.c */
 extern int cp_tree_operand_length  (const_tree);
diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 9ecf0dbed0c..b55c2f85d27 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -842,9 +842,6 @@ lambda_expr_this_capture (tree lambda, int add_capture_p)
 type cast (_expr.cast_ 5.4) to the type of 'this'. [ The cast
 ensures that the transformed expression is an rvalue. ] */
   result = rvalue (result);
-
-  /* Acknowledge to OpenMP target that 'this' was referenced.  */
-  set_omp_target_this_expr (result);
 }
 
   return result;
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1af233690a2..9fc2a9b05eb 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -40786,7 +40786,6 @@ cp_parser_omp_target (cp_parser *parser, cp_token 
*pragma_tok,
  keep_next_level (true);
  tree sb = begin_omp_structured_block (), ret;
  unsigned save = cp_parser_begin_omp_structured_block (parser);
- set_omp_target_this_expr (NULL_TREE);
  switch (ccode)
{
case OMP_TEAMS:
@@ -40881,7 +40880,6 @@ cp_parser_omp_target (cp_parser *parser, cp_token 
*pragma_tok,
"#pragma omp target", pragma_tok);
   c_omp_adjust_map_clauses (clauses, true);
   keep_next_level (true);
-  set_omp_target_this_expr (NULL_TREE);
   tree body = cp_parser_omp_structured_block (parser, if_p);
 
   finish_omp_target (pragma_tok->location, clauses, body, false);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 90cee31bb5a..139d1075986 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -18631,6 +18631,11 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl,
   t = copy_node (t);
   OMP_BODY (t) = stmt;
   OMP_CLAUSES (t) = tmp;
+
+  if (TREE_CODE (t) == OMP_TARGET)
+   finish_omp_target_clauses (EXPR_LOCATION (t), OMP_BODY (t),
+ 

[PATCH, OG10, C++, committed] Fix non-static member mapping in templates

2021-03-11 Thread Chung-Lin Tang

There was a case of the implicit non-static pointer member mapping
not working properly with templates.

What happened was that the code in finish_omp_target() created the
map clauses (which normally runs after finish_omp_clauses), but being
a template class it was put through all the tsubst_* stuff and at the
end thrown into finish_omp_clauses a 2nd time. And because finish_omp_clauses
didn't handle some of the implicitly created map clauses, things didn't
work...

This patch slightly relaxes the cases handled in these parts, plus some
adjustments in gimplify.c.

Tested without regressions, and pushed to devel/omp/gcc-10.

Chung-Lin
From 4e714eaad985f68533f267b8df2026e5c14d084a Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 11 Mar 2021 00:31:08 -0800
Subject: [PATCH] Fix template case of non-static member access inside member
 functions

Prior patches for C++ non-static member access had problems under template
classes, due to re-calling of finish_omp_clauses after finish_omp_target
created the implicit maps required, but not of allowed form in 
finish_omp_clauses.

This patch solves this by slightly relaxing the allowed expressions in
finish_omp_clauses.

2021-03-11  Chung-Lin Tang  

gcc/cp/ChangeLog:

* semantics.c (finish_omp_clauses): Adjustments to allow '*ptr' and
'ptr->member' cases in map clausess.
(finish_omp_target): Use INDIRECT_REF instead of MEM_REF in created
clauses, add processing_template_decl handling.

gcc/ChangeLog:

* gimplify.c (gimplify_scan_omp_clauses): Under !DECL_P case of
GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
struct_deref_set for map(*ptr_to_struct) cases.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/target-this-3.C: Adjust scan test.
* g++.dg/gomp/target-this-4.C: Likewise.
* g++.dg/gomp/target-this-5.C: New test.

libgomp/ChangeLog:

* testsuite/libgomp.c++/target-this-5.C: New test.
---
 gcc/cp/semantics.c| 45 +--
 gcc/gimplify.c| 19 +++
 gcc/testsuite/g++.dg/gomp/target-this-3.C |  2 +-
 gcc/testsuite/g++.dg/gomp/target-this-4.C |  2 +-
 gcc/testsuite/g++.dg/gomp/target-this-5.C | 34 
 libgomp/testsuite/libgomp.c++/target-this-5.C | 30 ++
 6 files changed, 120 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/target-this-5.C
 create mode 100644 libgomp/testsuite/libgomp.c++/target-this-5.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 55a5983..5b62fa3 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6407,6 +6407,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type 
ort)
   bool order_seen = false;
   bool schedule_seen = false;
   bool oacc_async = false;
+  bool indirect_ref_p = false;
   bool indir_component_ref_p = false;
   tree last_iterators = NULL_TREE;
   bool last_iterators_remove = false;
@@ -7516,6 +7517,14 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type 
ort)
  indir_component_ref_p = true;
  STRIP_NOPS (t);
}
+ indirect_ref_p = false;
+ if ((ort == C_ORT_ACC || ort == C_ORT_OMP)
+ && INDIRECT_REF_P (t))
+   {
+ t = TREE_OPERAND (t, 0);
+ indirect_ref_p = true;
+ STRIP_NOPS (t);
+   }
  if (TREE_CODE (t) == COMPONENT_REF
  && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
  || ort == C_ORT_ACC)
@@ -7551,6 +7560,12 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type 
ort)
  break;
}
  t = TREE_OPERAND (t, 0);
+ if (INDIRECT_REF_P (t))
+   {
+ t = TREE_OPERAND (t, 0);
+ indir_component_ref_p = true;
+ STRIP_NOPS (t);
+   }
}
  if (remove)
break;
@@ -7614,6 +7629,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type 
ort)
   || (OMP_CLAUSE_MAP_KIND (c)
   != GOMP_MAP_FIRSTPRIVATE_POINTER))
   && !indir_component_ref_p
+  && !indirect_ref_p
   && !cxx_mark_addressable (t))
remove = true;
  else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
@@ -7698,7 +7714,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type 
ort)
}
  else
{
- bitmap_set_bit (_head, DECL_UID (t));
+ if (!indirect_ref_p && !indir_component_ref_p)
+   bitmap_set_bit (_head, DECL_UID (t));
  if (t != OMP_CLAUSE_DECL (c)
  && TREE_CODE (OMP_CLAUSE_DECL (c)) == COMPONENT_REF)
bitmap_set_bit (_field_head, DECL_UID

[PATCH, OG10, OpenMP, committed] Support A->B expressions in map clause (C front-end)

2021-03-08 Thread Chung-Lin Tang

This patch is a merge of parts from:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562467.html

and devel/omp/gcc-10 commit 36a1eb, which was a modified merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558975.html

to provide the equivalent front-end patches to support "map(A->B)"
clauses for the C front-end (only the C++ front-end received such
changes before). Some associated middle-end changes are also in
this patch.
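
For illustration, a map clause using the arrow operator in C might look like
the sketch below. struct sketch_pair and sketch_swap_sum are hypothetical
names for this example and are not taken from the new testcase. Without
OpenMP support the pragma is ignored and the block runs on the host:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration.  */
struct sketch_pair { int a, b; };

/* Swap the two members through a pointer inside a target region and
   return their sum.  The members are named in the map clause with the
   arrow operator, i.e. the "map(A->B)" form.  */
int
sketch_swap_sum (struct sketch_pair *p)
{
  assert (p != NULL);
  int sum = 0;
  #pragma omp target map(tofrom: p->a, p->b) map(from: sum)
  {
    int tmp = p->a;
    p->a = p->b;
    p->b = tmp;
    sum = p->a + p->b;
  }
  return sum;
}
```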

Tested without regressions, and pushed to devel/omp/gcc-10.

Chung-Lin
From 08caada8efd8f35db634647bbda6091fb667b00d Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Mon, 8 Mar 2021 15:56:52 +0800
Subject: [PATCH] Arrow operator handling for C front-end in OpenMP map clauses

This patch merges some of the equivalent changes already done for the C++
front-end to the C parts.

2021-03-08  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
call to c_parser_omp_variable_list to 'true'.
* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
array base handling.
(c_finish_omp_clauses): Handle 'A->member' case in map clauses.

gcc/ChangeLog:

* gimplify.c (gimplify_scan_omp_clauses): Add MEM_REF case when
handling component_ref_p case. Add unshare_expr and gimplification
when created GOMP_MAP_STRUCT is not a DECL. Add code to add
firstprivate pointer for *pointer-to-struct case.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/target-3.c: New test.
---
 gcc/c/c-parser.c |  3 +-
 gcc/c/c-typeck.c | 22 +++
 gcc/gimplify.c   | 41 ++--
 gcc/testsuite/gcc.dg/gomp/target-3.c | 16 +++
 4 files changed, 79 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index fae597128e9..0a6aee439f6 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15700,7 +15700,8 @@ c_parser_omp_clause_map (c_parser *parser, tree list)
}
 }
 
-  nl = c_parser_omp_variable_list (parser, clause_loc, OMP_CLAUSE_MAP, list);
+  nl = c_parser_omp_variable_list (parser, clause_loc, OMP_CLAUSE_MAP, list,
+  C_ORT_OMP, true);
 
   for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
 OMP_CLAUSE_SET_MAP_KIND (c, kind);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 6af19766324..7c887a80ce9 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12917,6 +12917,12 @@ handle_omp_array_sections_1 (tree c, tree t, vec 
,
  return error_mark_node;
}
  t = TREE_OPERAND (t, 0);
+ if ((ort == C_ORT_ACC || ort == C_ORT_OMP)
+ && TREE_CODE (t) == MEM_REF)
+   {
+ t = TREE_OPERAND (t, 0);
+ STRIP_NOPS (t);
+   }
  if (ort == C_ORT_ACC && TREE_CODE (t) == MEM_REF)
{
  if (maybe_ne (mem_ref_offset (t), 0))
@@ -13778,6 +13784,7 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
   tree ordered_clause = NULL_TREE;
   tree schedule_clause = NULL_TREE;
   bool oacc_async = false;
+  bool indir_component_ref_p = false;
   tree last_iterators = NULL_TREE;
   bool last_iterators_remove = false;
   tree *nogroup_seen = NULL;
@@ -14505,6 +14512,11 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
{
  while (TREE_CODE (t) == COMPONENT_REF)
t = TREE_OPERAND (t, 0);
+ if (TREE_CODE (t) == MEM_REF)
+   {
+ t = TREE_OPERAND (t, 0);
+ STRIP_NOPS (t);
+   }
  if (bitmap_bit_p (_field_head, DECL_UID (t)))
break;
  if (bitmap_bit_p (_head, DECL_UID (t)))
@@ -14561,6 +14573,15 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
   bias) to zero here, so it is not set erroneously to the pointer
   size later on in gimplify.c.  */
OMP_CLAUSE_SIZE (c) = size_zero_node;
+ indir_component_ref_p = false;
+ if ((ort == C_ORT_ACC || ort == C_ORT_OMP)
+ && TREE_CODE (t) == COMPONENT_REF
+ && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF)
+   {
+ t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+ indir_component_ref_p = true;
+ STRIP_NOPS (t);
+   }
  if (TREE_CODE (t) == COMPONENT_REF
  && OMP_CLAUSE_CODE (c) != OMP_CLAUSE__CACHE_)
{
@@ -14633,6 +14654,7 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
  else if ((OMP_CLAUSE_CODE (c) != OMP_C

[PATCH, C++, OG10, OpenACC/OpenMP, committed] Allow static constexpr fields in mappable types

2021-03-03 Thread Chung-Lin Tang

On 2020/1/21 12:49 AM, Jakub Jelinek wrote:

The OpenMP 4.5 definition of mappable type for C++ is that
   - All data members must be non-static.
among other requirements.  In OpenMP 5.0 that has been removed.
So, if we follow the 4.5 definition, it shouldn't change, if we follow 5.0
definition, the whole loop should be dropped, but in no case shall static
constexpr data members be treated any differently from any other static data
members.


We have merged the patch as is (only static constexprs) to devel/omp/gcc-10
for now. It's possible that the entire checking loop should eventually be removed
to allow the full 5.0 range, but I wondered whether things like (automatic)
accessibility of the static members within target regions are an issue to resolve.
For now, I've committed the patch in its current state to OG10.

Re-tested on OG10, and committed with an additional testcase (same for OpenMP)

Chung-Lin

cp/
* decl2.c (cp_omp_mappable_type_1): Allow fields with
DECL_DECLARED_CONSTEXPR_P to be mapped.

testsuite/
* g++.dg/goacc/static-constexpr-1.C: New test.
* g++.dg/gomp/static-constexpr-1.C: New test.

From 1c3f38b30c1db0aef5ccbf6d20fb5fd13785d482 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Wed, 3 Mar 2021 22:39:10 +0800
Subject: [PATCH] Allow static constexpr fields in mappable types for C++

This patch is a merge of:
https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg01246.html

Static members in general disqualify a C++ class from being target mappable,
but static constexprs are inline optimized away, so should not interfere.

OpenMP 5.0 in general lifts the static member limitation, so this
patch will probably be further adjusted later.

2021-03-03  Chung-Lin Tang  

gcc/cp/ChangeLog:

* decl2.c (cp_omp_mappable_type_1): Allow fields with
DECL_DECLARED_CONSTEXPR_P to be mapped.

gcc/testsuite/ChangeLog:

* g++.dg/goacc/static-constexpr-1.C: New test.
* g++.dg/gomp/static-constexpr-1.C: New test.
---
 gcc/cp/decl2.c  |  5 -
 gcc/testsuite/g++.dg/goacc/static-constexpr-1.C | 17 +
 gcc/testsuite/g++.dg/gomp/static-constexpr-1.C  | 17 +
 3 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/goacc/static-constexpr-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/static-constexpr-1.C

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 5343ea3b068..872122fe83c 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -1460,7 +1460,10 @@ cp_omp_mappable_type_1 (tree type, bool notes)
 {
   tree field;
   for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
-   if (VAR_P (field))
+   if (VAR_P (field)
+   /* Fields that are 'static constexpr' can be folded away at compile
+  time, thus does not interfere with mapping.  */
+   && !DECL_DECLARED_CONSTEXPR_P (field))
  {
if (notes)
  inform (DECL_SOURCE_LOCATION (field),
diff --git a/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C b/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C
new file mode 100644
index 000..edf5f1a7628
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C
@@ -0,0 +1,17 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+
+/* Test that static constexpr members do not interfere with offloading.  */
+struct rec
+{
+  static constexpr int x = 1;
+  int y, z;
+};
+
+void foo (rec& r)
+{
+  #pragma acc parallel copy(r)
+  {
+    r.y = r.y = r.x;
+  }
+}
diff --git a/gcc/testsuite/g++.dg/gomp/static-constexpr-1.C b/gcc/testsuite/g++.dg/gomp/static-constexpr-1.C
new file mode 100644
index 000..39eee92
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/static-constexpr-1.C
@@ -0,0 +1,17 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+
+/* Test that static constexpr members do not interfere with offloading.  */
+struct rec
+{
+  static constexpr int x = 1;
+  int y, z;
+};
+
+void foo (rec& r)
+{
+  #pragma omp target map(r)
+  {
+    r.y = r.y = r.x;
+  }
+}
-- 
2.17.1



[PATCH, OG10, OpenMP, committed] Fix array members in OpenMP map clauses

2021-03-02 Thread Chung-Lin Tang

Previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564976.html

was reverted by Catherine while I was away, due to regressions in mapping
array members. The fix appears to be moving the
finish_non_static_data_member() call to an earlier position inside
handle_omp_array_sections().

Tested and committed to devel/omp/gcc-10; the above patch has been re-committed
as well.
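
To make the affected case concrete, here is a minimal sketch of an array member
used in a map clause from inside a member function. The type and function names
(buf, scale) are hypothetical and the failing source from the regression is not
shown in this mail, so treat it as an assumption about the general shape only.
Without -fopenmp the pragma is ignored and the loop runs on the host.

```cpp
#include <cassert>

constexpr int N = 8;

// Hypothetical type: 'data' is an array member, named without 'this->'.
struct buf
{
  int data[N];

  void scale (int factor)
  {
    // Inside a member function the clause operand starts out as a bare field
    // reference and has to be resolved to this->data before the clause can be
    // classified as mapping an array-typed component.
    #pragma omp target map(tofrom: data[0:N])
    for (int i = 0; i < N; ++i)
      data[i] *= factor;
  }
};
```

This sketch does not assert that any particular compiler release accepts the
clause under -fopenmp; it only shows the source form being discussed.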

Chung-Lin
From da047f63c601118ad875d13929453094acc6c6c9 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Fri, 26 Feb 2021 20:13:29 +0800
Subject: [PATCH] Fix regression of array members in OpenMP map clauses.

Fixed a regression of array members not working in OpenMP map clauses after
commit bf8605f14ec33ea31233a3567f3184fee667b695.

This patch itself probably should be considered a fix for commit aadfc9843.

2021-02-26  Chung-Lin Tang  

gcc/cp/ChangeLog:

* semantics.c (handle_omp_array_sections): Adjust position of making
COMPONENT_REF from FIELD_DECL to earlier position.
---
 gcc/cp/semantics.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 370d5831091..55a5983528e 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5386,6 +5386,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
}
  OMP_CLAUSE_DECL (c) = first;
  OMP_CLAUSE_SIZE (c) = size;
+ if (TREE_CODE (t) == FIELD_DECL)
+   t = finish_non_static_data_member (t, NULL_TREE, NULL_TREE);
  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
  || (TREE_CODE (t) == COMPONENT_REF
  && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE))
@@ -5414,8 +5416,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
  }
  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c),
  OMP_CLAUSE_MAP);
- if (TREE_CODE (t) == FIELD_DECL)
-   t = finish_non_static_data_member (t, NULL_TREE, NULL_TREE);
  if ((ort & C_ORT_OMP_DECLARE_SIMD) != C_ORT_OMP && ort != C_ORT_ACC)
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_POINTER);
  else if (TREE_CODE (t) == COMPONENT_REF)
-- 
2.17.1



[PATCH, OG10, committed] Support A->B expressions in map clause

2021-02-08 Thread Chung-Lin Tang

This patch tries to allow map(A->ptr) to be properly handled the same way as
map(B.ptr) expressions. map(struct:*A) clauses are now produced during
gimplify.
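
As an illustration of the two source forms in question, a minimal sketch
follows. The struct and function names (intbuf, fill_obj, fill_ptr) are
hypothetical and not taken from the patch or its testcases; the length is copied
into a local so that only the pointer member appears in the clause. Without
-fopenmp the pragmas are ignored and both loops run on the host.

```cpp
#include <cassert>

// Hypothetical struct with a pointer member.
struct intbuf
{
  int *ptr;
  int len;
};

// Member reached through an object: map(B.ptr[...]).
void fill_obj (intbuf &B, int v)
{
  int n = B.len;
  #pragma omp target map(tofrom: B.ptr[0:n])
  for (int i = 0; i < n; ++i)
    B.ptr[i] = v;
}

// Member reached through a pointer: map(A->ptr[...]).
void fill_ptr (intbuf *A, int v)
{
  int n = A->len;
  #pragma omp target map(tofrom: A->ptr[0:n])
  for (int i = 0; i < n; ++i)
    A->ptr[i] = v;
}
```

The intent described above is that the second form ends up being treated like
the first one, with the enclosing structure being *A instead of B.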

Julian, I'm CCing you since IIRC you are the author of this area of
code. I would appreciate it if you could take a look when you have time, though
I've already gone ahead and pushed to OG10 after the test results looked okay.

Thanks,
Chung-Lin

gcc/ChangeLog:

* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
hash_map *. Adjust struct map handling to handle
cases of *A and A->B expressions.
(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
exit data directives code to earlier position.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/target-3.C: Adjust testcase gimple scanning.
* g++.dg/gomp/target-this-2.C: Likewise.
* g++.dg/gomp/target-this-3.C: Likewise.
* g++.dg/gomp/target-this-4.C: Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.c++/target-23.C: New testcase.
From bf8605f14ec33ea31233a3567f3184fee667b695 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Mon, 8 Feb 2021 07:53:55 -0800
Subject: [PATCH] Enable gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF
 (INDIRECT_REF ...)) map clauses.

This patch tries to allow map(A->ptr) to be properly handled the same way as
map(B.ptr) expressions. map(struct:*A) clauses are now produced during
gimplify.

This patch, as of time of commit, is only pushed to devel/omp/gcc-10, not yet
submitted as mainline patch to upstream.

2021-02-08  Chung-Lin Tang  

gcc/ChangeLog:

* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
hash_map *. Adjust struct map handling to handle
cases of *A and A->B expressions.
(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
exit data directives code to earlier position.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/target-3.C: Adjust testcase gimple scanning.
* g++.dg/gomp/target-this-2.C: Likewise.
* g++.dg/gomp/target-this-3.C: Likewise.
* g++.dg/gomp/target-this-4.C: Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.c++/target-23.C: New testcase.
---
 gcc/gimplify.c| 51 +++
 gcc/testsuite/g++.dg/gomp/target-3.C  |  2 +-
 gcc/testsuite/g++.dg/gomp/target-this-2.C |  2 +-
 gcc/testsuite/g++.dg/gomp/target-this-3.C |  2 +-
 gcc/testsuite/g++.dg/gomp/target-this-4.C |  4 +--
 libgomp/testsuite/libgomp.c++/target-23.C | 34 +
 6 files changed, 78 insertions(+), 17 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c++/target-23.C

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index b90ba5b..ba19017 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "tree-cfg.h"
 #include "tree-ssa.h"
+#include "tree-hash-traits.h"
 #include "omp-general.h"
 #include "omp-low.h"
 #include "gimple-low.h"
@@ -8514,7 +8515,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 {
   struct gimplify_omp_ctx *ctx, *outer_ctx;
   tree c;
-  hash_map<tree, tree> *struct_map_to_clause = NULL;
+  hash_map<tree_operand_hash, tree> *struct_map_to_clause = NULL;
   hash_set<tree> *struct_deref_set = NULL;
   tree *prev_list_p = NULL, *orig_list_p = list_p;
   int handled_depend_iterators = -1;
@@ -9082,12 +9083,15 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  && TREE_CODE (decl) == INDIRECT_REF
  && TREE_CODE (TREE_OPERAND (decl, 0)) == COMPONENT_REF
  && (TREE_CODE (TREE_TYPE (TREE_OPERAND (decl, 0)))
- == REFERENCE_TYPE))
+ == REFERENCE_TYPE)
+ && (OMP_CLAUSE_MAP_KIND (c)
+ != GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION))
{
  pd = &TREE_OPERAND (decl, 0);
  decl = TREE_OPERAND (decl, 0);
}
  bool indir_p = false;
+ bool component_ref_p = false;
  tree orig_decl = decl;
  tree decl_ref = NULL_TREE;
  if ((region_type & (ORT_ACC | ORT_TARGET | ORT_TARGET_DATA)) != 0
@@ -9098,6 +9102,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  while (TREE_CODE (decl) == COMPONENT_REF)
{
  decl = TREE_OPERAND (decl, 0);
+ component_ref_p = true;
  if (((TREE_CODE (decl) == MEM_REF
&& integer_zerop (TREE_OPERAND (

Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2021-01-19 Thread Chung-Lin Tang




On 2021/1/16 5:45 PM, Jakub Jelinek wrote:

+/* Unified reference count for structure element siblings, this is used
+   when REFCOUNT_STRUCTELEM_FIRST_P(k->refcount) == true, the first sibling
+   in a structure element sibling list item sequence.  */
+uintptr_t structelem_refcount;
+
+/* When REFCOUNT_STRUCTELEM_P (k->refcount) == true, this field points


REFCOUNT_STRUCTELEM_P (k->refcount) is true even for
REFCOUNT_STRUCTELEM_FIRST_P(k->refcount), so shouldn't the description say
that structelem_refcount_ptr is only used if
REFCOUNT_STRUCTELEM_P (k->refcount) && !REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)?


Sure, I'll revise the comments a bit.


+   into the (above) structelem_refcount field of the _FIRST splay_tree_key,
+   the first key in the created sequence. All structure element siblings
+   share a single refcount in this manner. Since these two fields won't be
+   used at the same time, they are stashed in a union.  */
+uintptr_t *structelem_refcount_ptr;
+  };
struct splay_tree_aux *aux;
  };
  
  /* The comparison function.  */


Anyway, most of the patch looks good, but I'd like to understand the
rationale for choosing a htab over what I've been trying to suggest, which
was essentially instead of incrementing or decrementing refcounts push them
into a vector for later incrementing/decrementing, then qsort the vector
(by the pointers to refcounts) and increment what the elements point to unless
the same address has been incremented/decremented already.

Jakub


Essentially the requirement is to increment/decrement a refcount only once per
construct, so using a pointer set (implemented by htab_t here) to track the
processing status seemed more intuitive in code, and I think it is probably
faster than sorting a vector (at least in most cases).
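
To make the comparison concrete, here is a minimal sketch of both approaches.
It uses standard C++ containers as stand-ins (std::unordered_set instead of
htab_t, std::vector plus std::sort instead of a qsort'ed vector), and the
function names are made up for this mail; it is not the libgomp code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Approach A: pointer set.  Bump *rc only the first time its address is seen
// while processing the current construct.
inline void
increment_once (std::uintptr_t *rc, std::unordered_set<std::uintptr_t *> &seen)
{
  if (seen.insert (rc).second)
    ++*rc;
}

// Approach B: defer.  Collect the addresses first, then sort them, drop the
// duplicates, and bump each distinct counter once.
inline void
increment_deferred (std::vector<std::uintptr_t *> pending)
{
  std::sort (pending.begin (), pending.end (),
	     std::less<std::uintptr_t *> ());
  pending.erase (std::unique (pending.begin (), pending.end ()),
		 pending.end ());
  for (std::uintptr_t *rc : pending)
    ++*rc;
}
```

Both give the same final counts; the difference is that the set makes the
decision at each call site, while the deferred version needs a separate pass at
the end of the construct.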

Chung-Lin


Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2021-01-13 Thread Chung-Lin Tang

Ping x2.

Hi Jakub, I would like this part of OpenMP 5.0 to be considered for GCC 11.

Thanks,
Chung-Lin

On 2020/12/14 6:32 PM, Chung-Lin Tang wrote:

Ping.

On 2020/12/4 10:15 PM, Chung-Lin Tang wrote:

Hi Jakub,
this is a new version of the structure element mapping patch for the OpenMP 5.0
requirement changes.

This one uses the approach you've outlined in your concept patch [1], basically
using more special REFCOUNT_* values to mark them, and linking the following
structure element splay_tree_keys back to the first key's refcount.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557622.html

Implementation notes of the attached patch:

(1) This patch solves the 5.0 requirement of "not already incremented/decremented
because of the effect of a map clause on the construct" by pulling in
libgomp/hashtab.h and using htab_t as a pointer set. A "htab_t *refcount_set" is
added to the map/unmap routines to track the processing status of the uintptr_t*
addresses of the refcount fields in splay_tree_keys.

    * Currently this patch uses the same htab_create/htab_free routines as in
      task.c. I toyed with creating a 'htab_alloca' macro (allocating a
      fixed-size htab) to speed things up further, but decided to play it safe
      for the current patch.

(2) Because pointers to refcounts are used as the basis, and structure element
siblings all share the same refcount, uniform increment/decrement without
repetition is also naturally achieved.

(3) Because of the need to remove whole structure element sibling sequences from
the context, it appears we need to mark the first/last of such a sequence. You'll
see that the special REFCOUNT_* values have been expanded a bit more than in your
concept patch (at some point we should think about no longer abusing it and
adding a proper flags word).

(4) The new increment/decrement routines combine most of the new refcount_set
lookup code with the refcount adjustment. For the decrement routine, "copy" and
"removal" are now separate return values, since for structure element sequences,
even when signalling "removal" you may still need to finish the "copy" work of
the following target_var_descs.

(5) There are some reorganizing changes to oacc-parallel.c and oacc-mem.c, but
most of the code that matters is in target.c.

(6) New testcases have been added to reflect the cases discussed on the omp-lang
list.
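
Notes (2) to (4) can be sketched roughly as below. This is a simplified model
under stated assumptions, not the code in target.c: the flag values, the struct
layout, the function names, and the assumption that siblings sit contiguously in
an array are all made up for illustration, and std::unordered_set stands in for
the htab_t pointer set.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Illustrative flag values only; not libgomp's REFCOUNT_* encodings.
enum : unsigned { ELEM_FIRST = 1u, ELEM_LAST = 2u };

struct key
{
  unsigned flags;               // ELEM_FIRST / ELEM_LAST markers
  union
  {
    std::uintptr_t count;       // first sibling: holds the shared count
    std::uintptr_t *count_ptr;  // other siblings: point at the first's count
  };
};

// Resolve the single counter shared by a whole sibling sequence.
inline std::uintptr_t *
shared_count (key &k)
{
  return (k.flags & ELEM_FIRST) ? &k.count : k.count_ptr;
}

// Walk from the FIRST-flagged key to the LAST-flagged one (siblings are
// assumed to be contiguous in this sketch).
inline int
sequence_length (const key *first)
{
  int n = 1;
  for (const key *k = first; !(k->flags & ELEM_LAST); ++k)
    ++n;
  return n;
}

struct decrement_result { bool do_copy; bool do_remove; };

// Decrement the shared counter at most once per construct, and report
// "copy" and "removal" separately.
inline decrement_result
decrement (key &k, std::unordered_set<std::uintptr_t *> &seen)
{
  std::uintptr_t *rc = shared_count (k);
  bool first_visit = seen.insert (rc).second;
  bool dropped_to_zero = false;
  if (first_visit && *rc > 0)
    {
      --*rc;
      dropped_to_zero = (*rc == 0);
    }
  bool is_zero = (*rc == 0);
  // Removal is signalled once, by the visit that drops the count to zero;
  // later siblings still report that their data needs copying back.
  return { dropped_to_zero || (!first_visit && is_zero),
	   first_visit && dropped_to_zero };
}
```

In this model the first sibling visited reports both "copy" and "removal", while
each following sibling of the same sequence reports only "copy".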

This patch has been tested for libgomp with no regressions on x86_64-linux with
nvptx offloading. Since I submitted the first "v1" patch long ago, is it okay
for this to be considered committable now, once approved?

Thanks,
Chung-Lin

2020-12-04  Chung-Lin Tang  

 libgomp/
 * hashtab.h (htab_clear): New function with initialization code
 factored out from...
 (htab_create): ...here, adjust to use htab_clear function.

 * libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
 special refcount values, add comments.
 (REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
 (REFCOUNT_LINK): Likewise.
 (REFCOUNT_STRUCTELEM): New special refcount range for structure
 element siblings.
 (REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
 sibling maps.
 (REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
 (REFCOUNT_STRUCTELEM_FLAG_LAST):  Flag to indicate last sibling.
 (REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
 (REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
 (struct splay_tree_key_s): Add structelem_refcount and
 structelem_refcount_ptr fields into a union with dynamic_refcount.
 Add comments.
 (gomp_map_vars): Delete declaration.
 (gomp_map_vars_async): Likewise.
 (gomp_unmap_vars): Likewise.
 (gomp_unmap_vars_async): Likewise.
 (goacc_map_vars): New declaration.
 (goacc_unmap_vars): Likewise.

 * oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
 (goacc_enter_datum): Likewise.
 (goacc_enter_data_internal): Likewise.
 * oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
 and goacc_unmap_vars.
 (GOACC_data_start): Adjust to use goacc_map_vars.
 (GOACC_data_end): Adjust to use goacc_unmap_vars.

 * target.c (hash_entry_type): New typedef.
 (htab_alloc): New function hook for hashtab.h.
 (htab_free): Likewise.
 (htab_hash): Likewise.
 (htab_eq): Likewise.
 (hashtab.h): Add file include.
 (gomp_increment_refcount): New function.
 (gomp_decrement_refcount): Likewise.
 (gomp_map_vars_existing): Add refcount_set parameter, adjust to use
 gomp_increment_refcount.
 (gomp_map_fields_existing): Add refcount_set parameter, adjust calls
 to gomp_map_vars_existing.

 (gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
 variable to guard OpenMP specific paths, adjust calls to
 gomp_map_vars_existing, add structure element sibli
