[PATCH, OpenACC 2.7, v2] Implement reductions for arrays and structs
Hi Thomas,

This is v2 of the C/C++/middle-end parts of array/struct support for OpenACC reductions. The main changes are much-improved support for sub-arrays, and some new testcases.

Tested on mainline using x86_64 host and nvptx/amdgcn offloading. I will backport to the upcoming omp/devel/gcc-14 branch after this is approved for mainline.

Thanks,
Chung-Lin

2024-06-06  Chung-Lin Tang

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
	OpenACC-specific cases.
	* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
	(c_oacc_reduction_code_name): Likewise.
	(c_finish_omp_clauses): Handle OpenACC cases using new functions.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
	OpenACC-specific cases.
	* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
	(cp_oacc_reduction_code_name): Likewise.
	(finish_omp_reduction_clause): Handle OpenACC cases using new functions.

gcc/ChangeLog:

	* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
	handling ARRAY_TYPE and RECORD_TYPE reductions.
	(gcn_goacc_reduction_setup): Likewise.
	(gcn_goacc_reduction_init): Likewise.
	(gcn_goacc_reduction_fini): Likewise.
	(gcn_goacc_reduction_teardown): Likewise.
	* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate V2SI
	shuffle using vec_extract op.
	(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
	use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
	(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
	RECORD_TYPE reductions.
	(nvptx_goacc_reduction_setup): Likewise.
	(nvptx_goacc_reduction_init): Likewise.
	(nvptx_goacc_reduction_fini): Likewise.
	(nvptx_goacc_reduction_teardown): Likewise.
	* gimplify.cc (gimplify_scan_omp_clauses): Sanity checking for
	supported array reduction cases.
	(gimplify_adjust_omp_clauses): Peel away array MEM_REF for decl lookup.
	* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
	building to use decl type, rather than generic ptr_type_node.
	(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
	construction.
	(lower_rec_input_clauses): Set OMP_CLAUSE_REDUCTION_PRIVATE_EXPR.
	(oacc_array_reduction_bias): New function.
	(lower_oacc_reductions): Add code to teardown/recover array access
	MEM_REF in OMP_CLAUSE_DECL, to accommodate lookup requirements.
	Use OMP_CLAUSE_REDUCTION_PRIVATE_EXPR as reduction private copy if set.
	Handle array reductions using new oacc_array_reduction_bias function.
	Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
	instead of machine mode based.
	* omp-oacc-neuter-broadcast.cc (worker_single_copy): Add
	'hash_set *array_reduction_base_vars' parameter. Add xxx.
	(neuter_worker_single): Add 'hash_set *array_reduction_base_vars'
	parameter. Adjust recursive calls to self and worker_single_copy.
	(oacc_do_neutering): Add 'hash_set *array_reduction_base_vars'
	parameter. Adjust call to neuter_worker_single.
	(execute_omp_oacc_neuter_broadcast): Add local
	'hash_set array_reduction_base_vars' declaration. Collect MEM_REF
	base-pointer SSA_NAMEs of arrays into array_reduction_base_vars.
	Add '&array_reduction_base_vars' argument to call of oacc_do_neutering.
	* omp-offload.cc (default_goacc_reduction): Add unshare_expr.
	* tree.cc (omp_clause_num_ops): Increase OMP_CLAUSE_REDUCTION ops to 6.
	* tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_EXPR): New macro.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/reduction-9.c: New test.
	* c-c++-common/goacc/reduction-10.c: New test.
	* c-c++-common/goacc/reduction-11.c: New test.
	* c-c++-common/goacc/reduction-12.c: New test.
	* c-c++-common/goacc/reduction-13.c: New test.
	* c-c++-common/goacc/reduction-14.c: New test.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c-c++-common/reduction.h
	(check_reduction_array_xx): New macro.
	(operator_apply): Likewise.
	(check_reduction_array_op): Likewise.
	(check_reduction_arraysec_op): Likewise.
	(function_apply): Likewise.
	(check_reduction_array_macro): Likewise.
	(check_reduction_arraysec_macro): Likewise.
	(check_reduction_xxx_xx_all): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c: New test.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 2d9e9c0969f..61991a218f8 100644
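As an illustration of what this series enables at the source level, here is a minimal sketch (not one of the patch's testcases) of a '+' reduction on a whole fixed-size array and on a struct. It assumes OpenACC 2.7 semantics, where a reduction on an array or composite variable behaves as if applied to each element or member individually. The names histogram, summarize, struct stats, N and NBINS are made up for this example. Compiled without -fopenacc, the pragmas are ignored and the loops run sequentially on the host, which yields the same results.

```c
/* Sizes are compile-time constants: the array reductions in this series
   are limited to fixed-size (explicit-shape) arrays.  */
#define N 1024
#define NBINS 8

/* Illustrative aggregate; each member has a type that '+' supports.  */
struct stats
{
  int count;
  double sum;
};

/* Histogram computed with a reduction on a whole fixed-size local array.
   DATA is assumed to hold N non-negative values.  */
void
histogram (const int *data, int hist[NBINS])
{
  int local[NBINS] = { 0 };

#pragma acc parallel loop reduction(+:local) copyin(data[0:N])
  for (int i = 0; i < N; i++)
    local[data[i] % NBINS] += 1;

  for (int b = 0; b < NBINS; b++)
    hist[b] = local[b];
}

/* Member-wise '+' reduction on a struct variable.  */
struct stats
summarize (const int *data)
{
  struct stats s = { 0, 0.0 };

#pragma acc parallel loop reduction(+:s) copyin(data[0:N])
  for (int i = 0; i < N; i++)
    {
      s.count += 1;
      s.sum += data[i];
    }
  return s;
}
```

The reduction names the local array rather than the hist parameter, because an array parameter decays to a pointer and so has no explicit shape.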
Re: [PATCH, OpenACC 2.7, v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters
On 2024/4/12 3:14 PM, Thomas Schwinge wrote:
>> I have re-tested the patch *without* the gomp_increment/decrement_refcount
>> changes, and have these regressions (just to demonstrate what is affected):
>> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
>> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
>> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
>> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
>> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
>> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
>> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
>> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
> ... are cases where we 'acc_map_data' something, and then invoke an
> OpenACC compute construct with a data clause for the same memory region...
>
>> Now, I have also re-tested your version (aka, just break early and return
>> when k->refcount == REFCOUNT_ACC_MAP_DATA)
>> And for the record, that also works (no regressions).
>>
>> However, I strongly suggest we use my version here where we adjust the
>> dynamic_refcount
> ..., and it's confusing to me why such an OpenACC compute construct (which
> is to use the structured reference counter) should then use the dynamic
> reference counter, for 'acc_map_data'-mapped data?
>
>> simply because: *It is the whole point of this project item in OpenACC 2.7*
>>
>> The 2.7 spec articulated how increment/decrement interacts with
>> acc_map_data/acc_unmap_data and this patch was supposed to make libgomp more
>> conforming to it implementation-wise.
>> (otherwise, no point in working on this at all, as there wasn't really
>> anything behaviorally wrong about our implementation before)
> That is, in my understanding, those 'gomp_increment_refcount' changes
> don't affect the 'acc_map_data' reference counting, but instead, they
> change the reference counting for OpenACC constructs that are originally
> using structured reference counter to instead use the dynamic reference
> counter.  This doesn't seem conceptually right to me.  (..., even if not
> observable from the outside.)

Okay, I've committed the attached patch, with the "early return upon k->refcount == REFCOUNT_ACC_MAP_DATA" in gomp_increment/decrement_refcount.

If we continue to use k->refcount itself as the flag holder of the map type, I guess we will not be able to directly determine whether it is a structured or dynamic adjustment at that point; we would probably need a new field entirely. I don't think we need to do that right now.

Thanks,
Chung-Lin

From a7578a077ed8b64b94282aa55faf7037690abbc5 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Tue, 16 Apr 2024 09:03:21 +
Subject: [PATCH] OpenACC 2.7: Adjust acc_map_data/acc_unmap_data interaction
 with reference counters

This patch adjusts the implementation of the acc_map_data/acc_unmap_data API library routines to more closely fit the description in the OpenACC 2.7 specification.

Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA special value to mark acc_map_data-created mappings. Adjustments to mapping-related code to respect OpenACC semantics are also added.

libgomp/ChangeLog:

	* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
	* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
	initialize dynamic_refcount as 1.
	(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
	(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
	(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
	REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
	dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
	(goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case.
	* target.c (gomp_increment_refcount): Return early for REFCOUNT_ACC_M
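To make the committed behavior concrete, the following is a deliberately simplified model, not libgomp code. A mapping created by the acc_map_data analogue is tagged with a special marker value, increments and decrements return early for it (the "early return" variant chosen above), and only the acc_unmap_data analogue can remove it. struct mapping, MODEL_REFCOUNT_* and the model_* functions are hypothetical stand-ins for splay_tree_key, REFCOUNT_INFINITY/REFCOUNT_ACC_MAP_DATA and gomp_increment/decrement_refcount; the marker values are arbitrary and do not reproduce REFCOUNT_SPECIAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker values, chosen only for this model.  */
#define MODEL_REFCOUNT_INFINITY ((uintptr_t) -1)
#define MODEL_REFCOUNT_MAP_DATA ((uintptr_t) -2)

struct mapping
{
  uintptr_t refcount;          /* Structured count, or a special marker.  */
  uintptr_t dynamic_refcount;  /* Dynamic (enter/exit data) count.  */
  bool present;
};

/* acc_map_data analogue: tag the mapping, start the dynamic count at 1.  */
static void
model_map_data (struct mapping *m)
{
  m->refcount = MODEL_REFCOUNT_MAP_DATA;
  m->dynamic_refcount = 1;
  m->present = true;
}

/* Early-return variant: special markers are never adjusted.  */
static void
model_increment (struct mapping *m)
{
  if (m->refcount == MODEL_REFCOUNT_INFINITY
      || m->refcount == MODEL_REFCOUNT_MAP_DATA)
    return;
  m->refcount++;
}

/* Returns true if the mapping should now be removed.  */
static bool
model_decrement (struct mapping *m)
{
  if (m->refcount == MODEL_REFCOUNT_INFINITY
      || m->refcount == MODEL_REFCOUNT_MAP_DATA)
    return false;
  if (m->refcount > 0)
    m->refcount--;
  return m->refcount == 0;
}

/* acc_unmap_data analogue: the only way to drop a map-data mapping.
   The model just returns false where the real routine reports an error.  */
static bool
model_unmap_data (struct mapping *m)
{
  if (!m->present || m->refcount != MODEL_REFCOUNT_MAP_DATA)
    return false;
  m->present = false;
  return true;
}
```

The data clause of a compute construct on such a mapping then neither keeps it alive longer nor removes it early. That is the property the early return preserves without routing structured adjustments through dynamic_refcount.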
[PATCH, OpenACC 2.7, v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters
Hi Thomas,

On 2024/3/15 7:24 PM, Thomas Schwinge wrote:
> Hi Chung-Lin!
>
> I realized: please add "PR libgomp/92840" to the Git commit log, as your
> changes are directly a continuation of my earlier changes.

Okay, I'll remember to do that.

...

> -          if (n->refcount != REFCOUNT_INFINITY)
> +          if (n->refcount != REFCOUNT_INFINITY
> +              && n->refcount != REFCOUNT_ACC_MAP_DATA)
>              n->refcount--;
>            n->dynamic_refcount--;
>          }
>
> +      /* Mappings created by 'acc_map_data' may only be deleted by
> +         'acc_unmap_data'.  */
> +      if (n->refcount == REFCOUNT_ACC_MAP_DATA
> +          && n->dynamic_refcount == 0)
> +        n->dynamic_refcount = 1;
> +
>        if (n->refcount == 0)
>          {
>            bool copyout = (kind == GOMP_MAP_FROM
>
> ..., which really should have the same semantics?  No strong opinion on
> which of the two variants you now chose.

My guess is that breaking off the REFCOUNT_ACC_MAP_DATA case separately will be lighter on branch predictors (faster overall), so I will stick with my version here.

>>>
>>> It's not clear to me why you need this handling -- instead of just
>>> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is,
>>> early 'return'?
>>>
>>> Per my understanding, this code is for OpenACC only exercised for
>>> structured data regions, and it seems strange (unnecessary?) to adjust
>>> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data?  Or am I
>>> missing anything?
>>
>> No, that is not true. It goes through almost everything through
>> gomp_map_vars_existing/_internal.
>> This is what happens when you acc_create/acc_copyin on a mapping created by
>> acc_map_data.
>
> But I don't understand what you foresee breaking with the following (on
> top of your v2):
>
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -476,14 +476,14 @@ gomp_free_device_memory (struct gomp_device_descr *devicep, void *devptr)
>  static inline void
>  gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set)
>  {
> -  if (k == NULL || k->refcount == REFCOUNT_INFINITY)
> +  if (k == NULL
> +      || k->refcount == REFCOUNT_INFINITY
> +      || k->refcount == REFCOUNT_ACC_MAP_DATA)
>      return;
>
>    uintptr_t *refcount_ptr = &k->refcount;
>
> -  if (k->refcount == REFCOUNT_ACC_MAP_DATA)
> -    refcount_ptr = &k->dynamic_refcount;
> -  else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
> +  if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>      refcount_ptr = &k->structelem_refcount;
> ...
> Can you please show a test case?

I have re-tested the patch *without* the gomp_increment/decrement_refcount changes, and have these regressions (just to demonstrate what is affected):
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
+FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
+FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test

Now, I have also re-tested your version (aka, just break early and return when k->refcount == REFCOUNT_ACC_MAP_DATA)
And for the record, that also works (no regressions).

However, I strongly suggest we use my version here where we adjust the dynamic_refcount, simply because: *It is the whole point of this project item in OpenACC 2.7*

The 2.7 spec articulated how increment/decrement interacts with acc_map_data/acc_unmap_data and this patch was supposed to make libgomp more conforming to it implementation-wise.
(otherwise, no point in working on this at all, as there wasn't really anything behaviorally wrong about our implementation before)

> I see we already have:
>
>           if ((kinds[i] & 0xff) == GOMP_MAP_TO_PSET
>               && tgt->list_count == 0)
>             {
>               /* 'declare target'.  */
>               assert (n->refcount == REFCOUNT_INFINITY);
>
> I think I wanted you to add:
>
> ---
Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis
Hi Richard, Thomas,

On 2023/10/30 8:46 PM, Richard Biener wrote:
>>
>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
>> flag.
>>
>> The actual optimization then is done in this second patch.  Chung-Lin
>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
>> I don't have much experience with most of the following generic code, so
>> would appreciate a helping hand, whether that conceptually makes sense as
>> well as from the implementation point of view:

First of all, I have removed all of the gimplify-stage scanning and setting of DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so there are no changes to gimplify.cc now).

I remember this code was an artifact of earlier attempts to also make struct-member pointer mappings work (e.g. map(readonly:rec.ptr[:N])), which failed anyway. I think the omp_data_* member accesses, when building the child-function-side receiver_refs, block points-to analysis from working (I didn't try digging deeper).

Also, during gimplify, VAR_DECLs appeared to be reused (at least in some cases) for building map clause decl references, so hoping that the variables "happen to be" single-use, and relaying DECL_POINTS_TO_READONLY into SSA_NAME_POINTS_TO_READONLY_MEMORY, does appear to be a little risky.

However, for firstprivate pointers processed during omp-low, the situation appears to be somewhat different (see the description below).

> No, I don't think you can use that flag on non-default-defs, nor
> preserve it on copying.  So
> it also doesn't nicely extend to DECLs as done by the patch.  We
> currently _only_ use it
> for incoming parameters.  When used on arbitrary code you can get to for
> example
>
> ptr1(points-to-readony-memory) = >x;
> ... access via ptr1 ...
> ptr2 = >x;
> ... access via ptr2 ...
>
> where both are your OMP regions differently constrained (the constrain is on
> the
> code in the region, _not_ on the actual protections of the pointed to
> data, much like
> for the fortran case).  But now CSE comes along and happily replaces all ptr2
> with ptr2 in the second region and ... oops!

Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in the second region"?

That doesn't happen, because during omp-lower/expand, OMP target regions (currently the only place this applies) are separated into individual child functions.

(Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during omp-lower: for firstprivate pointers (i.e. 'a' here) we set this bit when constructing the first load of the pointer.)

#pragma acc parallel copyin(readonly: a[:32]) copyout(r)
{
  foo (a, a[8]);
  r = a[8];
}
#pragma acc parallel copyin(readonly: a[:32]) copyout(r)
{
  foo (a, a[12]);
  r = a[12];
}

After omp-expand (before SSA):

__attribute__((oacc parallel, omp target entrypoint, noclone))
void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
{
  ...
  :
  D.2962 = .omp_data_i->D.2947;
  a.8 = D.2962;
  r.1 = (*a.8)[12];
  foo (a.8, r.1);
  r.1 = (*a.8)[12];
  D.2965 = .omp_data_i->r;
  *D.2965 = r.1;
  return;
}

__attribute__((oacc parallel, omp target entrypoint, noclone))
void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
{
  ...
  :
  D.2968 = .omp_data_i->D.2939;
  a.4 = D.2968;
  r.0 = (*a.4)[8];
  foo (a.4, r.0);
  r.0 = (*a.4)[8];
  D.2971 = .omp_data_i->r;
  *D.2971 = r.0;
  return;
}

So the creation of DECL_POINTS_TO_READONLY and its relaying to SSA_NAME_POINTS_TO_READONLY_MEMORY here is actually quite similar, at least conceptually, to a default-def for a PARM_DECL.

(If offloading were structured significantly differently, say if child functions were separated much earlier, before omp-lowering, then this readonly modifier might possibly be a direct application of 'r' in the "fn spec" attribute.)

Other changes since the first version of the patch include:
1) Update of the C/C++ FE changes to the new style in c-family/c-omp.cc.
2) Merging of two if cases in fortran/trans-openmp.cc, as Thomas suggested.
3) Update of the readonly-2.c testcase to scan before/after the "fre1" pass, to verify removal of a MEM load, also as Thomas suggested.

I have re-tested this patch using mainline, with no regressions. Is this okay for mainline?

Thanks,
Chung-Lin

2024-04-03  Chung-Lin Tang

gcc/c-family/ChangeLog:

	* c-omp.cc (c_omp_address_inspector::expand_array_base):
	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
	(c_omp_address_inspector::expand_component_selector): Likewise.

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_array_section):
	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/Change
Re: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends
Hi Thomas, Tobias,

On 2023/10/26 6:43 PM, Thomas Schwinge wrote:
> +++ b/gcc/tree.h
> @@ -1813,6 +1813,14 @@ class auto_suppress_location_wrappers
>  #define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
>    (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.addressable_flag)
>
> +/* Nonzero if OpenACC 'readonly' modifier set, used for 'copyin'.  */
> +#define OMP_CLAUSE_MAP_READONLY(NODE) \
> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
> +
> +/* Same as above, for use in OpenACC cache directives.  */
> +#define OMP_CLAUSE__CACHE__READONLY(NODE) \
> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
I'm not sure if these special accessor functions are actually useful, or
we should just directly use 'TREE_READONLY' instead?  We're only using
them in contexts where it's clear that the 'OMP_CLAUSE_SUBCODE_CHECK' is
satisfied, for example.
>>> I find directly using TREE_READONLY confusing.
>>
>> FWIW, I've changed to use TREE_NOTHROW instead, if it can give a better
>> sense of safety :P
>
> I don't understand that, why not use 'TREE_READONLY'?
>
>> I think there's a misunderstanding here anyways: we are not relying on a
>> DECL marked
>> TREE_READONLY here. We merely need the OMP_CLAUSE_MAP to be marked as
>> OMP_CLAUSE_MAP_READONLY == 1.
>
> Yes, I understand that.  My question was why we don't just use
> 'TREE_READONLY (c)', where 'c' is the
> 'OMP_CLAUSE_MAP'/'OMP_CLAUSE__CACHE_' clause (not its decl), and avoid
> the indirection through
> '#define OMP_CLAUSE_MAP_READONLY'/'#define OMP_CLAUSE__CACHE__READONLY',
> given that we're only using them in contexts where it's clear that the
> 'OMP_CLAUSE_SUBCODE_CHECK' is satisfied.  I don't have a strong
> preference, though.

After further re-testing using TREE_NOTHROW, I have reverted to using TREE_READONLY, because TREE_NOTHROW clashes with OMP_CLAUSE_RELEASE_DESCRIPTOR (which doesn't use the OMP_CLAUSE_MAP_* naming convention and is not documented in gcc/tree-core.h either, hmmm...).

I have added the comment adjustments in gcc/tree-core.h for the new uses of TREE_READONLY/readonly_flag.

We basically all use OMP_CLAUSE_SUBCODE_CHECK macros for OpenMP clause expressions exclusively, so I don't see a reason to diverge from that style (even when the context is clear).

> Either way, you still need to document this:
>
> | Also, for the new use for OMP clauses, update 'gcc/tree.h:TREE_READONLY',
> | and in 'gcc/tree-core.h' for 'readonly_flag' the
> | "table lists the uses of each of the above flags".

Okay, done as mentioned above.

> In addition to a few individual comments above and below, you've also not
> yet responded to my requests re test cases.

I have greatly expanded the test scan patterns to include parallel/kernels/serial/data/enter data, as well as non-readonly copyin clauses together with readonly ones. I also added simple 'declare' tests, though there is nothing to scan in the 'tree-original' dump for those.

>> +  tree nl = list;
>> +  bool readonly = false;
>> +  matching_parens parens;
>> +  if (parens.require_open (parser))
>> +    {
>> +      /* Turn on readonly modifier parsing for copyin clause.  */
>> +      if (c_kind == PRAGMA_OACC_CLAUSE_COPYIN)
>> +        {
>> +          c_token *token = c_parser_peek_token (parser);
>> +          if (token->type == CPP_NAME
>> +              && !strcmp (IDENTIFIER_POINTER (token->value), "readonly")
>> +              && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
>> +            {
>> +              c_parser_consume_token (parser);
>> +              c_parser_consume_token (parser);
>> +              readonly = true;
>> +            }
>> +        }
>> +      location_t loc = c_parser_peek_token (parser)->location;
>
> I suppose 'loc' here now points to after the opening '(' or after the
> 'readonly :'?  This is different from what 'c_parser_omp_var_list_parens'
> does, and indeed, 'c_parser_omp_variable_list' states that "CLAUSE_LOC is
> the location of the clause", not the location of the variable-list?  As
> this, I suppose, may change diagnostics, please restore the original
> behavior.  (This appears to be different in the C++ front end, huh.)

Thanks for catching this! Fixed.

>> --- a/gcc/fortran/openmp.cc
>> +++ b/gcc/fortran/openmp.cc
>> @@ -1197,7 +1197,7 @@ omp_inv_mask::omp_inv_mask (const omp_mask &m) : omp_mask (m)
>>
>>  static bool
>>  gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
>> -                          bool allow_common, bool allow_derived)
>> +                          bool allow_common, bool allow_derived, bool readonly = false)
>>  {
>>    gfc_omp_namelist **head = NULL;
>>    if (gfc_match_omp_variable_list ("", list, allow_common, NULL, &head, true,
>> @@ -1206,7 +1206,10 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
>>      {
>>        gfc_omp_namelist *n;
>>        for (n = *head; n; n = n->next)
>> -
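The C front-end hunk quoted in this review recognizes the modifier with a two-token lookahead. It consumes both tokens only when an identifier spelled readonly is immediately followed by a colon; otherwise the text is parsed as a plain variable list. The sketch below mimics that decision on a raw string. parse_readonly_modifier and skip_ws are hypothetical helpers working on characters, not the c_parser token API, and whitespace handling is simplified.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Skip spaces and tabs.  */
static const char *
skip_ws (const char *p)
{
  while (*p == ' ' || *p == '\t')
    p++;
  return p;
}

/* Hypothetical helper: if ARGS begins with the identifier "readonly"
   followed by ':', return true and point *REST just past the colon.
   Otherwise return false and leave *REST == ARGS, so the caller parses
   the whole text as a variable list (nothing is consumed).  */
static bool
parse_readonly_modifier (const char *args, const char **rest)
{
  static const char kw[] = "readonly";
  const size_t len = sizeof kw - 1;
  const char *p = skip_ws (args);

  *rest = args;
  if (strncmp (p, kw, len) != 0)
    return false;
  p += len;
  /* Must be the whole identifier, e.g. not "readonly_buf".  */
  if (isalnum ((unsigned char) *p) || *p == '_')
    return false;
  p = skip_ws (p);
  /* Second "token" must be a colon.  */
  if (*p != ':')
    return false;
  *rest = skip_ws (p + 1);
  return true;
}
```

A variable that happens to be named readonly, as in copyin(readonly, x), is left untouched, because the second lookahead check fails before anything is consumed.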
[PATCH, OpenACC 2.7, v2] Adjust acc_map_data/acc_unmap_data interaction with reference counters
>> +  else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>>      refcount_ptr = &k->structelem_refcount;
>>    else if (REFCOUNT_STRUCTELEM_P (k->refcount))
>>      refcount_ptr = k->structelem_refcount_ptr;
>> @@ -527,7 +529,9 @@ gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p,
>>
>>    uintptr_t *refcount_ptr = &k->refcount;
>>
>> -  if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>> +  if (k->refcount == REFCOUNT_ACC_MAP_DATA)
>> +    refcount_ptr = &k->dynamic_refcount;
>> +  else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>>      refcount_ptr = &k->structelem_refcount;
>>    else if (REFCOUNT_STRUCTELEM_P (k->refcount))
>>      refcount_ptr = k->structelem_refcount_ptr;
>> @@ -560,6 +564,10 @@ gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p,
>>    else if (*refcount_ptr > 0)
>>      *refcount_ptr -= 1;
>>
>> +  /* Force back to 1 if this is an acc_map_data mapping.  */
>> +  if (k->refcount == REFCOUNT_ACC_MAP_DATA && *refcount_ptr == 0)
>> +    *refcount_ptr = 1;
>> +
>>   end:
>>    if (*refcount_ptr == 0)
>>      {
>
> It's not clear to me why you need this handling -- instead of just
> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is,
> early 'return'?
>
> Per my understanding, this code is for OpenACC only exercised for
> structured data regions, and it seems strange (unnecessary?) to adjust
> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data?  Or am I
> missing anything?

No, that is not true. It goes through almost everything through gomp_map_vars_existing/_internal. This is what happens when you acc_create/acc_copyin on a mapping created by acc_map_data.

> Overall, your changes regress the
> commit 3e888f94624294d2b9b34ebfee0916768e5d9c3f
> "Add OpenACC 'acc_map_data' variant to
> 'libgomp.oacc-c-c++-common/deep-copy-8.c'"
> that I just pushed.  I think you just need to handle
> 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' in
> 'libgomp/oacc-mem.c:goacc_enter_data_internal', 'if (n && struct_p)'?
> Please verify.

Fixed by adding another '&& n->refcount != REFCOUNT_ACC_MAP_DATA' check in goacc_enter_data_internal.

> But please also to the "Minimal OpenACC variant corresponding to PR96668"
> code in 'libgomp/oacc-mem.c:goacc_enter_data_internal' add a safeguard
> that we're not running into 'REFCOUNT_ACC_MAP_DATA' there.  I think
> that's currently not (reasonably easily) possible, given that
> 'acc_map_data' isn't available in OpenACC/Fortran, but it'll be available
> later, and then I'd rather have an 'assert' trigger there, instead of
> random behavior.  (I'm not asking you to write a mixed OpenACC/Fortran
> plus C test case for that scenario -- if feasible at all.)

I am not really sure what you want me to do here, but REFCOUNT_ACC_MAP_DATA mappings are all created through a single GOMP_MAP_ALLOC kind. The complex cases of MAP_STRUCT, MAP_TO_PSET, etc. should not be related here. (I presume that even if Fortran eventually gets acc_map_data, it would be the compiler side that takes care of passing the raw data pointer/array size to the acc_map_data routine.)

I have re-tested this on x86_64-linux + nvptx. Please see if this is okay for committing to mainline.

Thanks,
Chung-Lin

2024-03-04  Chung-Lin Tang

libgomp/ChangeLog:

	* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
	* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
	initialize dynamic_refcount as 1.
	(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA, remove TODO
	comments. Add assert of 'n->dynamic_refcount >= 1' and comments.
	(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
	(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
	REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
	dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
	(goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case.
	* target.c (gomp_increment_refcount): Add REFCOUNT_ACC_MAP_DATA case.
	(gomp_decrement_refcount): Add REFCOUNT_ACC_MAP_DATA case, force lowest
	dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
	* testsuite/libgomp.oacc-c-c++-common/lib-96.c: New testcase.
	* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust
	testcase error output scan test.

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index f98cccd8b66..089393846d1 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1163,6 +1163,8 @@ struct target_mem_desc;
 /* Special value for refcount - tgt_offset contains target address of t
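For comparison with the early-return approach, here is a simplified model (again not libgomp code) of what this v2 hunk does in gomp_decrement_refcount. For an acc_map_data mapping, increments and decrements are redirected to dynamic_refcount, and the count is forced back to 1 so that only acc_unmap_data can remove the mapping. The names and the marker value are hypothetical, and the delete_p (finalize) branch is inferred from the hunk's context lines rather than shown in full above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker value for this model only.  */
#define MODEL_REFCOUNT_MAP_DATA ((uintptr_t) -2)

struct mapping
{
  uintptr_t refcount;          /* Structured count, or the marker.  */
  uintptr_t dynamic_refcount;  /* Dynamic (enter/exit data) count.  */
};

/* v2 variant: map-data mappings are counted on dynamic_refcount.  */
static uintptr_t *
model_counter (struct mapping *m)
{
  return m->refcount == MODEL_REFCOUNT_MAP_DATA
         ? &m->dynamic_refcount : &m->refcount;
}

static void
model_increment_v2 (struct mapping *m)
{
  *model_counter (m) += 1;
}

/* Returns true if the mapping should now be removed.  DELETE_P models a
   finalizing exit, which drops the count to zero in one step.  */
static bool
model_decrement_v2 (struct mapping *m, bool delete_p)
{
  uintptr_t *c = model_counter (m);
  if (delete_p)
    *c = 0;
  else if (*c > 0)
    *c -= 1;
  /* Force back to 1: only acc_unmap_data may remove this mapping.  */
  if (m->refcount == MODEL_REFCOUNT_MAP_DATA && *c == 0)
    *c = 1;
  return *c == 0;
}
```

Both variants keep the mapping alive. The difference under discussion is whether a structured construct's adjustment should be visible in dynamic_refcount at all.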
[PATCH, OpenACC 2.7] struct/array reductions for Fortran
Hi Tobias, Thomas,

this patch adds support for Fortran to use arrays and struct (record) types in OpenACC reductions.

There are still some shortcomings in the current state, mainly that only explicit-shape arrays can be used (like its C counterpart). Anything else is currently a bit more complicated in the middle-end, since the existing reduction code creates an "init-op" (a literal of initial values), which can't be done when, say, TYPE_MAX_VALUE (TYPE_DOMAIN (array_type)) is not a tree constant. I think we'll be on the hook to solve this later, but I think the current state is okay to submit.

Tested without regressions on mainline (on top of the first struct/array reduction patch[1]).

Thanks,
Chung-Lin

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

2024-02-08  Chung-Lin Tang

gcc/fortran/ChangeLog:

	* openmp.cc (oacc_reduction_defined_type_p): New function.
	(resolve_omp_clauses): Adjust OpenACC array reduction error case. Use
	oacc_reduction_defined_type_p for OpenACC.
	* trans-openmp.cc (gfc_trans_omp_array_reduction_or_udr): Add
	'bool openacc' parameter, adjust part of function to be !openacc only.
	(gfc_trans_omp_reduction_list): Add 'bool openacc' parameter, pass to
	calls to gfc_trans_omp_array_reduction_or_udr.
	(gfc_trans_omp_clauses): Add 'openacc' argument to calls to
	gfc_trans_omp_reduction_list.
	(gfc_trans_omp_do): Pass 'op == EXEC_OACC_LOOP' as 'bool openacc'
	parameter in call to gfc_trans_omp_clauses.

gcc/ChangeLog:

	* omp-low.cc (omp_reduction_init_op): Add checking if reduced array has
	constant bounds.
	(lower_oacc_reductions): Add handling of error_mark_node.

gcc/testsuite/ChangeLog:

	* gfortran.dg/goacc/array-reduction.f90: Adjust testcase.
	* gfortran.dg/goacc/reduction.f95: Likewise.

libgomp/ChangeLog:

	* libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90: New testcase.
	* libgomp/testsuite/libgomp.oacc-fortran/reduction-10.f90: Likewise.
	* libgomp/testsuite/libgomp.oacc-fortran/reduction-11.f90: Likewise.
	* libgomp/testsuite/libgomp.oacc-fortran/reduction-12.f90: Likewise.
	* libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90: Likewise.

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0af80d54fad..4bba9e666d6 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7047,6 +7047,72 @@ oacc_is_loop (gfc_code *code)
          || code->op == EXEC_OACC_LOOP;
 }
 
+static bool
+oacc_reduction_defined_type_p (enum gfc_omp_reduction_op rop, gfc_typespec *ts)
+{
+  if (rop == OMP_REDUCTION_USER || rop == OMP_REDUCTION_NONE)
+    return false;
+
+  if (ts->type == BT_INTEGER)
+    switch (rop)
+      {
+      case OMP_REDUCTION_AND:
+      case OMP_REDUCTION_OR:
+      case OMP_REDUCTION_EQV:
+      case OMP_REDUCTION_NEQV:
+        return false;
+      default:
+        return true;
+      }
+
+  if (ts->type == BT_LOGICAL)
+    switch (rop)
+      {
+      case OMP_REDUCTION_AND:
+      case OMP_REDUCTION_OR:
+      case OMP_REDUCTION_EQV:
+      case OMP_REDUCTION_NEQV:
+        return true;
+      default:
+        return false;
+      }
+
+  if (ts->type == BT_REAL || ts->type == BT_COMPLEX)
+    switch (rop)
+      {
+      case OMP_REDUCTION_PLUS:
+      case OMP_REDUCTION_TIMES:
+      case OMP_REDUCTION_MINUS:
+        return true;
+
+      case OMP_REDUCTION_AND:
+      case OMP_REDUCTION_OR:
+      case OMP_REDUCTION_EQV:
+      case OMP_REDUCTION_NEQV:
+        return false;
+
+      case OMP_REDUCTION_MAX:
+      case OMP_REDUCTION_MIN:
+        return ts->type != BT_COMPLEX;
+      case OMP_REDUCTION_IAND:
+      case OMP_REDUCTION_IOR:
+      case OMP_REDUCTION_IEOR:
+        return false;
+      default:
+        gcc_unreachable ();
+      }
+
+  if (ts->type == BT_DERIVED)
+    {
+      for (gfc_component *p = ts->u.derived->components; p; p = p->next)
+        if (!oacc_reduction_defined_type_p (rop, &p->ts))
+          return false;
+      return true;
+    }
+
+  return false;
+}
+
 static void
 resolve_scalar_int_expr (gfc_expr *expr, const char *clause)
 {
@@ -8137,13 +8203,15 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
             else
               n->sym->mark = 1;
 
-            /* OpenACC does not support reductions on arrays.  */
-            if (n->sym->as)
+            /* OpenACC currently only supports array reductions on
+               explicit-shape arrays.  */
+            if ((n->sym->as && n->sym->as->type != AS_EXPLICIT)
+                || n->sym->attr.codimension)
               gfc_error ("Array %qs is not permitted in reduction at %L",
                          n->sym->name, &n->where);
           }
       }
-
+
   for (n = omp_clauses->lists[OMP_LIST_TO]; n; n = n->next)
     n->sym->mark = 0;
   for (n = omp_clauses->lists[OMP_
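The "init-op" mentioned in the cover letter is the literal that seeds each private copy with the reduction operator's identity value. For an array, that means one identity per element, which is why the element count must be a compile-time constant when such a literal is built. The following is a minimal host-side sketch of that idea for double arrays, not omp_reduction_init_op itself. The red_op enum and helper names are hypothetical, and it assumes infinities are honored, so max/min start from -/+infinity. Here n is a run-time parameter only to keep the sketch short.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical operator enum for this sketch.  */
enum red_op { RED_PLUS, RED_MINUS, RED_TIMES, RED_MAX, RED_MIN };

/* Identity value of OP for double elements.  */
static double
red_identity (enum red_op op)
{
  switch (op)
    {
    case RED_PLUS:
    case RED_MINUS:   /* A '-' reduction combines partial results with '+'.  */
      return 0.0;
    case RED_TIMES:
      return 1.0;
    case RED_MAX:
      return -INFINITY;
    case RED_MIN:
      return INFINITY;
    }
  return 0.0;
}

/* Seed an N-element private copy with OP's identity: the element-wise
   analogue of a scalar init-op.  */
static void
red_init_private (enum red_op op, double *priv, size_t n)
{
  const double id = red_identity (op);
  for (size_t i = 0; i < n; i++)
    priv[i] = id;
}

/* Combine a private copy into the shared result, element by element.  */
static void
red_combine (enum red_op op, double *shared, const double *priv, size_t n)
{
  for (size_t i = 0; i < n; i++)
    switch (op)
      {
      case RED_PLUS:
      case RED_MINUS:
        shared[i] += priv[i];
        break;
      case RED_TIMES:
        shared[i] *= priv[i];
        break;
      case RED_MAX:
        if (priv[i] > shared[i])
          shared[i] = priv[i];
        break;
      case RED_MIN:
        if (priv[i] < shared[i])
          shared[i] = priv[i];
        break;
      }
}
```

Combining an untouched private copy leaves the shared array unchanged for every operator. That is the property the init-op has to guarantee for gangs, workers or vector lanes that never execute an iteration.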
[committed] MAINTAINERS: Update my email address
Updated my email address.

Thanks,
Chung-Lin

From ffeab69e1ffc0405da3a9222c7b9f7a000252702 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Thu, 25 Jan 2024 18:20:43 +
Subject: [PATCH] MAINTAINERS: Update my work email address

	* MAINTAINERS: Update my work email address.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7d3b78d276e..8b11ddbc069 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -99,7 +99,7 @@ moxie port             Anthony Green
 msp430 port            Nick Clifton
 nds32 port             Chung-Ju Wu
 nds32 port             Shiva Chen
-nios2 port             Chung-Lin Tang
+nios2 port             Chung-Lin Tang
 nios2 port             Sandra Loosemore
 nvptx port             Tom de Vries
 nvptx port             Thomas Schwinge
-- 
2.34.1
[PATCH, OpenACC 2.7] Implement reductions for arrays and structs
Hi Thomas, Andrew,

this patch implements reductions for arrays and structs for OpenACC. Following the existing pattern for OpenACC reductions, most of the work is in the respective NVPTX/GCN backends' *_goacc_reduction_setup/init/fini/teardown hooks (particularly the fini part) and in the [nvptx/gcn]_reduction_update routines. The code is largely the same for the two targets; the main difference is that GCN lacks the vector mode handling.

Julian: there is also a patch to the middle-end neutering (a hack, actually) that detects SSA_NAMEs used in reduction array MEM_REFs and avoids single->parallel copying for them, by moving those definitions before BUILT_IN_GOACC_SINGLE_COPY_START. This appears to work because reductions do their own initialization of the private copy. As we discussed in our internal calls, the proper way is to create the private array at a more appropriate stage, but that is too long a shot for now. These changes are needed at least for some -O0 cases; under optimization, propagation of the private copies' local addresses eliminates the SSA_NAME, and things just work. So please bear with this hack.

I believe the newly added libgomp testcases are fairly complete. Note, though, that one case, a '*' reduction on double arrays, has been commented out for now: a (presumably) unrelated issue causes it to fail, possibly related to the loop-based atomic form used by both NVPTX and GCN. It should probably be XFAILed instead of commented out; I will do that in the next iteration.

Thanks,
Chung-Lin

2024-01-02  Chung-Lin Tang

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
	OpenACC-specific cases.
	* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
	(c_oacc_reduction_code_name): Likewise.
	(c_finish_omp_clauses): Handle OpenACC cases using new functions.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
	OpenACC-specific cases.
	* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
	(cp_oacc_reduction_code_name): Likewise.
	(finish_omp_reduction_clause): Handle OpenACC cases using new functions.

gcc/ChangeLog:

	* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
	handling ARRAY_TYPE and RECORD_TYPE reductions.
	(gcn_goacc_reduction_setup): Likewise.
	(gcn_goacc_reduction_init): Likewise.
	(gcn_goacc_reduction_fini): Likewise.
	(gcn_goacc_reduction_teardown): Likewise.
	* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate V2SI
	shuffle using vec_extract op.
	(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
	use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
	(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
	RECORD_TYPE reductions.
	(nvptx_goacc_reduction_setup): Likewise.
	(nvptx_goacc_reduction_init): Likewise.
	(nvptx_goacc_reduction_fini): Likewise.
	(nvptx_goacc_reduction_teardown): Likewise.
	* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
	building to use decl type, rather than generic ptr_type_node.
	(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
	construction.
	(lower_oacc_reductions): Add code to teardown/recover array access
	MEM_REF in OMP_CLAUSE_DECL, to accommodate for lookup requirements.
	Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
	instead of machine mode based.
	* omp-oacc-neuter-broadcast.cc (worker_single_copy): Add
	'hash_set *array_reduction_base_vars' parameter.  Add xxx.
	(neuter_worker_single): Add 'hash_set *array_reduction_base_vars'
	parameter.  Adjust recursive calls to self and worker_single_copy.
	(oacc_do_neutering): Add 'hash_set *array_reduction_base_vars'
	parameter.  Adjust call to neuter_worker_single.
	(execute_omp_oacc_neuter_broadcast): Add local
	'hash_set array_reduction_base_vars' declaration.  Collect MEM_REF
	base-pointer SSA_NAMEs of arrays into array_reduction_base_vars.
	Add '&array_reduction_base_vars' argument to call of
	oacc_do_neutering.
	* omp-offload.cc (default_goacc_reduction): Add unshare_expr.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/reduction-9.c: New test.
	* c-c++-common/goacc/reduction-10.c: New test.
	* c-c++-common/goacc/reduction-11.c: New test.
	* c-c++-common/goacc/reduction-12.c: New test.
	* c-c++-common/goacc/reduction-13.c: New test.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c-c++-common/reduction.h
	(check_reduction_array_xx): New macro.
	(operator_apply): Likewise.
	(check_reduction_array_op): Likewise.
	(check_reduction_arraysec_op): Likewise.
	(function_ap
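For readers less familiar with the feature, here is a minimal usage sketch of an OpenACC 2.7 array reduction in C, assuming a compiler with this patch applied and building with -fopenacc. The function name histogram and the constants are hypothetical. Conceptually, each parallel lane works on a private copy of the whole array initialized to 0 (the identity of '+'), and the copies are summed into the original at the end; the reduction variable is assumed to be implicitly copied in and out by the combined construct. Without -fopenacc the pragma is ignored and the loop simply runs serially, giving the same result.

```c
#define N 1000
#define NBINS 4

/* Count the indices 0..N-1 by their residue modulo NBINS, using a
   reduction on the whole array 'hist'.  */
void
histogram (int out[NBINS])
{
  int hist[NBINS] = { 0 };

#pragma acc parallel loop reduction(+:hist)
  for (int i = 0; i < N; i++)
    hist[i % NBINS] += 1;

  for (int b = 0; b < NBINS; b++)
    out[b] = hist[b];
}
```

Each of the four bins ends up with N / NBINS = 250 entries, whether the loop ran offloaded or on the host.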
[PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends
Hi Thomas, Tobias,

here's the updated v2 of the readonly modifier front-end patch.

On 2023/7/20 11:08 PM, Tobias Burnus wrote:
>>> +++ b/gcc/c/c-parser.cc
>>> @@ -14059,7 +14059,8 @@ c_parser_omp_variable_list (c_parser *parser,
>>>
>>>  static tree
>>>  c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
>>> -                              tree list, bool allow_deref = false)
>>> +                              tree list, bool allow_deref = false,
>>> +                              bool *readonly = NULL)
>>> ...
>> Instead of doing this in 'c_parser_omp_var_list_parens', I think it's
>> clearer to have this special 'readonly :' parsing logic in the two places
>> where it's used.
> I concur. The same issue also occurred for OpenMP's
> c_parser_omp_clause_to, and c_parser_omp_clause_from and the 'present'
> modifier. For it, I created a combined function but the main reason for
> that is that OpenMP also permits more modifiers (like 'iterators'),
> which would cause more duplication of code ('iterator' is not yet
> supported).
>
> For something as simple to parse as this modifier, I would just do it at
> the two places – as Thomas suggested.

Okay, I've changed the C/C++ parser parts to have the parsing logic directly added.

>>> +++ b/gcc/fortran/gfortran.h
>>> @@ -1360,7 +1360,11 @@ typedef struct gfc_omp_namelist
>>>      {
>>>        gfc_omp_reduction_op reduction_op;
>>>        gfc_omp_depend_doacross_op depend_doacross_op;
>>> -      gfc_omp_map_op map_op;
>>> +      struct
>>> +        {
>>> +          ENUM_BITFIELD (gfc_omp_map_op) map_op:8;
>>> +          bool readonly;
>>> +        };
>>>        gfc_expr *align;
>>>        struct
>>>          {
>> [...] Thus, the above looks good to me.
> I concur but I wonder whether it would be cleaner to name the struct;
> this makes it also more obvious what belongs together in the union.
>
> Namely, naming the struct 'map' and then changing the 45 users from
> 'u.map_op' to 'u.map.op' and the new 'u.readonly' to 'u.map.readonly'. –
> this seems to be cleaner.

I've adjusted 'u.map' to be a named struct now, and updated the references.
>> +      if (gfc_match ("readonly :") == MATCH_YES)
>> I note this one does not have a space after ':' in 'gfc_match', but the
>> one above in 'gfc_match_omp_clauses' does.  I don't know off-hand if that
>> makes a difference in parsing -- probably not, as all of
>> 'gcc/fortran/openmp.cc' generally doesn't seem to be very consistent
>> about these two variants?
> It *does* make a difference. And for obvious reasons. You don't want to
> permit:
>
>    !$acc kernels asnyccopy(a)
> but require at least one space (or comma) between "async" and "copy"..
> (In fixed form Fortran, it would be fine - as would be "!$acc k e nelsasy nc
> co p y(a)".)
>
> A " " matches zero or more whitespaces, but with gfc_match_space you can find
> out
> whether there was whitespace or not.

Okay, made sure both are 'gfc_match ("readonly : ")'. Thanks for catching that; I didn't realize that space was significant.

>>> +++ b/gcc/tree.h
>>> @@ -1813,6 +1813,14 @@ class auto_suppress_location_wrappers
>>>  #define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
>>>    (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.addressable_flag)
>>>
>>> +/* Nonzero if OpenACC 'readonly' modifier set, used for 'copyin'.  */
>>> +#define OMP_CLAUSE_MAP_READONLY(NODE) \
>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
>>> +
>>> +/* Same as above, for use in OpenACC cache directives.  */
>>> +#define OMP_CLAUSE__CACHE__READONLY(NODE) \
>>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
>> I'm not sure if these special accessor functions are actually useful, or
>> we should just directly use 'TREE_READONLY' instead?  We're only using
>> them in contexts where it's clear that the 'OMP_CLAUSE_SUBCODE_CHECK' is
>> satisfied, for example.
> I find directly using TREE_READONLY confusing.

FWIW, I've changed to use TREE_NOTHROW instead, if it can give a better sense of safety :P

I think there's a misunderstanding here anyway: we are not relying on a DECL marked TREE_READONLY here. We merely need the OMP_CLAUSE_MAP to be marked as OMP_CLAUSE_MAP_READONLY == 1. The other points-to patch then (also in the front-ends) takes the OMP_CLAUSE_MAP_READONLY to mark the clauses of "base-pointers of array-sections" as OMP_CLAUSE_MAP_POINTS_TO_READONLY, and later this gra
[PATCH, OpenACC 2.7, v2] Implement default clause support for data constructs
Hi Thomas,

this is v2 of the patch implementing the OpenACC 2.7 addition of default(none|present) support for data constructs.

Instead of propagating an additional 'oacc_default_kind' for OpenACC, this patch does it in a more complete way: it directly propagates the gimplify_omp_ctx* pointer of the innermost context where a default clause was found. This allows the error messages to display the location and type of the OpenACC construct that carries the default clause.

The testcases now also cover multiple nested data constructs, where the messages refer precisely to the innermost default clause that was active at that program point.

Note that I got rid of the dummy OMP_CLAUSE_DEFAULT creation in this version, since it did not seem necessary.

Re-tested on master on powerpc64le-linux/nvptx. Okay to commit?

Thanks,
Chung-Lin

2023-08-01  Chung-Lin Tang

gcc/c/ChangeLog:

	* c-parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/cp/ChangeLog:

	* parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/fortran/ChangeLog:

	* openmp.cc (OACC_DATA_CLAUSES): Add OMP_CLAUSE_DEFAULT.

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_omp_ctx): Add oacc_default_clause_ctx
	field.
	(new_omp_context): Initialize oacc_default_clause_ctx field.
	(oacc_region_type_name): New function.
	(oacc_default_clause): Lookup current default_kind value from
	ctx->oacc_default_clause_ctx, adjust default(none) error and inform
	message dumping.
	(gimplify_scan_omp_clauses): Upon OMP_CLAUSE_DEFAULT case, set
	ctx->oacc_default_clause_ctx to current context.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/default-3.c: Adjust testcase.
	* c-c++-common/goacc/default-5.c: Adjust testcase.
	* gfortran.dg/goacc/default-3.f95: Adjust testcase.
	* gfortran.dg/goacc/default-5.f: Adjust testcase.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e459..974f0132787 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18196,6 +18196,7 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE) \
+        | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NO_CREATE) \
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d7ef5b34d42..bc59fbeac20 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -45860,6 +45860,7 @@ cp_parser_oacc_cache (cp_parser *parser, cp_token *pragma_tok)
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE) \
+        | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DETACH) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR) \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF) \
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 2952cd300ac..c37f843ec3b 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -3802,7 +3802,8 @@ error:
 #define OACC_DATA_CLAUSES \
   (omp_mask (OMP_CLAUSE_IF) | OMP_CLAUSE_DEVICEPTR | OMP_CLAUSE_COPY\
    | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT | OMP_CLAUSE_CREATE \
-   | OMP_CLAUSE_NO_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_ATTACH)
+   | OMP_CLAUSE_NO_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_ATTACH \
+   | OMP_CLAUSE_DEFAULT)
 #define OACC_LOOP_CLAUSES \
   (omp_mask (OMP_CLAUSE_COLLAPSE) | OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER \
    | OMP_CLAUSE_VECTOR | OMP_CLAUSE_SEQ | OMP_CLAUSE_INDEPENDENT \
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 320920ed74c..ec0ccc67da8 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -225,6 +225,7 @@ struct gimplify_omp_ctx
   vec loop_iter_var;
   location_t location;
   enum omp_clause_default_kind default_kind;
+  struct gimplify_omp_ctx *oacc_default_clause_ctx;
   enum omp_region_type region_type;
   enum tree_code code;
   bool combined_loop;
@@ -459,6 +460,10 @@ new_omp_context (enum omp_region_type region_type)
     c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
     c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
+  if (gimplify_omp_ctxp)
+    c->oacc_default_clause_ctx = gimplify_omp_ctxp-
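As a usage illustration, the following hypothetical C function (assuming a compiler with this patch and -fopenacc) puts default(none) on a data construct. Under the OpenACC 2.7 semantics this patch implements, the enclosed compute construct, which has no default clause of its own, is governed by the one on the data construct, so the variables it uses are given explicit clauses; omitting present(a) would presumably trigger the default(none) diagnostic that now points at the data construct. Without -fopenacc the pragmas are ignored.

```c
#define N 256

static float a[N];

/* Scale the array A by S inside an OpenACC data region.  */
void
scale (float s)
{
#pragma acc data copy(a) default(none)
  {
    /* No 'default' clause here: the one on the enclosing data construct
       applies, so 'a' and 's' get explicit clauses.  The loop variable
       'i' is predetermined private.  */
#pragma acc parallel loop present(a) firstprivate(s)
    for (int i = 0; i < N; i++)
      a[i] *= s;
  }
}
```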
[PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis
On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote:
> As we discussed earlier, the work for actually linking this to middle-end
> points-to analysis is a somewhat non-trivial issue. This first patch allows
> the language feature to be used in OpenACC directives first (with no effect
> for now).
> The middle-end changes are probably going to be a later patch.

This second patch tries to link the readonly modifier to points-to analysis.

There already exists SSA_NAME_POINTS_TO_READONLY_MEMORY, with support in the alias oracle routines in tree-ssa-alias.cc, so basically what this patch does is try to get this flag set on the variables holding the array section base pointers.

The front-ends set another flag, OMP_CLAUSE_MAP_POINTS_TO_READONLY, on the associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set. A DECL_POINTS_TO_READONLY flag is also set on VAR_DECLs when creating the temporary variables that carry these receiver references on the offloaded side. These eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY.

This still doesn't always work as expected in terms of optimization: struct pointer fields and Fortran arrays (which are somewhat like C structs) need several accesses to form the pointer on the receiver/offloaded side, and SRA does not appear to work on these sequences, which gets in the way of much redundancy elimination.

Currently there is one testcase demonstrating that 'readonly' can avoid a clobber by a function call.

Tested on powerpc64le-linux/nvptx. Note this patch is created on top of the front-end patch. (I will respond to the other front-end patch comments later.)

Thanks,
Chung-Lin

2023-07-25  Chung-Lin Tang

gcc/c/ChangeLog:

	* c-typeck.cc (handle_omp_array_sections): Set
	OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/cp/ChangeLog:

	* semantics.cc (handle_omp_array_sections): Set
	OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_array_section): Set
	OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/ChangeLog:

	* gimple-expr.cc (copy_var_decl): Copy DECL_POINTS_TO_READONLY
	for VAR_DECLs.
	* gimplify.cc (struct gimplify_omp_ctx): Add
	'hash_set *pt_readonly_ptrs' field.
	(internal_get_tmp_var): Set DECL_POINTS_TO_READONLY/
	SSA_NAME_POINTS_TO_READONLY_MEMORY for new temp vars.
	(build_omp_struct_comp_nodes): Set OMP_CLAUSE_MAP_POINTS_TO_READONLY
	on pointer clause.
	(gimplify_scan_omp_clauses): Collect OMP_CLAUSE_MAP_POINTS_TO_READONLY
	to ctx->pt_readonly_ptrs.
	* omp-low.cc (lower_omp_target): Set DECL_POINTS_TO_READONLY for
	variables of receiver refs.
	* tree-pretty-print.cc (dump_omp_clause): Print
	OMP_CLAUSE_MAP_POINTS_TO_READONLY.
	(dump_generic_node): Print SSA_NAME_POINTS_TO_READONLY_MEMORY.
	* tree.h (DECL_POINTS_TO_READONLY): New macro.
	(OMP_CLAUSE_MAP_POINTS_TO_READONLY): New macro.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/readonly-1.c: Adjust testcase.
	* c-c++-common/goacc/readonly-2.c: New testcase.
	* gfortran.dg/goacc/readonly-1.f90: Adjust testcase.
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..42591e4029a 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14258,6 +14258,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
             OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
           else
             OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
+          if (OMP_CLAUSE_MAP_READONLY (c))
+            OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
           OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
           if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
               && !c_mark_addressable (t))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..6ab467e1140 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5872,6 +5872,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
                 }
               else
                 OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
+              if (OMP_CLAUSE_MAP_READONLY (c))
+                OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
               OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
               if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
                   && !cxx_mark_addressable (t))
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 2253d559f9c..d7cd65af1bb 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2524,6 +2524,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
       node3 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
       OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind);
       OMP_CLAUSE_DECL (node3) = gfc_conv_descriptor_data_ge
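The kind of redundancy the readonly modifier is meant to expose can be sketched in C. This is a hypothetical example in the spirit of the function-call clobber case described in the mail, not the actual readonly-2.c testcase; the names note and sum_twice are assumptions. Because 'a' is mapped with copyin(readonly: ...), the points-to machinery may assume the call to note() does not modify a[i], so the second load of a[i] could potentially reuse the first. Whether that happens depends on the optimization level and on the limitations described above; the computed result is the same either way.

```c
/* Stand-in for an external call (defined here, and marked noinline, only
   so that the sketch is self-contained).  */
#pragma acc routine seq
void __attribute__ ((noinline))
note (int i)
{
  (void) i;
}

/* Returns the sum of 2 * a[i] for i in [0, n).  */
int
sum_twice (const int *a, int n)
{
  int sum = 0;

#pragma acc parallel loop copyin(readonly: a[0:n]) reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
      int x = a[i];
      /* Without 'readonly', this call may be assumed to clobber *a...  */
      note (i);
      /* ...forcing a[i] to be reloaded here.  */
      sum += x + a[i];
    }
  return sum;
}
```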
Re: [PATCH, OpenACC 2.7] Implement default clause support for data constructs
Hi Thomas,

On 2023/6/23 6:47 PM, Thomas Schwinge wrote:
>> +
>>    ctx->clauses = *orig_list_p;
>>    gimplify_omp_ctxp = ctx;
>>  }
> Instead of this, in 'gimplify_omp_workshare', before the
> 'gimplify_scan_omp_clauses' call, do something like:
>
>     if ((ort & ORT_ACC)
>         && !omp_find_clause (OMP_CLAUSES (expr), OMP_CLAUSE_DEFAULT))
>       {
>         /* Determine effective 'default' clause for OpenACC compute
>            construct.  */
>         for (struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp; ctx; ctx =
>              ctx->outer_context)
>           {
>             if (ctx->region_type == ORT_ACC_DATA
>                 && ctx->default_kind != OMP_CLAUSE_DEFAULT_SHARED)
>               {
>                 [Append actual default clause on compute construct.]
>                 break;
>               }
>           }
>       }
>
> That seems conceptually simpler to me?

I'm not sure if this is conceptually simpler, but using 'oacc_default_kind' is definitely faster computationally :)

However, as you mention below...

> For the 'build_omp_clause', does using 'ctx->location' instead of
> 'UNKNOWN_LOCATION' help diagnostics in any way?  Like if we add in
> 'gcc/gimplify.cc:oacc_default_clause',
> 'if (ctx->default_kind == OMP_CLAUSE_DEFAULT_NONE)' another 'inform' to
> point to the 'data' construct's 'default' clause?  (But not sure if
> that's easily done; otherwise don't.)

I noticed that we will need to track the actual lexically enclosing OpenACC construct with the user-set default clause somewhere in 'ctx', in order to satisfy the current diagnostics in oacc_default_clause(). (The UNKNOWN_LOCATION for the internally created default clause probably doesn't matter; it only serves as a reminder in internal dumps and probably never plays a role in user diagnostics.)

> Similar to the ones you've already got, please also add a few test cases
> for nested 'default' clauses, like:
>
>     #pragma acc data // no vs. 'default(none)' vs. 'default(present)'
>     {
>       #pragma acc data // no vs. same vs. different 'default' clause
>       {
>         #pragma acc data // no vs. same vs. different 'default' clause
>         {
>           #pragma acc parallel
>
> Similarly, test cases where 'default' on the compute construct overrides
> 'default' of an outer 'data' construct.

Okay, will add more testcases.

Thanks,
Chung-Lin
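The two strategies discussed in this thread, walking outward through the enclosing contexts at lookup time versus caching a pointer to the innermost context that has a default clause when each context is created (the direction v2 took), can be compared with a small model. Everything here is hypothetical: struct toy_ctx and its fields only loosely mirror gimplify_omp_ctx's outer_context, default_kind and oacc_default_clause_ctx. Both variants return the same context, which is what lets a diagnostic name the construct that carries the clause; the cached form just avoids the walk.

```c
#include <stddef.h>

enum toy_default
{
  TOY_DEFAULT_UNSPECIFIED,
  TOY_DEFAULT_NONE,
  TOY_DEFAULT_PRESENT
};

struct toy_ctx
{
  struct toy_ctx *outer;
  enum toy_default default_kind;  /* From this construct's own clause.  */
  struct toy_ctx *default_ctx;    /* Cached innermost ctx with a clause.  */
  const char *name;               /* Construct name, for diagnostics.  */
};

/* Initialize a context nested in OUTER, caching the innermost context
   carrying a 'default' clause (this one, if it has its own clause).  */
static void
toy_ctx_init (struct toy_ctx *c, struct toy_ctx *outer,
              enum toy_default kind, const char *name)
{
  c->outer = outer;
  c->default_kind = kind;
  c->name = name;
  if (kind != TOY_DEFAULT_UNSPECIFIED)
    c->default_ctx = c;
  else
    c->default_ctx = outer ? outer->default_ctx : NULL;
}

/* Alternative: walk outwards at lookup time.  */
static struct toy_ctx *
toy_find_default_ctx (struct toy_ctx *c)
{
  for (; c != NULL; c = c->outer)
    if (c->default_kind != TOY_DEFAULT_UNSPECIFIED)
      return c;
  return NULL;
}
```

For a nest like data default(none) > data > data default(present) > parallel, both lookups from the parallel context yield the third data construct.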
[PATCH, OpenACC 2.7, v2] Implement host_data must have use_device clause requirement
On 2023/6/16 5:13 PM, Thomas Schwinge wrote:
> OK with one small change, please -- unless there's a reason for doing it
> this way:
>
>> --- a/gcc/fortran/trans-openmp.cc
>> +++ b/gcc/fortran/trans-openmp.cc
>> @@ -4677,6 +4677,12 @@ gfc_trans_oacc_construct (gfc_code *code)
>>        break;
>>      case EXEC_OACC_HOST_DATA:
>>        construct_code = OACC_HOST_DATA;
>> +      if (code->ext.omp_clauses->lists[OMP_LIST_USE_DEVICE] == NULL)
>> +        {
>> +          error_at (gfc_get_location (&code->loc),
>> +                    "%<host_data%> construct requires %<use_device%>
>> clause");
>> +          return NULL_TREE;
>> +        }
>>        break;
>>      default:
>>        gcc_unreachable ();
> The OpenMP "must contain at least one [...] clause" checks are done in
> 'gcc/fortran/openmp.cc:resolve_omp_clauses'.  For consistency (or, to let
> 'gcc/fortran/trans-openmp.cc' continue to just deal with "directive
> translation"), do similar for OpenACC 'host_data'?  (..., and we later
> accordingly adjust 'gcc/fortran/openmp.cc:gfc_match_oacc_update', too?)

Hi Thomas,

I've adjusted the Fortran implementation as you described. Yes, I agree this way fits the current Fortran FE conventions better.

I've re-tested the attached v2 patch; I will commit later this week if there are no major objections.

Thanks,
Chung-Lin

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_oacc_host_data): Add checking requiring
	OpenACC host_data construct to have a use_device clause.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_oacc_host_data): Add checking requiring
	OpenACC host_data construct to have a use_device clause.

gcc/fortran/ChangeLog:

	* openmp.cc (resolve_omp_clauses): Add checking requiring
	OpenACC host_data construct to have a use_device clause.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/host_data-2.c: Adjust testcase.
	* gfortran.dg/goacc/host_data-error.f90: New testcase.
	* gfortran.dg/goacc/pr71704.f90: Adjust testcase.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e459..80920b31f83 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18461,8 +18461,13 @@ c_parser_oacc_host_data (location_t loc, c_parser *parser, bool *if_p)
   tree stmt, clauses, block;
 
   clauses = c_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
-                                       "#pragma acc host_data");
-
+                                       "#pragma acc host_data", false);
+  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
+    {
+      error_at (loc, "%<host_data%> construct requires %<use_device%> clause");
+      return error_mark_node;
+    }
+  clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
   block = c_begin_omp_parallel ();
   add_stmt (c_parser_omp_structured_block (parser, if_p));
   stmt = c_finish_oacc_host_data (loc, clauses, block);
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5e2b5cba57e..beb5b632e5e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -45895,8 +45895,15 @@ cp_parser_oacc_host_data (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
   unsigned int save;
 
   clauses = cp_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
-                                        "#pragma acc host_data", pragma_tok);
-
+                                        "#pragma acc host_data", pragma_tok,
+                                        false);
+  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
+    {
+      error_at (pragma_tok->location,
+                "%<host_data%> construct requires %<use_device%> clause");
+      return error_mark_node;
+    }
+  clauses = finish_omp_clauses (clauses, C_ORT_ACC);
   block = begin_omp_parallel ();
   save = cp_parser_begin_omp_structured_block (parser);
   cp_parser_statement (parser, NULL_TREE, false, if_p);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 8efc4b3ecfa..f7af02845de 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -8764,6 +8764,12 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
                    "% clause", &omp_clauses->detach->where);
     }
 
+  if (openacc
+      && code->op == EXEC_OACC_HOST_DATA
+      && omp_clauses->lists[OMP_LIST_USE_DEVICE] == NULL)
+    gfc_error ("%<host_data%> construct at %L requires %<use_device%> clause",
+               &code->loc);
+
   if (omp_clauses->assume)
     gfc_resolve_omp_assumptions (omp_clauses->assume);
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/host_data-2.c b/gcc/testsuite/c-c++-common/goacc/host_data-2.c
index b3093e575ff..862a764eb3a 100644
--- a/gcc/testsuite/c-c++-common/goacc/host_data-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/host_data-2.c
@@ -8,7 +8,9 @@ void
 f (void)
 {
   int v2 = 3;
-#pragma acc host_data copy(v2) /* { dg-error ".copy. is not valid for ..pragma acc host_data." } */
+#pragma acc host_data copy(v2)
+  /* { dg-error ".copy. is not valid for ..pragma acc host_data." "" { target *-*-* } .-1 } */
+  /* { dg-error ".host_data. construct requires .use_device. clause" "" { target *-*-* } .-2 } */
   ;
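A minimal C usage sketch of the new requirement, assuming a compiler with this patch and -fopenacc: a host_data construct must now name at least one variable in a use_device clause. The function record_device_ptr is a hypothetical stand-in for a device library call (for example a CUDA-aware routine) that receives the device address; here it only records the pointer, so without -fopenacc (or under host fallback) it simply sees the host address.

```c
#include <stddef.h>

static float *recorded;  /* Last pointer passed to record_device_ptr.  */

/* Hypothetical stand-in for a device-side library call; it must not
   dereference P on the host when P is a device address.  */
static void
record_device_ptr (float *p, size_t n)
{
  (void) n;
  recorded = p;
}

void
pass_to_library (float *a, size_t n)
{
#pragma acc data copy(a[0:n])
  {
    /* A bare '#pragma acc host_data' with no use_device clause is now
       rejected by the C, C++ and Fortran front ends.  */
#pragma acc host_data use_device(a)
    record_device_ptr (a, n);
  }
}
```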
[PATCH, OpenACC 2.7] readonly modifier support in front-ends
Hi Thomas,

this patch contains support for the 'readonly' modifier in copyin clauses and the cache directive.

As we discussed earlier, the work for actually linking this to middle-end points-to analysis is a somewhat non-trivial issue. This first patch allows the language feature to be used in OpenACC directives (with no effect for now). The middle-end changes are probably going to be a later patch.

(Also CCing Tobias because of the Fortran bits.)

Tested on powerpc64le-linux with nvptx offloading. Is this okay for trunk?

Thanks,
Chung-Lin

2023-07-10  Chung-Lin Tang

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_var_list_parens): Add 'bool *readonly = NULL'
	parameter, add readonly modifier parsing support.
	(c_parser_oacc_data_clause): Adjust c_parser_omp_var_list_parens call
	to turn on readonly modifier parsing for copyin clause, set
	OMP_CLAUSE_MAP_READONLY if readonly modifier found, update comments.
	(c_parser_oacc_cache): Adjust c_parser_omp_var_list_parens call
	to turn on readonly modifier parsing, set OMP_CLAUSE__CACHE__READONLY
	if readonly modifier found, update comments.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_var_list): Add 'bool *readonly = NULL'
	parameter, add readonly modifier parsing support.
	(cp_parser_oacc_data_clause): Adjust cp_parser_omp_var_list call
	to turn on readonly modifier parsing for copyin clause, set
	OMP_CLAUSE_MAP_READONLY if readonly modifier found, update comments.
	(cp_parser_oacc_cache): Adjust cp_parser_omp_var_list call
	to turn on readonly modifier parsing, set OMP_CLAUSE__CACHE__READONLY
	if readonly modifier found, update comments.

gcc/fortran/ChangeLog:

	* gfortran.h (typedef struct gfc_omp_namelist): Adjust map_op as
	ENUM_BITFIELD field, add 'bool readonly' field.
	* openmp.cc (gfc_match_omp_map_clause): Add 'bool readonly = false'
	parameter, set n->u.readonly field.
	(gfc_match_omp_clauses): Add readonly modifier parsing for OpenACC
	copyin clause, adjust call to gfc_match_omp_map_clause.
	(gfc_match_oacc_cache): Add readonly modifier parsing for OpenACC
	cache directive, adjust call to gfc_match_omp_map_clause.
	* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_MAP_READONLY,
	OMP_CLAUSE__CACHE__READONLY to 1 when readonly is set.

gcc/ChangeLog:

	* tree-pretty-print.cc (dump_omp_clause): Add support for printing
	OMP_CLAUSE_MAP_READONLY and OMP_CLAUSE__CACHE__READONLY.
	* tree.h (OMP_CLAUSE_MAP_READONLY): New macro.
	(OMP_CLAUSE__CACHE__READONLY): New macro.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/readonly-1.c: New test.
	* gfortran.dg/goacc/readonly-1.f90: New test.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index d4b98d5d8b6..09e1e89d793 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -14059,7 +14059,8 @@ c_parser_omp_variable_list (c_parser *parser,
 
 static tree
 c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
-                              tree list, bool allow_deref = false)
+                              tree list, bool allow_deref = false,
+                              bool *readonly = NULL)
 {
   /* The clauses location.  */
   location_t loc = c_parser_peek_token (parser)->location;
@@ -14067,6 +14068,20 @@ c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
   matching_parens parens;
   if (parens.require_open (parser))
     {
+      if (readonly != NULL)
+        {
+          c_token *token = c_parser_peek_token (parser);
+          if (token->type == CPP_NAME
+              && !strcmp (IDENTIFIER_POINTER (token->value), "readonly")
+              && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
+            {
+              c_parser_consume_token (parser);
+              c_parser_consume_token (parser);
+              *readonly = true;
+            }
+          else
+            *readonly = false;
+        }
       list = c_parser_omp_variable_list (parser, loc, kind, list, allow_deref);
       parens.skip_until_found_close (parser);
     }
@@ -14084,7 +14099,11 @@ c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
    OpenACC 2.6:
    no_create ( variable-list )
    attach ( variable-list )
-   detach ( variable-list ) */
+   detach ( variable-list )
+
+   OpenACC 2.7:
+   copyin (readonly : variable-list )
+ */
 
 static tree
 c_parser_oacc_data_clause (c_parser *parser, pragma_omp_clause c_kind,
@@ -14135,11 +14154,22 @@ c_parser_oacc_data_clause (c_parser *parser, pragma_omp_clause c_kind,
     default:
       gcc_unreachable ();
     }
+
+  /* Turn on readonly modifier parsing for copyin clause.  */
+  bool readonly = false, *readonly_ptr = NULL;
+  if (c_kind == PRAGMA_OACC_CLAUSE_COPYIN)
+    readonly_ptr = &readonly;
+
   tree nl, c;
-
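For reference, a small C example of the syntax this patch accepts, assuming a compiler with the patch and -fopenacc: the readonly modifier on a copyin clause and on the cache directive. The function name dot is hypothetical. As the mail says, the modifier has no optimization effect yet at this stage; it only records that the data is not written inside the region. Without -fopenacc the pragmas are ignored.

```c
/* Dot product of X and Y; both arrays are only read inside the compute
   region.  */
float
dot (const float *x, const float *y, int n)
{
  float s = 0.0f;

#pragma acc parallel loop copyin(readonly: x[0:n], y[0:n]) reduction(+:s)
  for (int i = 0; i < n; i++)
    {
      /* OpenACC 2.7 also allows 'readonly:' in the cache directive, which
         must appear at the top of the loop body.  */
#pragma acc cache(readonly: x[i:1], y[i:1])
      s += x[i] * y[i];
    }
  return s;
}
```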
[PATCH, OpenACC 2.7] Adjust acc_map_data/acc_unmap_data interaction with reference counters
Hi Thomas,

This patch adjusts the implementation of the acc_map_data/acc_unmap_data API library routines to better fit the description in the OpenACC 2.7 specification.

Instead of using REFCOUNT_INFINITY, we now define a special REFCOUNT_ACC_MAP_DATA value to mark acc_map_data-created mappings, and allow other constructs to adjust the dynamic_refcount of such mappings. Enforcing an initial value of 1 for such mappings, and allowing only acc_unmap_data to delete them, is implemented as specified.

Actually, there is no real change (or improvement) in the behavior of the API, thus no new tests. I've looked at the related OpenACC spec issues, and it seems that this part of the 2.7 spec change is mostly a clarification (I see no downside in the current REFCOUNT_INFINITY-based implementation either). But this patch does bring the internals closer to the spec description.

Tested without regressions using powerpc64le-linux/nvptx, okay for trunk?

Thanks,
Chung-Lin

2023-06-22  Chung-Lin Tang

libgomp/ChangeLog:

	* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
	* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
	initialize dynamic_refcount as 1.
	(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA.
	(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
	(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
	REFCOUNT_ACC_MAP_DATA when decrementing/finalizing.  Force lowest
	dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
	* target.c (gomp_increment_refcount): Add REFCOUNT_ACC_MAP_DATA case.
	(gomp_decrement_refcount): Add REFCOUNT_ACC_MAP_DATA case, force lowest
	dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
	* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust
	testcase error output scan test.
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 4d2bfab4b71..fb8ef651dfb 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1166,6 +1166,8 @@ struct target_mem_desc;
 /* Special value for refcount - tgt_offset contains target address of the
    artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK     (REFCOUNT_SPECIAL | 1)
+/* Special value for refcount - created through acc_map_data.  */
+#define REFCOUNT_ACC_MAP_DATA (REFCOUNT_SPECIAL | 2)
 
 /* Special value for refcount - structure element sibling list items.
    All such key refounts have REFCOUNT_STRUCTELEM bits set, with _FLAG_FIRST
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index fe632740769..2a782ac22c1 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -411,7 +411,8 @@ acc_map_data (void *h, void *d, size_t s)
       assert (n->refcount == 1);
       assert (n->dynamic_refcount == 0);
       /* Special reference counting behavior.  */
-      n->refcount = REFCOUNT_INFINITY;
+      n->refcount = REFCOUNT_ACC_MAP_DATA;
+      n->dynamic_refcount = 1;
 
       if (profiling_p)
         {
@@ -460,7 +461,7 @@ acc_unmap_data (void *h)
      the different 'REFCOUNT_INFINITY' cases, or simply separate
      'REFCOUNT_INFINITY' values per different usage ('REFCOUNT_ACC_MAP_DATA'
      etc.)?  */
-  else if (n->refcount != REFCOUNT_INFINITY)
+  else if (n->refcount != REFCOUNT_ACC_MAP_DATA)
     {
       gomp_mutex_unlock (&acc_dev->lock);
       gomp_fatal ("refusing to unmap block [%p,+%d] that has not been mapped"
@@ -519,7 +520,8 @@ goacc_map_var_existing (struct gomp_device_descr *acc_dev, void *hostaddr,
     }
 
   assert (n->refcount != REFCOUNT_LINK);
-  if (n->refcount != REFCOUNT_INFINITY)
+  if (n->refcount != REFCOUNT_INFINITY
+      && n->refcount != REFCOUNT_ACC_MAP_DATA)
     n->refcount++;
   n->dynamic_refcount++;
 
@@ -683,6 +685,7 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s,
 
   assert (n->refcount != REFCOUNT_LINK);
   if (n->refcount != REFCOUNT_INFINITY
+      && n->refcount != REFCOUNT_ACC_MAP_DATA
       && n->refcount < n->dynamic_refcount)
     {
       gomp_mutex_unlock (&acc_dev->lock);
@@ -691,15 +694,27 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s,
 
   if (finalize)
     {
-      if (n->refcount != REFCOUNT_INFINITY)
+      if (n->refcount != REFCOUNT_INFINITY
+          && n->refcount != REFCOUNT_ACC_MAP_DATA)
         n->refcount -= n->dynamic_refcount;
-      n->dynamic_refcount = 0;
+
+      if (n->refcount == REFCOUNT_ACC_MAP_DATA)
+        /* Mappings created by acc_map_data are returned to initial
+           dynamic_refcount of 1. Can only be deleted by acc_unmap_data.  */
+        n->dynamic_refcount = 1;
+      else
+        n->dynamic_refcount = 0;
     }
   else if (n->dynamic_refcount)
     {
-      if (n->refcount != REFCOUNT_INFINITY)
+      if (n->refcount != REFCOUNT_INFINITY
+
[PATCH, OpenACC 2.7] Implement self clause for compute constructs
Hi Thomas,

This patch implements the compiler side of the 'self' clause for compute constructs: parallel, kernels, and serial.

As you know, the actual "local device" device type for libgomp is not yet implemented, so the libgomp side is basically just a simple duplicate of what host-fallback does, though everything else should be complete with this patch.

Tested on powerpc64le-linux/nvptx; x86_64-linux/amdgcn tests pending.

Is this okay for trunk?

Thanks,
Chung-Lin

2023-06-13  Chung-Lin Tang

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
	(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
	parameter, add parsing of self clause when compute_p is true.
	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSE_MASK): Likewise.
	(OACC_SERIAL_CLAUSE_MASK): Likewise.
	(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
	set compute_p argument to true.
	* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_oacc_compute_clause_self): New function.
	(cp_parser_oacc_all_clauses): Add new 'bool compute_p = false'
	parameter, add parsing of self clause when compute_p is true.
	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSE_MASK): Likewise.
	(OACC_SERIAL_CLAUSE_MASK): Likewise.
	(cp_parser_oacc_compute): Adjust call to cp_parser_oacc_all_clauses to
	set compute_p argument to true.
	* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case.
	* semantics.cc (finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged
	with OMP_CLAUSE_IF case.

gcc/fortran/ChangeLog:

	* gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field.
	* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF.
	(gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF.
	(OACC_KERNELS_CLAUSES): Likewise.
	(OACC_SERIAL_CLAUSES): Likewise.
	(resolve_omp_clauses): Add handling for omp_clauses->self_expr.
	* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of
	clauses->self_expr and building of OMP_CLAUSE_SELF tree clause.
	(gfc_split_omp_clauses): Add handling of self_expr field copy.

gcc/ChangeLog:

	* gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code.
	* omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF
	case.
	(convert_local_omp_clauses): Likewise.
	* tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case.
	* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry.
	(omp_clause_code_name): Likewise.
	* tree.h (OMP_CLAUSE_SELF_EXPR): New macro.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/self-clause-1.c: New test.
	* c-c++-common/goacc/self-clause-2.c: New test.
	* gfortran.dg/goacc/self.f95: New test.

include/ChangeLog:

	* gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value.

libgomp/ChangeLog:

	* oacc-parallel.c (GOACC_parallel_keyed): Add code to handle
	GOACC_FLAG_LOCAL_DEVICE case.
	* testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.

From 449883981c8e1f707b47ff8f8dd70049b9ffda82 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Tue, 13 Jun 2023 08:44:31 -0700
Subject: [PATCH] OpenACC 2.7: Implement self clause for compute constructs

This patch implements the 'self' clause for compute constructs: parallel, kernels, and serial. This clause conditionally uses the local device (the host multi-core CPU) as the executing device of the compute region.

The actual implementation of the "local device" device type inside libgomp (presumably using pthreads) is not yet complete, so the libgomp side is still implemented exactly the same as host-fallback mode (so, as of now, it essentially behaves like the 'if' clause with the condition inverted).

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
	(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
	parameter, add parsing of self clause when compute_p is true.
	(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
	(OACC_PARALLEL_CLAUSE_MASK): Likewise.
	(OACC_SERIAL_CLAUSE_MASK): Likewise.
	(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
	set compute_p argument to true.
	* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.

gcc/cp/Chang
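Because the libgomp side currently routes the local-device case through host fallback, the combined effect of the `if` and `self` clauses can be sketched as a flag computation plus a runtime dispatch. The flag names and bit values below are hypothetical (the patch adds a real GOACC_FLAG_LOCAL_DEVICE in include/gomp-constants.h, whose value is not shown here); the sketch only illustrates why `self(cond)` currently behaves like `if(!cond)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only.  */
#define SKETCH_FLAG_HOST_FALLBACK (1u << 0)
#define SKETCH_FLAG_LOCAL_DEVICE  (1u << 1)

enum sketch_target { SKETCH_RUN_OFFLOAD, SKETCH_RUN_HOST };

/* Compiler side: fold the clause conditions into flags.  A missing 'if'
   clause is modeled as if(true), a missing 'self' clause as self(false).  */
static unsigned
sketch_compute_flags (bool if_cond, bool self_cond)
{
  unsigned flags = 0;
  if (!if_cond)
    flags |= SKETCH_FLAG_HOST_FALLBACK;
  if (self_cond)
    flags |= SKETCH_FLAG_LOCAL_DEVICE;
  return flags;
}

/* Runtime side: with no real "local device" yet, both flags run the
   region on the host, exactly like host fallback.  */
static enum sketch_target
sketch_select_target (unsigned flags)
{
  assert ((flags & ~(SKETCH_FLAG_HOST_FALLBACK | SKETCH_FLAG_LOCAL_DEVICE)) == 0);
  if (flags & (SKETCH_FLAG_HOST_FALLBACK | SKETCH_FLAG_LOCAL_DEVICE))
    return SKETCH_RUN_HOST;
  return SKETCH_RUN_OFFLOAD;
}
```

With this model, `if(true) self(false)` offloads, while either `if(false)` or `self(true)` runs on the host; once a real local-device type exists, only the SKETCH_FLAG_LOCAL_DEVICE branch would need to change.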
[PATCH, OpenACC 2.7] Implement default clause support for data constructs
Hi Thomas,

This patch implements the OpenACC 2.7 addition of default(none|present) support for data constructs.

Apart from adjusting the front ends' allowed clause masks (for acc data), it is mostly implemented in gimplify.

Tested on powerpc64le-linux/nvptx; x86_64-linux/amdgcn tests in progress (expect no surprises).

Is this okay for trunk?

Thanks,
Chung-Lin

gcc/c/ChangeLog:

	* c-parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/cp/ChangeLog:

	* parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/fortran/ChangeLog:

	* openmp.cc (OACC_DATA_CLAUSES): Add OMP_CLAUSE_DEFAULT.

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_omp_ctx): Add oacc_data_default_kind
	field.
	(new_omp_context): Initialize oacc_data_default_kind field.
	(gimplify_scan_omp_clauses): Set oacc_data_default_kind for data
	constructs. Set ctx->default_kind for compute constructs from
	ctx->oacc_data_default_kind.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/default-3.c: Adjust testcase.
	* c-c++-common/goacc/default-5.c: Adjust testcase.
	* gfortran.dg/goacc/default-3.f95: Adjust testcase.
	* gfortran.dg/goacc/default-5.f: Adjust testcase.

From 101305aee9b27c6df00d7c403e469bdf8d7f45a4 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Tue, 6 Jun 2023 03:46:29 -0700
Subject: [PATCH 2/2] OpenACC 2.7: default clause support for data constructs

This patch implements the OpenACC 2.7 addition of default(none|present) support for data constructs.

Now, specifying "default(none|present)" on a data construct turns on the same default clause behavior for all enclosed compute constructs (which don't already have a default clause of their own).

gcc/c/ChangeLog:

	* c-parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/cp/ChangeLog:

	* parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/fortran/ChangeLog:

	* openmp.cc (OACC_DATA_CLAUSES): Add OMP_CLAUSE_DEFAULT.

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_omp_ctx): Add oacc_data_default_kind
	field.
	(new_omp_context): Initialize oacc_data_default_kind field.
	(gimplify_scan_omp_clauses): Set oacc_data_default_kind for data
	constructs. Set ctx->default_kind for compute constructs from
	ctx->oacc_data_default_kind.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/default-3.c: Adjust testcase.
	* c-c++-common/goacc/default-5.c: Adjust testcase.
	* gfortran.dg/goacc/default-3.f95: Adjust testcase.
	* gfortran.dg/goacc/default-5.f: Adjust testcase.
---
 gcc/c/c-parser.cc                             |  1 +
 gcc/cp/parser.cc                              |  1 +
 gcc/fortran/openmp.cc                         |  3 ++-
 gcc/gimplify.cc                               | 20 +++
 gcc/testsuite/c-c++-common/goacc/default-3.c  | 15 +
 gcc/testsuite/c-c++-common/goacc/default-5.c  | 18 +++--
 gcc/testsuite/gfortran.dg/goacc/default-3.f95 | 15 ++
 gcc/testsuite/gfortran.dg/goacc/default-5.f   | 17 ++--
 8 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index b61aef8b1a2..645d28b320d 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18133,6 +18133,7 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)     \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)    \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)     \
+        | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT)    \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)  \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)         \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NO_CREATE)  \
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index dd7638f1c93..4b4df29a406 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -45759,6 +45759,7 @@ cp_parser_oacc_cache (cp_parser *parser, cp_token *pragma_tok)
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)     \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)    \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)     \
+        | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT)    \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DETACH)     \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)  \
         | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)         \
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 4c30548567f..b785e71f20f 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -3645,7 +3645,8 @@
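The propagation rule in the commit message (a default(none|present) clause on a data construct applies to every enclosed compute construct that lacks its own default clause) can be modeled as a small context-creation function. This is a hypothetical sketch loosely mirroring the gimplify_omp_ctx fields named in the ChangeLog; the `sketch_*` types are invented, and letting the innermost enclosing data construct win is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_default
{
  SKETCH_DEFAULT_UNSPECIFIED,
  SKETCH_DEFAULT_NONE,
  SKETCH_DEFAULT_PRESENT
};

enum sketch_construct { SKETCH_ACC_DATA, SKETCH_ACC_COMPUTE };

struct sketch_ctx
{
  enum sketch_default default_kind;      /* effective default for this construct */
  enum sketch_default data_default_kind; /* from the nearest enclosing data construct */
};

/* Create the context for a new construct: inherit the enclosing data
   default, then apply this construct's own default clause (CLAUSE is
   SKETCH_DEFAULT_UNSPECIFIED when the construct has none).  */
static void
sketch_new_ctx (struct sketch_ctx *ctx, const struct sketch_ctx *outer,
                enum sketch_construct kind, enum sketch_default clause)
{
  assert (ctx != NULL);
  ctx->data_default_kind
    = outer ? outer->data_default_kind : SKETCH_DEFAULT_UNSPECIFIED;
  ctx->default_kind = clause;

  if (kind == SKETCH_ACC_DATA && clause != SKETCH_DEFAULT_UNSPECIFIED)
    /* default(none|present) on 'acc data' is recorded for enclosed
       compute constructs.  */
    ctx->data_default_kind = clause;
  else if (kind == SKETCH_ACC_COMPUTE && clause == SKETCH_DEFAULT_UNSPECIFIED)
    /* A compute construct without its own default clause inherits one.  */
    ctx->default_kind = ctx->data_default_kind;
}
```

For example, a `parallel` with no default clause inside `data default(present)` ends up with `default_kind == SKETCH_DEFAULT_PRESENT`, while a `parallel default(none)` in the same region keeps `none`.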
[PATCH, OpenACC 2.7] Implement host_data must have use_device clause requirement
Hi Thomas,

This patch implements the OpenACC 2.7 change requiring the host_data construct to have at least one use_device clause.

This patch started out as a simple check during gimplify (a much smaller patch), but it turned out that the front ends remove use_device clauses that have errors, and the gimplify check then started to emit a "no use_device clause" message in such cases, which seemed confusing to the user. So I ended up adding the check in each front end instead.

Tested on powerpc64le-linux/nvptx; x86_64-linux/amdgcn tests in progress (expect no surprises).

Is this okay for trunk?

Thanks,
Chung-Lin

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_oacc_host_data): Add checking requiring
	OpenACC host_data construct to have a use_device clause.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_oacc_host_data): Add checking requiring
	OpenACC host_data construct to have a use_device clause.

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_oacc_construct): Add checking requiring
	OpenACC host_data construct to have a use_device clause.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/host_data-2.c: Adjust testcase.
	* gfortran.dg/goacc/host_data-error.f90: New testcase.
	* gfortran.dg/goacc/pr71704.f90: Adjust testcase.

From 0d17b8d24fa6079d6c289305e9644c3fecd429f1 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Tue, 6 Jun 2023 03:19:33 -0700
Subject: [PATCH 1/2] OpenACC 2.7: host_data must have use_device clause requirement

This patch implements the OpenACC 2.7 change requiring the host_data construct to have at least one use_device clause.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_oacc_host_data): Add checking requiring
	OpenACC host_data construct to have a use_device clause.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_oacc_host_data): Add checking requiring
	OpenACC host_data construct to have a use_device clause.

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_oacc_construct): Add checking requiring
	OpenACC host_data construct to have a use_device clause.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/host_data-2.c: Adjust testcase.
	* gfortran.dg/goacc/host_data-error.f90: New testcase.
	* gfortran.dg/goacc/pr71704.f90: Adjust testcase.
---
 gcc/c/c-parser.cc                                   |  9 +++--
 gcc/cp/parser.cc                                    | 11 +--
 gcc/fortran/trans-openmp.cc                         |  6 ++
 gcc/testsuite/c-c++-common/goacc/host_data-2.c      |  7 ++-
 gcc/testsuite/gfortran.dg/goacc/host_data-error.f90 |  6 ++
 gcc/testsuite/gfortran.dg/goacc/pr71704.f90         |  5 +++--
 6 files changed, 37 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/host_data-error.f90

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 5baa501dbee..b61aef8b1a2 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18398,8 +18398,13 @@ c_parser_oacc_host_data (location_t loc, c_parser *parser, bool *if_p)
   tree stmt, clauses, block;
 
   clauses = c_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
-                                       "#pragma acc host_data");
-
+                                       "#pragma acc host_data", false);
+  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
+    {
+      error_at (loc, "%<host_data%> construct requires %<use_device%> clause");
+      return error_mark_node;
+    }
+  clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
   block = c_begin_omp_parallel ();
   add_stmt (c_parser_omp_structured_block (parser, if_p));
   stmt = c_finish_oacc_host_data (loc, clauses, block);
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 1c9aa671851..dd7638f1c93 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -45798,8 +45798,15 @@ cp_parser_oacc_host_data (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
   unsigned int save;
 
   clauses = cp_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
-                                        "#pragma acc host_data", pragma_tok);
-
+                                        "#pragma acc host_data", pragma_tok,
+                                        false);
+  if (!omp_find_clause (clauses, OMP_CLAUSE_USE_DEVICE_PTR))
+    {
+      error_at (pragma_tok->location,
+                "%<host_data%> construct requires %<use_device%> clause");
+      return error_mark_node;
+    }
+  clauses = finish_omp_clauses (clauses, C_ORT_ACC);
   block = begin_omp_parallel ();
   save = cp_parser_begin_omp_structured_block (parser);
   cp_parser_statement (parser, NULL_TREE, false, if_p);
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 42b608f3d36..5e0079cce76 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -4677,6 +4677,12 @@ gfc_trans_oacc_construct (gfc_code *code)
       break;
     case EXEC_
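The ordering in the diff matters: clauses are parsed with the finishing step deferred (the trailing `false` argument appears to do this), the use_device requirement is checked on the raw clause list, and only then is c_finish_omp_clauses/finish_omp_clauses called, which may drop clauses that already had errors. The following hypothetical C model (all `sketch_*` names are invented; GCC itself represents use_device as OMP_CLAUSE_USE_DEVICE_PTR) shows why checking before finishing avoids a second, confusing diagnostic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_clause_code
{
  SKETCH_CLAUSE_IF,
  SKETCH_CLAUSE_IF_PRESENT,
  SKETCH_CLAUSE_USE_DEVICE
};

struct sketch_clause
{
  enum sketch_clause_code code;
  bool erroneous;              /* already diagnosed while parsing */
  struct sketch_clause *next;
};

/* Analogue of omp_find_clause: first clause with CODE, or NULL.  */
static struct sketch_clause *
sketch_find_clause (struct sketch_clause *c, enum sketch_clause_code code)
{
  for (; c != NULL; c = c->next)
    if (c->code == code)
      return c;
  return NULL;
}

/* Analogue of the "finish" step: unlink clauses that were diagnosed.  */
static struct sketch_clause *
sketch_finish_clauses (struct sketch_clause *c)
{
  struct sketch_clause *head = NULL, **tail = &head;
  for (; c != NULL; c = c->next)
    if (!c->erroneous)
      {
        *tail = c;
        tail = &c->next;
      }
  *tail = NULL;
  return head;
}

/* Check the raw parsed list first, then finish it.  An erroneous
   use_device clause still satisfies the check, so the user sees only
   the original diagnostic for it.  */
static struct sketch_clause *
sketch_host_data_clauses (struct sketch_clause *parsed, int *n_errors)
{
  assert (n_errors != NULL);
  if (sketch_find_clause (parsed, SKETCH_CLAUSE_USE_DEVICE) == NULL)
    {
      /* "host_data construct requires use_device clause" */
      ++*n_errors;
      return NULL;
    }
  return sketch_finish_clauses (parsed);
}
```

Running the check after the finish step instead would report the missing use_device clause whenever the only one had been dropped for an earlier error, which is the confusing behavior described in the email.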
Re: nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case (was: [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes)
Hi Thomas,

On 2023/1/12 9:51 PM, Thomas Schwinge wrote:
> In my case, 'cuda_callback_wrapper' (expectedly) gets invoked with
> 'res != CUDA_SUCCESS' ("an illegal memory access was encountered").
> When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
> (..., which deadlocks); that's generally problematic: per
> https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483
> "'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".

I remember running into this myself when first creating this async support (IIRC in my case it was cuFree()-ing something), yet you've found another mistake here! :)

> Given that eventually we must reach a host/device synchronization point
> (latest when the device is shut down at program termination), and the
> non-'CUDA_SUCCESS' will be upheld until then, it does seem safe to
> replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error' as per the
> "nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case"
> attached.  OK to push?

I think this patch is fine. Actual approval powers are yours or Tom's :)

>
> (Might we even skip 'GOMP_PLUGIN_error' here, understanding that the
> error will be caught and reported at the next host/device synchronization
> point?  But I've not verified that.)

Actually, the CUDA driver API docs are a bit vague on what exactly the CUresult argument to the callback means. The 'res != CUDA_SUCCESS' handling here was basically just generic handling. I am not really sure what the right thing to do here is (is the error still retained by CUDA after the callback completes?)

Chung-Lin
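The constraint quoted above (callbacks must not make any CUDA API calls) suggests a general pattern: do as little as possible inside the callback and surface failures at the next host/device synchronization point. The sketch below is a hypothetical, CUDA-free model of the more conservative variant Thomas asks about (record the error, report it later); it is not the patch itself, which still calls GOMP_PLUGIN_error from the callback. The `sketch_*` names and status values are invented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical status codes standing in for CUresult values.  */
enum sketch_status
{
  SKETCH_SUCCESS = 0,
  SKETCH_ERROR_ILLEGAL_ADDRESS = 1,
  SKETCH_ERROR_UNKNOWN = 2
};

struct sketch_async_state
{
  atomic_int pending_error; /* first error seen by a callback, 0 if none */
};

/* Runs in the driver's callback context: it must not call back into the
   driver API (no device shutdown, no frees), so it only records the first
   failure.  */
static void
sketch_stream_callback (int status, void *userdata)
{
  struct sketch_async_state *st = userdata;
  assert (st != NULL);
  if (status != SKETCH_SUCCESS)
    {
      int expected = SKETCH_SUCCESS;
      atomic_compare_exchange_strong (&st->pending_error, &expected, status);
    }
}

/* Runs on the host thread at a synchronization point, where reporting
   (even fatally) is safe.  Returns and clears the recorded error.  */
static int
sketch_synchronize (struct sketch_async_state *st)
{
  int err = atomic_exchange (&st->pending_error, SKETCH_SUCCESS);
  if (err != SKETCH_SUCCESS)
    fprintf (stderr, "asynchronous operation failed with status %d\n", err);
  return err;
}
```

Keeping only the first error is a design choice of this sketch: later failures on the same stream are usually consequences of the first one.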
[Ping x6] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Ping x6

On 2022/12/6 12:21 AM, Chung-Lin Tang wrote:
> Ping x5
>
> On 2022/11/22 12:24 AM, Chung-Lin Tang wrote:
>> Ping x4
>>
>> On 2022/11/8 12:34 AM, Chung-Lin Tang wrote:
>>> Ping x3.
>>>
>>> On 2022/10/31 10:18 PM, Chung-Lin Tang wrote:
>>>> Ping x2.
>>>>
>>>> On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
>>>>> Ping.
>>>>>
>>>>> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
>>>>>> Hi Tom,
>>>>>> I had a patch submitted earlier, where I reported that the current way
>>>>>> of implementing barriers in libgomp on nvptx created a quite
>>>>>> significant performance drop on some SPEChpc2021 benchmarks:
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>>>>>>
>>>>>> That previous patch wasn't well received (admittedly, it was kind of a
>>>>>> hack). So in this patch, I tried to (mostly) re-implement team-barriers
>>>>>> for NVPTX.
>>>>>>
>>>>>> Basically, instead of trying to have the GPU do CPU-with-OS-like things
>>>>>> that it isn't suited for, barriers are implemented simply with bar.*
>>>>>> synchronization instructions. Tasks are processed after threads have
>>>>>> joined, and only if team->task_count != 0.
>>>>>>
>>>>>> (Arguably, some performance might be forfeited, since earlier-arriving
>>>>>> threads could have been used to process tasks ahead of other threads.
>>>>>> But that again would require implementing complex futex-wait/wake-like
>>>>>> behavior. Really, that kind of tasking is not what target offloading
>>>>>> is usually used for.)
>>>>>>
>>>>>> Implementation highlight notes:
>>>>>> 1. gomp_team_barrier_wake() is now an empty function (threads never
>>>>>>    "wake" in the usual manner).
>>>>>> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
>>>>>> 3. gomp_barrier_wait_last() is now implemented using "bar.arrive".
>>>>>> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>>>>>>    The main synchronization is done using a 'bar.red' instruction. This
>>>>>>    reduces the condition (team->task_count != 0) across all threads, to
>>>>>>    enable the task processing further below if any thread created a
>>>>>>    task. (This bar.red usage is why the second GCC patch in this series
>>>>>>    is needed.)
>>>>>>
>>>>>> This patch has been tested on x86_64/powerpc64le with nvptx offloading,
>>>>>> using the libgomp, ovo, omptests, and sollve_vv testsuites, all without
>>>>>> regressions. I also verified that the SPEChpc 2021 521.miniswp_t and
>>>>>> 534.hpgmgfv_t performance regressions that occurred in the GCC 12 cycle
>>>>>> are fixed, restoring performance to devel/omp/gcc-11 (OG11) branch
>>>>>> levels. Is this okay for trunk?
>>>>>>
>>>>>> (I also suggest backporting to the GCC 12 branch, if a performance
>>>>>> regression can be considered a defect.)
>>>>>>
>>>>>> Thanks,
>>>>>> Chung-Lin
>>>>>>
>>>>>> libgomp/ChangeLog:
>>>>>>
>>>>>> 2022-09-21  Chung-Lin Tang
>>>>>>
>>>>>> 	* config/nvptx/bar.c (generation_to_barrier): Remove.
>>>>>> 	(futex_wait,futex_wake,do_spin,do_wait): Remove.
>>>>>> 	(GOMP_WAIT_H): Remove.
>>>>>> 	(#include "../linux/bar.c"): Remove.
>>>>>> 	(gomp_barrier_wait_end): New function.
>>>>>> 	(gomp_barrier_wait): Likewise.
>>>>>> 	(gomp_barrier_wait_last): Likewise.
>>>>>> 	(gomp_team_barrier_wait_end): Likewise.
>>>>>> 	(gomp_team_barrier_wait): Likewise.
>>>>>> 	(gomp_team_barrier_wait_final): Likewise.
>>>>>> 	(gomp_team_barrier_wait_cancel_end): Likewise.
>>>>>> 	(gomp_team_barrier_wait_cancel): Likewise.
>>>>>> 	(gomp_team_barrier_cancel): Likewise.
>>>>>> 	* config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>>>>>> 	prototype, add new static inline function.
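Item 4 above describes a single synchronization that also OR-reduces `team->task_count != 0`, so that every thread learns whether any thread created a task. On the GPU this is done with PTX `bar.red`; the following is only a host-side analogue using POSIX threads, meant to illustrate the semantics (every participant leaves with the OR of all predicates). The `sketch_*` names are hypothetical, and a mutex/condition-variable barrier says nothing about the performance of the real implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Reusable barrier that also OR-reduces one predicate per participant.  */
struct sketch_red_barrier
{
  pthread_mutex_t lock;
  pthread_cond_t cond;
  unsigned nthreads, arrived, generation;
  bool accum;  /* OR of predicates in the current generation */
  bool result; /* published result of the last completed generation */
};

static void
sketch_red_barrier_init (struct sketch_red_barrier *b, unsigned n)
{
  assert (n > 0);
  pthread_mutex_init (&b->lock, NULL);
  pthread_cond_init (&b->cond, NULL);
  b->nthreads = n;
  b->arrived = b->generation = 0;
  b->accum = b->result = false;
}

/* Block until all participants arrive; every one returns the OR of all
   predicates (the role bar.red plays for team->task_count != 0).  */
static bool
sketch_red_barrier_wait_or (struct sketch_red_barrier *b, bool pred)
{
  pthread_mutex_lock (&b->lock);
  unsigned gen = b->generation;
  b->accum = b->accum || pred;
  if (++b->arrived == b->nthreads)
    {
      b->result = b->accum;
      b->accum = false;
      b->arrived = 0;
      b->generation++;
      pthread_cond_broadcast (&b->cond);
    }
  else
    while (gen == b->generation)
      pthread_cond_wait (&b->cond, &b->lock);
  bool r = b->result;
  pthread_mutex_unlock (&b->lock);
  return r;
}

struct sketch_worker
{
  struct sketch_red_barrier *bar;
  bool pred, out;
};

static void *
sketch_worker_main (void *arg)
{
  struct sketch_worker *w = arg;
  w->out = sketch_red_barrier_wait_or (w->bar, w->pred);
  return NULL;
}

/* Run N (<= 16) threads; only thread TASK_OWNER "created a task".
   Returns how many threads observed that tasks are pending.  */
static unsigned
sketch_run_round (unsigned n, unsigned task_owner)
{
  struct sketch_red_barrier bar;
  struct sketch_worker w[16];
  pthread_t t[16];
  unsigned seen = 0;
  assert (n <= 16);
  sketch_red_barrier_init (&bar, n);
  for (unsigned i = 0; i < n; i++)
    {
      w[i].bar = &bar;
      w[i].pred = (i == task_owner);
      w[i].out = false;
      pthread_create (&t[i], NULL, sketch_worker_main, &w[i]);
    }
  for (unsigned i = 0; i < n; i++)
    {
      pthread_join (t[i], NULL);
      seen += w[i].out;
    }
  pthread_cond_destroy (&bar.cond);
  pthread_mutex_destroy (&bar.lock);
  return seen;
}
```

In this model `sketch_run_round (4, 2)` returns 4 (all four threads see that one thread has tasks), and `sketch_run_round (4, 99)` returns 0, so no thread would enter task processing.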
[Ping x5] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Ping x5

On 2022/11/22 12:24 AM, Chung-Lin Tang wrote:
> Ping x4
[Ping x4] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Ping x4

On 2022/11/8 12:34 AM, Chung-Lin Tang wrote:
> Ping x3.
[Ping x3] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Ping x3.

On 2022/10/31 10:18 PM, Chung-Lin Tang wrote:
> Ping x2.
[Ping x2] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Ping x2.

On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
> Ping.
Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Ping.
Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
On 2022/9/21 5:01 PM, Jakub Jelinek wrote:
> On Wed, Sep 21, 2022 at 03:45:36PM +0800, Chung-Lin Tang via Gcc-patches wrote:
>> Basically, instead of trying to have the GPU do CPU-with-OS-like things that it isn't suited for, barriers are implemented simplistically with bar.* synchronization instructions. Tasks are processed after threads have joined, and only if team->task_count != 0
>>
>> (arguably, there might be a little bit of performance forfeited where earlier arriving threads could've been used to process tasks ahead of other threads. But that again falls into requiring implementing complex futex-wait/wake like behavior. Really, that kind of tasking is not what target offloading is usually used for)
>
> I admit I don't have a good picture if people in real-world actually use tasking in offloading regions and how much and in what way, but the above definitely would be a show-stopper for typical tasking workloads, where one thread (usually from master/masked/single construct's body) creates lots of tasks and can spend considerable amount of time in those preparations, while other threads are expected to handle those tasks.

I think the most common use case for target offloading is "parallel for". Really, not simply removing tasking altogether from target regions in the specification is just looking for trouble. If asynchronous offloaded tasks are to be supported, something at the whole GPU offload region level is much more reasonable, like the async clause functionality in OpenACC.

> Do we have an idea how are other implementations handling this? I think it should be easily observable with atomics, have master/masked/single that creates lots of tasks and then spends a long time doing something, have very small task bodies that just increment some atomic counter and at the end of the master/masked/single see how many tasks were already encountered.

This could be an interesting test...

> Note, I don't have any smart ideas how to handle this instead and what you posted might be ok for what people usually do on offloading targets in OpenMP if they use tasking at all, just wanted to mention that there could be workloads where the above is a serious problem. If there are say hundreds of threads doing nothing until a single thread reaches a barrier and there are hundreds of pending tasks... I think it might still be doable, just not in the very fine "wake one thread" style that the Linux-based implementation was doing. E.g. note we have that 64 pending task limit after which we start to create undeferred tasks, so if we never start handling tasks until one thread is done with them, that would mean the single thread would create 64 deferred tasks and then handle all the others itself making it even longer until the other tasks can deal with it.

Okay, thanks for reminding that.

Chung-Lin
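The experiment Jakub describes could be sketched roughly as follows. This is a hypothetical probe, not a test from the GCC testsuite; the function name, the spin-loop length and the use of a single construct are assumptions. If *seen_before_done comes back near zero on a device, tasks are apparently only run once threads reach the barrier. Built without -fopenmp, the pragmas are ignored and everything runs sequentially, so only the final count is meaningful there.

```c
/* Hypothetical probe: one thread (in a single construct) creates NTASKS
   tiny tasks that each bump a shared counter, then keeps busy for a while.
   *SEEN_BEFORE_DONE records how many tasks had already run by the time the
   creating thread finished its own work.  Returns the final count.  */
int
run_task_probe (int ntasks, int *seen_before_done)
{
  int counter = 0;
  int seen = 0;

#pragma omp target map(tofrom: counter, seen)
#pragma omp parallel shared(counter, seen)
#pragma omp single
  {
    for (int i = 0; i < ntasks; i++)
      {
#pragma omp task shared(counter)
        {
#pragma omp atomic update
          counter++;
        }
      }

    /* Simulated long-running work in the task-creating thread.  */
    volatile unsigned long spin = 0;
    for (unsigned long j = 0; j < 10000000UL; j++)
      spin += j;

#pragma omp atomic read
    seen = counter;
  }
  /* The implicit barrier at the end of 'single' completes all tasks.  */

  *seen_before_done = seen;
  return counter;
}
```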
[PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction support in GCC
Hi Tom,

this follows the first patch. The new barrier implementation posted there uses the 'bar.red' instruction.

Normally this could easily have been done with a single line of inline assembly. However, because the NVPTX GCC port uses only virtual general registers, there is no register constraint that can select "predicate registers". Since bar.red takes predicate-typed operands, it cannot be emitted directly with inline asm, so the simplest way to access it appears to be a target builtin.

The attached patch adds bar.red instructions to the nvptx port, and __builtin_nvptx_bar_red_* builtins to use them. The code should support all variations of bar.red (and, or, and popc operations).

(This support is used by the first libgomp barrier patch, so the two must be approved together.)

Thanks,
Chung-Lin

2022-09-21  Chung-Lin Tang

gcc/ChangeLog:

	* config/nvptx/nvptx.cc (nvptx_print_operand): Add 'p' case, adjust comments.
	(enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_RED_AND, NVPTX_BUILTIN_BAR_RED_OR, and NVPTX_BUILTIN_BAR_RED_POPC.
	(nvptx_expand_bar_red): New function.
	(nvptx_init_builtins): Add DEFs of __builtin_nvptx_bar_red_[and/or/popc].
	(nvptx_expand_builtin): Use nvptx_expand_bar_red to expand NVPTX_BUILTIN_BAR_RED_[AND/OR/POPC] cases.
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_BARRED_AND, UNSPECV_BARRED_OR, and UNSPECV_BARRED_POPC.
	(BARRED): New int iterator.
	(barred_op,barred_mode,barred_ptxtype): New int attrs.
	(nvptx_barred_): New define_insn.

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 49cc681..afc3a890 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -2879,6 +2879,7 @@ nvptx_mem_maybe_shared_p (const_rtx x)
     t -- print a type opcode suffix, promoting QImode to 32 bits
     T -- print a type size in bits
     u -- print a type opcode suffix without promotions.
+    p -- print a '!' for constant 0.
     x -- print a destination operand that may also be a bit bucket.  */
 
 static void
@@ -3012,6 +3013,11 @@ nvptx_print_operand (FILE *file, rtx x, int code)
         fprintf (file, "@!");
         goto common;
 
+    case 'p':
+      if (INTVAL (x) == 0)
+        fprintf (file, "!");
+      break;
+
     case 'c':
       mode = GET_MODE (XEXP (x, 0));
       switch (x_code)
@@ -6151,9 +6157,90 @@ enum nvptx_builtins
   NVPTX_BUILTIN_CMP_SWAPLL,
   NVPTX_BUILTIN_MEMBAR_GL,
   NVPTX_BUILTIN_MEMBAR_CTA,
+  NVPTX_BUILTIN_BAR_RED_AND,
+  NVPTX_BUILTIN_BAR_RED_OR,
+  NVPTX_BUILTIN_BAR_RED_POPC,
   NVPTX_BUILTIN_MAX
 };
 
+/* Expander for 'bar.red' instruction builtins.  */
+
+static rtx
+nvptx_expand_bar_red (tree exp, rtx target,
+                      machine_mode ARG_UNUSED (m), int ARG_UNUSED (ignore))
+{
+  int code = DECL_MD_FUNCTION_CODE (TREE_OPERAND (CALL_EXPR_FN (exp), 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+
+  if (!target)
+    target = gen_reg_rtx (mode);
+
+  rtx pred, dst;
+  rtx bar = expand_expr (CALL_EXPR_ARG (exp, 0),
+                         NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx nthr = expand_expr (CALL_EXPR_ARG (exp, 1),
+                          NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx cpl = expand_expr (CALL_EXPR_ARG (exp, 2),
+                         NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx redop = expand_expr (CALL_EXPR_ARG (exp, 3),
+                           NULL_RTX, SImode, EXPAND_NORMAL);
+  if (CONST_INT_P (bar))
+    {
+      if (INTVAL (bar) < 0 || INTVAL (bar) > 15)
+        {
+          error_at (EXPR_LOCATION (exp),
+                    "barrier value must be within [0,15]");
+          return const0_rtx;
+        }
+    }
+  else if (!REG_P (bar))
+    bar = copy_to_mode_reg (SImode, bar);
+
+  if (!CONST_INT_P (nthr) && !REG_P (nthr))
+    nthr = copy_to_mode_reg (SImode, nthr);
+
+  if (!CONST_INT_P (cpl))
+    {
+      error_at (EXPR_LOCATION (exp),
+                "complement argument must be constant");
+      return const0_rtx;
+    }
+
+  pred = gen_reg_rtx (BImode);
+  if (!REG_P (redop))
+    redop = copy_to_mode_reg (SImode, redop);
+  emit_insn (gen_rtx_SET (pred, gen_rtx_NE (BImode, redop, GEN_INT (0))));
+  redop = pred;
+
+  rtx pat;
+  switch (code)
+    {
+    case NVPTX_BUILTIN_BAR_RED_AND:
+      dst = gen_reg_rtx (BImode);
+      pat = gen_nvptx_barred_and (dst, bar, nthr, cpl, redop);
+      break;
+    case NVPTX_BUILTIN_BAR_RED_OR:
+      dst = gen_reg_rtx (BImode);
+      pat = gen_nvptx_barred_or (dst, bar, nthr, cpl, redop);
+      break;
+    case NVPTX_BUILTIN_BAR_RED_POPC:
+      dst = gen_reg_rtx (SImode);
+      pat = gen_nvptx_barred_popc (dst, bar, nthr, cpl, redop);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  emit_insn (pat);
+  if (GET_MODE (dst) == BImode)
+    {
+      rt
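To make the three bar.red variants concrete, here is a sequential C model of the value that every participating thread receives. In PTX, the .and/.or forms yield a predicate and .popc yields a 32-bit count, and the optional '!' on the predicate operand negates each thread's input. The enum and function names are hypothetical; this only models the semantics and is not the nvptx expander.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three reduction flavours.  */
enum red_op { RED_AND, RED_OR, RED_POPC };

/* Sequential model of the result bar.red hands to every participating
   thread, given each thread's input predicate.  NEGATE models the optional
   '!' on the predicate operand.  */
unsigned
bar_red_model (enum red_op op, const bool *pred, size_t nthreads, bool negate)
{
  unsigned count = 0;
  for (size_t i = 0; i < nthreads; i++)
    if (pred[i] != negate)   /* the (possibly negated) predicate holds */
      count++;

  switch (op)
    {
    case RED_AND:
      return count == nthreads;   /* true only if every thread is true */
    case RED_OR:
      return count != 0;          /* true if at least one thread is true */
    case RED_POPC:
    default:
      return count;               /* number of threads whose predicate holds */
    }
}
```

For the barrier in the first patch, the OR form is the one of interest. Each thread contributes (team->task_count != 0), and all threads learn whether any of them saw pending tasks.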
[PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
Hi Tom,

I had a patch submitted earlier, where I reported that the current way of implementing barriers in libgomp on nvptx caused a quite significant performance drop on some SPEChpc2021 benchmarks:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html

That previous patch wasn't well received (admittedly, it was kind of a hack). So in this patch, I tried to (mostly) re-implement team barriers for NVPTX.

Basically, instead of trying to make the GPU do CPU-with-OS-like things that it isn't suited for, barriers are implemented simply with bar.* synchronization instructions. Tasks are processed after threads have joined, and only if team->task_count != 0.

(Arguably, a little performance may be forfeited in cases where earlier-arriving threads could have processed tasks ahead of other threads. But that again requires implementing complex futex-wait/wake-like behavior. Really, that kind of tasking is not what target offloading is usually used for.)

Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in the usual manner).
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() is now implemented using "bar.arrive".
4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
   The main synchronization is done using a 'bar.red' instruction. This reduces the condition (team->task_count != 0) across all threads, enabling the task processing further below if any thread created a task. (This use of bar.red requires the second GCC patch in this series.)

This patch has been tested on x86_64/powerpc64le with nvptx offloading, using the libgomp, ovo, omptests, and sollve_vv testsuites, all without regressions. I also verified that, for the SPEChpc 2021 521.miniswp_t and 534.hpgmgfv_t benchmarks, the performance regressions that occurred in the GCC12 cycle are gone, with performance back at devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?

(I also suggest backporting to the GCC12 branch, if a performance regression can be considered a defect.)

Thanks,
Chung-Lin

libgomp/ChangeLog:

2022-09-21  Chung-Lin Tang

	* config/nvptx/bar.c (generation_to_barrier): Remove.
	(futex_wait,futex_wake,do_spin,do_wait): Remove.
	(GOMP_WAIT_H): Remove.
	(#include "../linux/bar.c"): Remove.
	(gomp_barrier_wait_end): New function.
	(gomp_barrier_wait): Likewise.
	(gomp_barrier_wait_last): Likewise.
	(gomp_team_barrier_wait_end): Likewise.
	(gomp_team_barrier_wait): Likewise.
	(gomp_team_barrier_wait_final): Likewise.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	(gomp_team_barrier_wait_cancel): Likewise.
	(gomp_team_barrier_cancel): Likewise.
	* config/nvptx/bar.h (gomp_team_barrier_wake): Remove
	prototype, add new static inline function.

diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index eee2107..0b958ed 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -30,137 +30,143 @@
 #include
 #include "libgomp.h"
 
-/* For cpu_relax.  */
-#include "doacross.h"
-
-/* Assuming ADDR is &bar->generation, return bar.  Copied from
-   rtems/bar.c.  */
+void
+gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
+{
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
+    {
+      /* Next time we'll be awaiting TOTAL threads again.  */
+      bar->awaited = bar->total;
+      __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
+                        MEMMODEL_RELEASE);
+    }
+  if (bar->total > 1)
+    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+}
 
-static gomp_barrier_t *
-generation_to_barrier (int *addr)
+void
+gomp_barrier_wait (gomp_barrier_t *bar)
 {
-  char *bar
-    = (char *) addr - __builtin_offsetof (gomp_barrier_t, generation);
-  return (gomp_barrier_t *)bar;
+  gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
 }
 
-/* Implement futex_wait-like behaviour to plug into the linux/bar.c
-   implementation.  Assumes ADDR is &bar->generation.  */
+/* Like gomp_barrier_wait, except that if the encountering thread
+   is not the last one to hit the barrier, it returns immediately.
+   The intended usage is that a thread which intends to gomp_barrier_destroy
+   this barrier calls gomp_barrier_wait, while all other threads
+   call gomp_barrier_wait_last.  When gomp_barrier_wait returns,
+   the barrier can be safely destroyed.  */
 
-static inline void
-futex_wait (int *addr, int val)
+void
+gomp_barrier_wait_last (gomp_barrier_t *bar)
 {
-  gomp_barrier_t *bar = generation_to_barrier (addr);
+  /* The above described behavior matches 'bar.arrive' perfectly.  */
+  if (bar->total > 1)
+    asm ("bar.arrive 1, %0;" : : "r" (32 * bar->total));
+}
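Note 4's barrier-plus-reduction can be approximated on a host with POSIX threads. Every caller blocks until all participants arrive, and all of them receive the OR of the flags passed in, which is the property the patch gets from bar.red.or for (team->task_count != 0). This is a hypothetical analogue built on a mutex and condition variable. The struct and function names are assumptions, and it is not how the nvptx code works internally.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical host-side barrier that also OR-reduces a per-thread flag.  */
struct red_barrier
{
  pthread_mutex_t lock;
  pthread_cond_t cv;
  unsigned total;       /* number of participating threads */
  unsigned arrived;     /* threads arrived in the current round */
  unsigned generation;  /* bumped each time a round completes */
  int acc;              /* OR accumulated during the current round */
  int result;           /* OR of the most recently completed round */
};

void
red_barrier_init (struct red_barrier *b, unsigned total)
{
  pthread_mutex_init (&b->lock, NULL);
  pthread_cond_init (&b->cv, NULL);
  b->total = total;
  b->arrived = 0;
  b->generation = 0;
  b->acc = 0;
  b->result = 0;
}

/* Wait for all TOTAL threads; return 1 if any of them passed a nonzero FLAG.  */
int
red_barrier_or (struct red_barrier *b, int flag)
{
  pthread_mutex_lock (&b->lock);
  b->acc |= (flag != 0);
  if (++b->arrived == b->total)
    {
      /* Last arrival publishes the result and releases everyone.  */
      b->result = b->acc;
      b->acc = 0;
      b->arrived = 0;
      b->generation++;
      pthread_cond_broadcast (&b->cv);
    }
  else
    {
      unsigned gen = b->generation;
      while (gen == b->generation)
        pthread_cond_wait (&b->cv, &b->lock);
    }
  /* No later round can complete until this thread arrives again,
     so RESULT still belongs to the round just finished.  */
  int r = b->result;
  pthread_mutex_unlock (&b->lock);
  return r;
}
```

In the patch's scheme, a nonzero result would then send all threads into task processing together, after they have joined.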
[PING x2] Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
On 2022/8/26 4:15 PM, Chung-Lin Tang wrote:
> On 2022/8/4 9:31 PM, Koning, Paul wrote:
>>
>>
>>> On Aug 4, 2022, at 9:17 AM, Chung-Lin Tang wrote:
>>>
>>> On 2022/6/28 10:06 PM, Jakub Jelinek wrote:
>>>> On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:
>>>>> with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:
>>>>>
>>>>> (1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
>>>>> (2) chunk_size == 0: infinite loop
>>>>>
>>>>> The (2) behavior is obviously not desired. This patch fixes this by changing
>>>> Why? It is a user error, undefined behavior, we shouldn't slow down valid
>>>> code for users who don't bother reading the standard.
>>>
>>> This is loop init code, not per-iteration. The overhead really isn't that much.
>>>
>>> The question should be, if GCC having infinite loop behavior is reasonable,
>>> even if it is undefined in the spec.
>>
>> I wouldn't think so. The way I see "undefined code" is that you can't complain about "wrong code" produced by the compiler. But for the compiler to malfunction on wrong input is an entirely different matter. For one thing, it's hard to fix your code if the compiler fails. How would you locate the offending source line?
>>
>> paul
>
> Ping?

Ping x2.
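Why chunk_size == 0 hangs, and why a check at loop initialization is cheap, can be seen in a simplified single-threaded model of dynamic scheduling. Each call claims the next chunk from a shared position. With a zero chunk that position never advances, so a "claim until exhausted" loop never ends. The patch's actual change is cut off in the quoted text above. Clamping to 1 here is an assumed fix for illustration, and the struct and functions are hypothetical, not libgomp's gomp_iter_dynamic_next (which also has to be thread-safe).

```c
#include <stdbool.h>

/* Hypothetical, single-threaded model of a dynamic-schedule work queue.  */
struct dyn_sched
{
  long pos;    /* next unclaimed iteration */
  long end;    /* one past the last iteration */
  long chunk;  /* chunk size, sanitized once at init */
};

void
dyn_sched_init (struct dyn_sched *s, long begin, long end, long chunk_size)
{
  s->pos = begin;
  s->end = end;
  /* Assumed fix: done once at loop init, not per iteration.  Without it,
     chunk_size == 0 would leave POS unchanged forever.  */
  s->chunk = chunk_size < 1 ? 1 : chunk_size;
}

/* Claim the next chunk [*start, *stop); return false when exhausted.  */
bool
dyn_sched_next (struct dyn_sched *s, long *start, long *stop)
{
  if (s->pos >= s->end)
    return false;
  long left = s->end - s->pos;
  *start = s->pos;
  *stop = s->pos + (s->chunk < left ? s->chunk : left);
  s->pos = *stop;
  return true;
}
```

The cost of the clamp is one comparison per loop, not per chunk, which is Chung-Lin's point about overhead above.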
[PATCH] optc-save-gen.awk: adjust generated array compare
Hi Joseph,

Jan-Benedict reported a build-bot error for the nios2 port under --enable-werror-always:

options-save.cc: In function 'bool cl_target_option_eq(const cl_target_option*, const cl_target_option*)':
options-save.cc:9291:38: error: comparison between two arrays [-Werror=array-compare]
 9291 | if (ptr1->saved_custom_code_status != ptr2->saved_custom_code_status
      | ~~~^
options-save.cc:9291:38: note: use unary '+' which decays operands to pointers or '&'component_ref' not supported by dump_decl[0] != &'component_ref' not supported by dump_decl[0]' to compare the addresses
options-save.cc:9294:37: error: comparison between two arrays [-Werror=array-compare]
 9294 | if (ptr1->saved_custom_code_index != ptr2->saved_custom_code_index
      | ~~^~~~
...

This is due to array-typed TargetSave state in config/nios2/nios2.opt:
...
TargetSave
enum nios2_ccs_code saved_custom_code_status[256]

TargetSave
int saved_custom_code_index[256]
...

This patch changes the generated array state compare in gcc/optc-save-gen.awk from 'ptr1->array' to '&ptr1->array[0]', which appears sufficient to pass the stricter checks.

Tested by ensuring the compiler builds, which should be sufficient here. Okay to commit to mainline?

Thanks,
Chung-Lin

	* optc-save-gen.awk: Adjust array compare to use '&ptr->name[0]' instead of 'ptr->name'.

diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index 233d1fbb637..27aabf2955e 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -1093,7 +1093,7 @@ for (i = 0; i < n_target_array; i++) {
 	name = var_target_array[i]
 	size = var_target_array_size[i]
 	type = var_target_array_type[i]
-	print " if (ptr1->" name" != ptr2->" name "";
+	print " if (&ptr1->" name"[0] != &ptr2->" name "[0]";
 	print " || memcmp (ptr1->" name ", ptr2->" name ", " size " * sizeof(" type ")))"
 	print "return false;";
 }
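The warning concerns spelling, not behavior. For a struct with an array member, 'p1->arr != p2->arr' compares the decayed addresses, and newer GCC flags that form with -Warray-compare. Writing '&p1->arr[0]' states the address intent explicitly, as the diagnostic's note suggests, while memcmp compares the contents. The struct below is a hypothetical stand-in for the nios2 TargetSave arrays, and the two helpers only separate the two kinds of comparison for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a TargetSave array such as
   saved_custom_code_index[256].  */
struct opts
{
  int code_index[256];
};

/* Address comparison, spelled so -Warray-compare does not fire.
   'p1->code_index == p2->code_index' would compare two arrays.  */
bool
same_storage (const struct opts *p1, const struct opts *p2)
{
  return &p1->code_index[0] == &p2->code_index[0];
}

/* Element-wise comparison of the array contents.  */
bool
same_contents (const struct opts *p1, const struct opts *p2)
{
  return memcmp (p1->code_index, p2->code_index,
                 sizeof p1->code_index) == 0;
}
```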
[PATCH, nios2, committed] Add #undef of MUSL_DYNAMIC_LINKER
This patch adds an #undef of MUSL_DYNAMIC_LINKER before its #define in config/nios2/linux.h. This makes the nios2-linux build pass when the compiler is configured with --enable-werror-always.

Patch pushed to master at 0697bd070c4fffb33468976c93baff9493922fb3

Chung-Lin

From 0697bd070c4fffb33468976c93baff9493922fb3 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Thu, 8 Sep 2022 23:14:38 +0800
Subject: [PATCH] nios2: Add #undef of MUSL_DYNAMIC_LINKER

Add #undef of MUSL_DYNAMIC_LINKER before #define, to satisfy
build checks when configured with --enable-werror-always.

gcc/ChangeLog:

	* config/nios2/linux.h (MUSL_DYNAMIC_LINKER): Add #undef
	before #define.
---
 gcc/config/nios2/linux.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/nios2/linux.h b/gcc/config/nios2/linux.h
index f5dd813acad..9e53dd657e4 100644
--- a/gcc/config/nios2/linux.h
+++ b/gcc/config/nios2/linux.h
@@ -30,6 +30,8 @@
 #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{pthread:-D_REENTRANT}"
 
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-nios2.so.1"
+
+#undef MUSL_DYNAMIC_LINKER
 #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-nios2.so.1"
 
 #undef LINK_SPEC
-- 
2.17.1
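The pattern is the usual way to override a macro that an earlier header may already define. Redefining it with a different body triggers GCC's "redefined" warning, which -Werror turns into an error, while an #undef first makes the override explicit. In this minimal sketch, the first definition stands in, hypothetically, for a default coming from a more generic config header. Its value and the accessor function are made up for illustration.

```c
#include <string.h>

/* Hypothetical earlier default, e.g. from a generic Linux config header.  */
#define MUSL_DYNAMIC_LINKER "/lib/ld-musl-generic.so.1"

/* Target override: undefine first, so no "redefined" diagnostic is issued.  */
#undef MUSL_DYNAMIC_LINKER
#define MUSL_DYNAMIC_LINKER "/lib/ld-musl-nios2.so.1"

const char *
musl_dynamic_linker (void)
{
  return MUSL_DYNAMIC_LINKER;
}
```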
Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran
On 2022/8/15 7:15 PM, Chung-Lin Tang wrote:
> On 2022/8/15 7:06 PM, Chung-Lin Tang wrote:
>> I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran interacts, but it's finally working. Again tested without regressions. Preparing to commit to devel/omp/gcc-12, and seeking approval for mainline when the requires patches are in.
>
> Just realized that I don't have the new testcases added in this patch. Will supplement them later :P

Here's the USM allocator/libgfortran patch, with a libgomp.fortran testcase added.

Thanks,
Chung-Lin

2022-09-05  Chung-Lin Tang

libgcc/
	* Makefile.in (crtoffloadend$(objext)): Add $(PICFLAG) to compile rule.
	* offloadstuff.c (GOMP_offload_register_ver): Add declaration of weak symbol.
	(__OFFLOAD_TABLE__): Likewise.
	(init_non_offload): New function.

libgfortran/
	* gfortran.map (GFORTRAN_13): New namespace.
	(_gfortran_mem_allocators_init): New name inside GFORTRAN_13.
	* libgfortran.h (mem_allocators_init): New exported declaration.
	* runtime/main.c (do_init): Rename from init, add run-once guard code.
	(cleanup): Add run-once guard code.
	(GOMP_post_offload_register_callback): Declare weak symbol.
	(GOMP_pre_gomp_target_fini_callback): Likewise.
	(init): New constructor to register offload callbacks, or call do_init when not OpenMP.
	* runtime/memory.c (gfortran_malloc): New pointer variable.
	(gfortran_calloc): Likewise.
	(gfortran_realloc): Likewise.
	(gfortran_free): Likewise.
	(mem_allocators_init): New function.
	(xmalloc): Use gfortran_malloc.
	(xmallocarray): Use gfortran_malloc.
	(xcalloc): Use gfortran_calloc.
	(xrealloc): Use gfortran_realloc.
	(xfree): Use gfortran_free.

libgomp/
	* libgomp.map (GOMP_5.1.2): New version namespace.
	(GOMP_post_offload_register_callback): New name inside GOMP_5.1.2.
	(GOMP_pre_gomp_target_fini_callback): Likewise.
	(GOMP_DEFINE_CALLBACK_SET): Macro to define callback set.
	(post_offload_register): Define callback set for after offload image register.
	(pre_gomp_target_fini): Define callback set for before gomp_target_fini is called.
	(libgfortran_malloc_usm): New function.
	(libgfortran_calloc_usm): Likewise.
	(libgfortran_realloc_usm): Likewise.
	(libgfortran_free_usm): Likewise.
	(_gfortran_mem_allocators_init): Declare weak symbol.
	(gomp_libgfortran_omp_allocators_init): New function.
	(GOMP_offload_register_ver): Add handling of host_table == NULL, calling into libgfortran to set unified_shared_memory allocators, and execution of post_offload_register callbacks.
	(gomp_target_init): Register all pre_gomp_target_fini callbacks to run at end of main using atexit().
	* testsuite/libgomp.fortran/target-unified_shared_memory-1.f90: New test.

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 09b3ec8bc2e..70720cc910c 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -1045,8 +1045,9 @@ crtbeginT$(objext): $(srcdir)/crtstuff.c
 crtoffloadbegin$(objext): $(srcdir)/offloadstuff.c
 	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
 
+# crtoffloadend contains a constructor with calls to libgomp, so build as PIC.
 crtoffloadend$(objext): $(srcdir)/offloadstuff.c
-	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) $(PICFLAG) -c $< -DCRT_END
 
 crtoffloadtable$(objext): $(srcdir)/offloadstuff.c
 	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_TABLE
diff --git a/libgcc/offloadstuff.c b/libgcc/offloadstuff.c
index 10e1fe19c8e..2edb6810021 100644
--- a/libgcc/offloadstuff.c
+++ b/libgcc/offloadstuff.c
@@ -63,6 +63,19 @@ const void *const __offload_vars_end[0]
   __attribute__ ((__used__, visibility ("hidden"),
                   section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { };
 
+extern void GOMP_offload_register_ver (unsigned, const void *, int,
+                                       const void *);
+extern const void *const __OFFLOAD_TABLE__[0] __attribute__ ((weak));
+static void __attribute__((constructor))
+init_non_offload (void)
+{
+  /* If an OpenMP program has no offloading, post-offload_register callbacks
+     that need to run will require a call to GOMP_offload_register_ver, in
+     order to properly trigger those callbacks during init.  */
+  if (__OFFLOAD_TABLE__ == NULL)
+    GOMP_offload_register_ver (0, NULL, 0, NULL);
+}
+
 #elif defined CRT_TABLE
 
 extern const void *const __offload_func_table[];
diff --git a/libgfortran/gfortran.map b/libgfortran/gfortran.map
index e0e795c3d48..55d2a529acd 100644
--- a/libgfortran/gfortran.map
+++ b/libgfortran/gfortran.map
@@ -1759,3 +1759,8 @@ GFORTRAN_12 {
     _gfortran_transfer_real128_write;
 #endif
 } GFORTRAN_10.2;
+
+GFORTRAN_13 {
+  global:
+    _gfortran_mem_allocators_init;
+} GFORTRAN_12;
diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h
index 0b893a51851..e518b3989cf 10
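The ChangeLog describes a GOMP_DEFINE_CALLBACK_SET macro and two callback sets, post_offload_register and pre_gomp_target_fini, that let libgfortran hand its init and fini actions to libgomp so they run at the right point. The macro body is not in this excerpt. The following is a hypothetical sketch of what such a callback set might look like: a fixed-capacity list that is registered into and then run in order. The capacity, type names and example callbacks are all assumptions.

```c
#include <stddef.h>

/* Hypothetical capacity; the real sets may be sized differently.  */
#define CALLBACK_SET_MAX 8

typedef void (*callback_fn) (void);

struct callback_set
{
  callback_fn fns[CALLBACK_SET_MAX];
  size_t count;
};

/* Register FN; return 0 on success, -1 if the set is full.  */
int
callback_set_register (struct callback_set *set, callback_fn fn)
{
  if (set->count == CALLBACK_SET_MAX)
    return -1;
  set->fns[set->count++] = fn;
  return 0;
}

/* Run every registered callback in registration order.  */
void
callback_set_run (const struct callback_set *set)
{
  for (size_t i = 0; i < set->count; i++)
    set->fns[i] ();
}

/* Example callbacks that record the order in which they ran.  */
static int trace;
static void init_step_a (void) { trace = trace * 10 + 1; }
static void init_step_b (void) { trace = trace * 10 + 2; }
```

In the scheme described in the patch, libgfortran's constructor would register its allocator setup into the "after offload registration" set and its cleanup into the "before gomp_target_fini" set, instead of doing that work directly in its own constructor and destructor.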
[OpenMP, nvptx] Use bar.sync/arrive for barriers when tasking is not used
Hi,

Our work on the SPEChpc2021 benchmarks shows that after the fix for PR99555 was committed:

  [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
  https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=5ed77fb3ed1ee0289a0ec9499ef52b99b39421f1

the hang was fixed, but the new barrier code caused quite severe performance regressions. In OpenMP target offload mode, Minisweep regressed by about 350%, while HPGMG-FV was about 2x slower.

So the problem was presumably the new barriers, which replaced erroneous but fast bar.sync instructions with correct but really heavyweight futex_wait/wake operations on the GPU. This is probably required for preserving correct task vs. barrier behavior.

However, when task-related functionality is not used at all by the team inside an OpenMP target region, a barrier is just a place to wait for all threads to rejoin (there are no waiting tasks to restart). In that case the barrier can be implemented with simple bar.sync and bar.arrive PTX instructions. That should recover most of the performance in the cases that usually matter, e.g. 'omp parallel for' inside 'omp target'.

So the plan is to mark the cases where tasks are never used. This patch adds a 'task_never_used' flag to struct gomp_team, initialized to true and set to false when tasks are added to the team. The nvptx-specific gomp_team_barrier_wait_end routines can then use a simple barrier when team->task_never_used is still true at the barrier.

Some other constructs, like master/masked and single, also need task_never_used set to false. These constructs inherently create asymmetric loads where only a subset of threads runs through the region (which may or may not use tasking), so different threads could reach the barrier at the end with different views of task_never_used.
For correctness, these constructs must conservatively mark team->task_never_used false at the start of the construct.

This patch has been divided into two. The first inlines the contents of config/linux/bar.c into config/nvptx/bar.c (instead of using an #include). This is needed because some parts of gomp_team_barrier_wait_[cancel_]end now need nvptx-specific adjustments. The second contains the changes described above.

Tested on powerpc64le-linux and x86_64-linux with nvptx offloading, seeking approval for trunk.

Thanks,
Chung-Lin

From c2fdc31880d2d040822e8abece015c29a6d7b472 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Thu, 1 Sep 2022 05:53:49 -0700
Subject: [PATCH 1/2] libgomp: inline config/linux/bar.c into config/nvptx/bar.c

Preparing to add nvptx specific modifications to gomp_team_barrier_wait_end,
et al., so change from using an #include of config/linux/bar.c in
config/nvptx/bar.c, to a full copy of the implementation.

2022-09-01  Chung-Lin Tang

libgomp/ChangeLog:

	* config/nvptx/bar.c: Adjust include of "../linux/bar.c" into an
	inlining of contents of config/linux/bar.c.
---
 libgomp/config/nvptx/bar.c | 183 -
 1 file changed, 180 insertions(+), 3 deletions(-)

diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index eee2107..a850c22 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -161,6 +161,183 @@ static inline void do_wait (int *addr, int val)
     futex_wait (addr, val);
 }
 
-/* Reuse the linux implementation.  */
-#define GOMP_WAIT_H 1
-#include "../linux/bar.c"
+/* Below is based on the linux implementation.  */
+
+void
+gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
+{
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
+    {
+      /* Next time we'll be awaiting TOTAL threads again.  */
+      bar->awaited = bar->total;
+      __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
+                        MEMMODEL_RELEASE);
+      futex_wake ((int *) &bar->generation, INT_MAX);
+    }
+  else
+    {
+      do
+        do_wait ((int *) &bar->generation, state);
+      while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE) == state);
+    }
+}
+
+void
+gomp_barrier_wait (gomp_barrier_t *bar)
+{
+  gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
+}
+
+/* Like gomp_barrier_wait, except that if the encountering thread
+   is not the last one to hit the barrier, it returns immediately.
+   The intended usage is that a thread which intends to gomp_barrier_destroy
+   this barrier calls gomp_barrier_wait, while all other threads
+   call gomp_barrier_wait_last.  When gomp_barrier_wait returns,
+   the barrier can be safely destroyed.  */
+
+void
+gomp_barrier_wait_last (gomp_barrier_t *bar)
+{
+  gomp_barrier_state_t state = gomp_barrier_wait_start (bar);
+  if (state & BAR_WAS_LAST)
+    gomp_barrier_wait_end (bar, state);
+}
+
+void
+gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
+{
+  futex_
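The bookkeeping described in this message can be modeled in a few lines. A team-wide flag starts true. It is cleared when a task is queued, and also, conservatively, on entry to an asymmetric construct (master/masked/single). The barrier then chooses the cheap bar.sync path only while the flag is still set. The struct, enum and function names below are hypothetical and are not libgomp's. The model only shows the decision logic, not the synchronization itself.

```c
#include <stdbool.h>

/* Hypothetical model of the team state relevant to the barrier choice.  */
struct team_model
{
  bool task_never_used;
  unsigned task_count;
};

enum barrier_kind { BARRIER_SIMPLE_BAR_SYNC, BARRIER_FULL_TASK_AWARE };

void
team_model_init (struct team_model *t)
{
  t->task_never_used = true;
  t->task_count = 0;
}

/* Queuing a task permanently disables the fast path for this team.  */
void
team_model_add_task (struct team_model *t)
{
  t->task_never_used = false;
  t->task_count++;
}

/* Only some threads run master/masked/single bodies, and those may create
   tasks, so the flag is cleared up front to keep every thread's choice
   at the closing barrier consistent.  */
void
team_model_enter_asymmetric_construct (struct team_model *t)
{
  t->task_never_used = false;
}

enum barrier_kind
team_model_barrier_kind (const struct team_model *t)
{
  return t->task_never_used ? BARRIER_SIMPLE_BAR_SYNC
                            : BARRIER_FULL_TASK_AWARE;
}
```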
[PING] Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
On 2022/8/4 9:31 PM, Koning, Paul wrote:
>
>
>> On Aug 4, 2022, at 9:17 AM, Chung-Lin Tang wrote:
>>
>> On 2022/6/28 10:06 PM, Jakub Jelinek wrote:
>>> On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:
>>>> with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:
>>>>
>>>> (1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
>>>> (2) chunk_size == 0: infinite loop
>>>>
>>>> The (2) behavior is obviously not desired. This patch fixes this by changing
>>> Why? It is a user error, undefined behavior, we shouldn't slow down valid
>>> code for users who don't bother reading the standard.
>>
>> This is loop init code, not per-iteration. The overhead really isn't that much.
>>
>> The question should be, if GCC having infinite loop behavior is reasonable,
>> even if it is undefined in the spec.
>
> I wouldn't think so. The way I see "undefined code" is that you can't complain about "wrong code" produced by the compiler. But for the compiler to malfunction on wrong input is an entirely different matter. For one thing, it's hard to fix your code if the compiler fails. How would you locate the offending source line?
>
> paul

Ping?
Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran
On 2022/8/15 7:06 PM, Chung-Lin Tang wrote:
> I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran
> interacts, but it's finally working. Again tested without regressions.
> Preparing to commit to devel/omp/gcc-12, and seeking approval for mainline
> when the requires patches are in.

Just realized that I don't have the new testcases added in this patch. Will supplement them later :P

Thanks,
Chung-Lin
[PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran
After the first libgfortran memory allocator preparation patch, this is the actual patch that organizes unified_shared_memory allocation into libgfortran.

In the current OpenMP requires implementation, the requires_mask is collected through offload LTO processing and presented to libgomp when registering offload images through GOMP_offload_register_ver() (called by the mkoffload-generated constructor linked into the program binary).

This means that the only reliable place to access omp_requires_mask is in GOMP_offload_register_ver. However, since it is called through an ELF constructor in the *main program*, it runs later than the libgfortran/runtime/main.c:init() constructor, and because some libgfortran init actions there already allocate memory, this can cause deallocation errors later.

Another issue is that CUDA appears to register some cleanup actions using atexit(), which forces libgomp to register gomp_target_fini() using atexit as well (so that it runs before the underlying CUDA state disappears). This affects us here as well.

So to summarize, we need to:
(1) order libgfortran init actions after omp_requires_mask processing is done, and
(2) order libgfortran cleanup actions before gomp_target_fini, to properly deallocate stuff without crashing.

This is why there is a small new set of definitions, as well as callback registering functions exported from libgomp to libgfortran: they let libgfortran register its init/fini actions for libgomp to run.

Inside GOMP_offload_register_ver, after omp_requires_mask processing is done, we call into libgfortran through a new _gfortran_mem_allocators_init function to insert the omp_alloc/omp_free/etc. based allocators into the Fortran runtime, when GOMP_REQUIRES_UNIFIED_SHARED_MEMORY is set.

All symbol references between libgfortran and libgomp are declared as weak symbols. Testing the weak symbols is also how each library determines whether the other library exists in the program.
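The weak-symbol presence test and the run-once init guard described above can be sketched in plain C. This is a minimal illustration, not the patch's code: the prototype given to GOMP_post_offload_register_callback is an assumption (the patch body is not shown here), and the sketch relies on ELF semantics, where an unresolved weak reference evaluates to NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed prototype, for illustration only.  Declared weak so that,
   on ELF targets, the reference resolves to NULL when no library in
   the program provides the symbol.  */
extern void GOMP_post_offload_register_callback (void (*fn) (void))
  __attribute__ ((weak));

static int init_count;

/* Runtime initialization with a run-once guard, so it is harmless if
   both the constructor and a later callback end up calling it.  */
static void
do_init (void)
{
  static bool done;
  if (done)
    return;
  done = true;
  init_count++;  /* stands in for the real init actions */
}

/* Constructor: if the registration hook exists, defer initialization
   until it fires; otherwise initialize right away.  */
static void __attribute__ ((constructor))
init (void)
{
  if (GOMP_post_offload_register_callback != NULL)
    GOMP_post_offload_register_callback (do_init);
  else
    do_init ();
}
```

In a program linked without any provider of the hook, the constructor falls back to calling do_init directly, and the guard keeps any later call from re-running it.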
A final issue is the case of an OpenMP program that does NOT have offloading. libgomp and libgfortran cannot passively determine whether offloading exists; only the main program itself can, by checking whether the hidden __OFFLOAD_TABLE__ exists. When we register init/fini callbacks with libgomp, OpenMP programs without offloading will never have those callbacks run (because no offload image is loaded).

Therefore the solution here is a constructor added into the crtoffloadend.o fragment that does a "null" call of GOMP_offload_register_ver, solely to trigger the post-offload_register callbacks when __OFFLOAD_TABLE__ is NULL. (Because of this, the crtoffloadend.o Makefile rule is adjusted to compile with PIC.)

I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran interacts, but it's finally working. Again tested without regressions. Preparing to commit to devel/omp/gcc-12, and seeking approval for mainline when the requires patches are in.

Thanks,
Chung-Lin

2022-08-15 Chung-Lin Tang

libgcc/
	* Makefile.in (crtoffloadend$(objext)): Add $(PICFLAG) to compile rule.
	* offloadstuff.c (GOMP_offload_register_ver): Add declaration of weak
	symbol.
	(__OFFLOAD_TABLE__): Likewise.
	(init_non_offload): New function.

libgfortran/
	* gfortran.map (GFORTRAN_13): New namespace.
	(_gfortran_mem_allocators_init): New name inside GFORTRAN_13.
	* libgfortran.h (mem_allocators_init): New exported declaration.
	* runtime/main.c (do_init): Rename from init, add run-once guard code.
	(cleanup): Add run-once guard code.
	(GOMP_post_offload_register_callback): Declare weak symbol.
	(GOMP_pre_gomp_target_fini_callback): Likewise.
	(init): New constructor to register offload callbacks, or call do_init
	when not OpenMP.
	* runtime/memory.c (gfortran_malloc): New pointer variable.
	(gfortran_calloc): Likewise.
	(gfortran_realloc): Likewise.
	(gfortran_free): Likewise.
	(mem_allocators_init): New function.
	(xmalloc): Use gfortran_malloc.
	(xmallocarray): Use gfortran_malloc.
	(xcalloc): Use gfortran_calloc.
	(xrealloc): Use gfortran_realloc.
	(xfree): Use gfortran_free.

libgomp/
	* libgomp.map (GOMP_5.1.2): New version namespace.
	(GOMP_post_offload_register_callback): New name inside GOMP_5.1.2.
	(GOMP_pre_gomp_target_fini_callback): Likewise.
	(GOMP_DEFINE_CALLBACK_SET): Macro to define callback set.
	(post_offload_register): Define callback set for after offload image
	register.
	(pre_gomp_target_fini): Define callback set for before gomp_target_fini
	is called.
	(libgfortran_malloc_usm): New function.
	(libgfortran_calloc_usm): Likewise.
	(libgfortran_realloc_usm): Likewise.
	(libgfortran_free_usm): Likewise.
	(_gfortran_mem_alloc
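The ChangeLog mentions a GOMP_DEFINE_CALLBACK_SET macro defining the post_offload_register and pre_gomp_target_fini callback sets, but its body is not shown. The following is a hypothetical fixed-capacity version, only to illustrate the ordering idea: libgfortran registers its init and fini actions, and libgomp runs them right after offload registration and right before target finalization. The macro name DEFINE_CALLBACK_SET, the capacity, and the two *_sketch hook functions are invented for this example.

```c
#include <stddef.h>

#define MAX_CALLBACKS 8

/* Hypothetical macro: defines a callback array plus add/run helpers.  */
#define DEFINE_CALLBACK_SET(name)                              \
  static void (*name##_callbacks[MAX_CALLBACKS]) (void);       \
  static size_t name##_count;                                  \
                                                               \
  static int                                                   \
  name##_add (void (*fn) (void))                               \
  {                                                            \
    if (name##_count == MAX_CALLBACKS)                         \
      return -1;                                               \
    name##_callbacks[name##_count++] = fn;                     \
    return 0;                                                  \
  }                                                            \
                                                               \
  static void                                                  \
  name##_run (void)                                            \
  {                                                            \
    for (size_t i = 0; i < name##_count; i++)                  \
      name##_callbacks[i] ();                                  \
  }

DEFINE_CALLBACK_SET (post_offload_register)
DEFINE_CALLBACK_SET (pre_gomp_target_fini)

/* Records the order in which the registered actions fire.  */
static int trace[4];
static int ntrace;

static void fortran_init (void) { trace[ntrace++] = 1; }
static void fortran_fini (void) { trace[ntrace++] = 2; }

/* Stand-in for the end of offload image registration.  */
static void
offload_register_sketch (void)
{
  /* ... omp_requires_mask processing would happen here ...  */
  post_offload_register_run ();
}

/* Stand-in for the atexit-registered target finalization.  */
static void
target_fini_sketch (void)
{
  pre_gomp_target_fini_run ();
  /* ... device/plugin teardown would follow here ...  */
}
```

The point of the design is that libgfortran never has to know when libgomp is ready; it only hands over function pointers, and libgomp invokes them at the two points where ordering matters.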
[PATCH, OpenMP, Fortran] requires unified_shared_memory 1/2: adjust libgfortran memory allocators
Hi, this patch fixes the case where 'requires unified_shared_memory' doesn't work due to a memory allocator mismatch. Currently this is only for OG12 (devel/omp/gcc-12), but it will apply to mainline as well once those requires patches get in.

Basically, 'requires unified_shared_memory' enables the usm_transform pass, which transforms some of the expanded Fortran intrinsic code that uses __builtin_free() into 'omp_free (..., ompx_unified_shared_mem_alloc)'. The intention is to make all dynamic memory allocation use the OpenMP unified_shared_memory allocator, but there is a big gap in this, namely libgfortran. What happens in some tests is that libgfortran allocates stuff using normal malloc(), the usm_transform-generated code frees it using omp_free(), and chaos ensues.

So the proper fix, we believe, is to make it possible to move the entire libgfortran onto unified_shared_memory.

This first patch is a mostly mechanical patch that changes all references to malloc/free/calloc/realloc in libgfortran into xmalloc/xfree/xcalloc/xrealloc in libgfortran/runtime/memory.c, and strdup uses into a new internal xstrdup. All of libgfortran is adjusted this way, except libgfortran/caf, which is an independent library outside of libgfortran.so.

The second patch of this series will present a way to switch the allocators referenced in libgfortran/runtime/memory.c from the normal glibc malloc/free/etc. to omp_alloc/omp_free/etc. when 'requires unified_shared_memory' is detected.

Tested on devel/omp/gcc-12. The plan is to commit there soon, but I am also seeking approval for mainline once the requires stuff goes in.

Thanks,
Chung-Lin

2022-08-15 Chung-Lin Tang

libgfortran/ChangeLog:

	* m4/matmul_internal.m4: Adjust malloc/free to xmalloc/xfree.
	* generated/matmul_c10.c: Regenerate.
	* generated/matmul_c16.c: Likewise.
	* generated/matmul_c17.c: Likewise.
	* generated/matmul_c4.c: Likewise.
	* generated/matmul_c8.c: Likewise.
	* generated/matmul_i1.c: Likewise.
	* generated/matmul_i16.c: Likewise.
	* generated/matmul_i2.c: Likewise.
	* generated/matmul_i4.c: Likewise.
	* generated/matmul_i8.c: Likewise.
	* generated/matmul_r10.c: Likewise.
	* generated/matmul_r16.c: Likewise.
	* generated/matmul_r17.c: Likewise.
	* generated/matmul_r4.c: Likewise.
	* generated/matmul_r8.c: Likewise.
	* generated/matmulavx128_c10.c: Likewise.
	* generated/matmulavx128_c16.c: Likewise.
	* generated/matmulavx128_c17.c: Likewise.
	* generated/matmulavx128_c4.c: Likewise.
	* generated/matmulavx128_c8.c: Likewise.
	* generated/matmulavx128_i1.c: Likewise.
	* generated/matmulavx128_i16.c: Likewise.
	* generated/matmulavx128_i2.c: Likewise.
	* generated/matmulavx128_i4.c: Likewise.
	* generated/matmulavx128_i8.c: Likewise.
	* generated/matmulavx128_r10.c: Likewise.
	* generated/matmulavx128_r16.c: Likewise.
	* generated/matmulavx128_r17.c: Likewise.
	* generated/matmulavx128_r4.c: Likewise.
	* generated/matmulavx128_r8.c: Likewise.
	* intrinsics/access.c (access_func): Adjust free to xfree.
	* intrinsics/chdir.c (chdir_i4_sub): Likewise.
	(chdir_i8_sub): Likewise.
	* intrinsics/chmod.c (chmod_func): Likewise.
	* intrinsics/date_and_time.c (secnds): Likewise.
	* intrinsics/env.c (PREFIX(getenv)): Likewise.
	(get_environment_variable_i4): Likewise.
	* intrinsics/execute_command_line.c (execute_command_line): Likewise.
	* intrinsics/getcwd.c (getcwd_i4_sub): Likewise.
	* intrinsics/getlog.c (PREFIX(getlog)): Likewise.
	* intrinsics/link.c (link_internal): Likewise.
	* intrinsics/move_alloc.c (move_alloc): Likewise.
	* intrinsics/perror.c (perror_sub): Likewise.
	* intrinsics/random.c (constructor_random): Likewise.
	* intrinsics/rename.c (rename_internal): Likewise.
	* intrinsics/stat.c (stat_i4_sub_0): Likewise.
	(stat_i8_sub_0): Likewise.
	* intrinsics/symlnk.c (symlnk_internal): Likewise.
	* intrinsics/system.c (system_sub): Likewise.
	* intrinsics/unlink.c (unlink_i4_sub): Likewise.
	* io/async.c (update_pdt): Likewise.
	(async_io): Likewise.
	(free_async_unit): Likewise.
	(init_async_unit): Adjust calloc to xcalloc.
	(enqueue_done_id): Likewise.
	(enqueue_done): Likewise.
	(enqueue_close): Likewise.
	* io/async.h (MUTEX_DEBUG_ADD): Adjust malloc/free to xmalloc/xfree.
	* io/close.c (st_close): Adjust strdup/free to xstrdup/xfree.
	* io/fbuf.c (fbuf_destroy): Adjust free to xfree.
	* io/format.c (free_format_hash_table): Likewise.
	(save_parsed_format): Likewise.
	(free_format): Likewise.
	(free_format_data): Likewise.
	* io/intrinsics.c (ttynam
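The allocator indirection that the two patches set up can be illustrated with a small self-contained sketch: every runtime allocation goes through xmalloc/xfree (and xstrdup builds on xmalloc), which in turn call through replaceable function pointers. The pointer names follow the second patch's ChangeLog (gfortran_malloc, gfortran_free), but the install_allocators setter, the counting allocator used to demonstrate the switch, and the error handling are assumptions made for this example, not libgfortran's actual code.

```c
#include <stdlib.h>
#include <string.h>

/* Replaceable allocator hooks, defaulting to the C library.  */
static void *(*gfortran_malloc) (size_t) = malloc;
static void (*gfortran_free) (void *) = free;

void *
xmalloc (size_t n)
{
  void *p = gfortran_malloc (n ? n : 1);  /* avoid malloc (0) */
  if (p == NULL)
    abort ();  /* the real runtime reports an out-of-memory error */
  return p;
}

void
xfree (void *p)
{
  gfortran_free (p);
}

/* strdup replacement that allocates through xmalloc, so the copy
   must be released with xfree.  */
char *
xstrdup (const char *s)
{
  size_t len = strlen (s) + 1;
  char *p = xmalloc (len);
  memcpy (p, s, len);
  return p;
}

/* Hypothetical counterpart of mem_allocators_init: install a new
   allocator pair.  Must run before anything is allocated, or blocks
   would later be freed by the wrong allocator.  */
void
install_allocators (void *(*m) (size_t), void (*f) (void *))
{
  gfortran_malloc = m;
  gfortran_free = f;
}

/* Demo allocator pair that counts live blocks.  */
static int live_blocks;
static void *counting_malloc (size_t n) { live_blocks++; return malloc (n); }
static void counting_free (void *p) { live_blocks--; free (p); }
```

The comment on install_allocators is the same ordering constraint discussed for the second patch: switching allocators after libgfortran's constructor has already allocated would reintroduce the malloc/omp_free mismatch.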
Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
On 2022/6/28 10:06 PM, Jakub Jelinek wrote:
> On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:
>> with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:
>> (1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
>> (2) chunk_size == 0: infinite loop
>> The (2) behavior is obviously not desired. This patch fixes this by changing
> Why? It is a user error, undefined behavior, we shouldn't slow down valid
> code for users who don't bother reading the standard.

This is loop init code, not per-iteration. The overhead really isn't that much.

The question should be, if GCC having infinite loop behavior is reasonable, even if it is undefined in the spec.

> E.g. OpenMP 5.1 [132:14] says clearly:
> "chunk_size must be a loop invariant integer expression with a positive value."
> and omp_set_schedule for chunk_size < 1 should use a default value (which it does).
>
> For OMP_SCHEDULE the standard says it is implementation-defined what happens
> if the format isn't the specified one, so I guess the env.c change
> could be acceptable (though without it it is fine too), but the
> loop.c change is wrong. Note, if the loop.c change would be ok,
> you'd need to also change loop_ull.c too.

I've updated the patch to add the same changes for libgomp/loop_ull.c and updated the testcase too. Tested on mainline trunk without regressions.

Thanks,
Chung-Lin

libgomp/ChangeLog:

	* env.c (parse_schedule): Make negative values invalid for chunk_size.
	* loop.c (gomp_loop_init): For non-STATIC schedule and chunk_size <= 0,
	set initialized chunk_size to 1.
	* loop_ull.c (gomp_loop_ull_init): Likewise.
	* testsuite/libgomp.c/loop-28.c: New test.

diff --git a/libgomp/env.c b/libgomp/env.c
index 1c4ee894515..dff07617e15 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -182,6 +182,8 @@ parse_schedule (void)
     goto invalid;
 
   errno = 0;
+  if (*env == '-')
+    goto invalid;
   value = strtoul (env, &end, 10);
   if (errno || end == env)
     goto invalid;
diff --git a/libgomp/loop.c b/libgomp/loop.c
index be85162bb1e..018b4e9a8bd 100644
--- a/libgomp/loop.c
+++ b/libgomp/loop.c
@@ -41,7 +41,7 @@ gomp_loop_init (struct gomp_work_share *ws, long start, long end, long incr,
 		enum gomp_schedule_type sched, long chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size = chunk_size;
+  ws->chunk_size = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 1;
   /* Canonicalize loops that have zero iterations to ->next == ->end.  */
   ws->end = ((incr > 0 && start > end) || (incr < 0 && start < end))
 	    ? start : end;
diff --git a/libgomp/loop_ull.c b/libgomp/loop_ull.c
index 602737296d4..74ddb1bd623 100644
--- a/libgomp/loop_ull.c
+++ b/libgomp/loop_ull.c
@@ -43,7 +43,7 @@ gomp_loop_ull_init (struct gomp_work_share *ws, bool up, gomp_ull start,
 		    gomp_ull chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size_ull = chunk_size;
+  ws->chunk_size_ull = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 1;
   /* Canonicalize loops that have zero iterations to ->next == ->end.  */
   ws->end_ull = ((up && start > end) || (!up && start < end))
 		? start : end;
diff --git a/libgomp/testsuite/libgomp.c/loop-28.c b/libgomp/testsuite/libgomp.c/loop-28.c
new file mode 100644
index 000..664842e27aa
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/loop-28.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-timeout 10 } */
+
+void __attribute__((noinline))
+foo (int a[], int n, int chunk_size)
+{
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (int i = 0; i < n; i++)
+    a[i] = i;
+
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (unsigned long long i = 0; i < n; i++)
+    a[i] = i;
+}
+
+int main (void)
+{
+  int a[100];
+  foo (a, 100, 0);
+  return 0;
+}
[PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
Hi Jakub,
with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:

(1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
(2) chunk_size == 0: infinite loop

The (2) behavior is obviously not desired. This patch fixes this by changing the chunk_size initialization in gomp_loop_init to "max(1,chunk_size)".

The OMP_SCHEDULE parsing in libgomp/env.c has also been adjusted to reject negative values.

Tested without regressions, and a new testcase for the infinite loop behavior has been added. Okay for trunk?

Thanks,
Chung-Lin

libgomp/ChangeLog:

	* env.c (parse_schedule): Make negative values invalid for chunk_size.
	* loop.c (gomp_loop_init): For non-STATIC schedule and chunk_size <= 0,
	set initialized chunk_size to 1.
	* testsuite/libgomp.c/loop-28.c: New test.

diff --git a/libgomp/env.c b/libgomp/env.c
index 1c4ee894515..dff07617e15 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -182,6 +182,8 @@ parse_schedule (void)
     goto invalid;
 
   errno = 0;
+  if (*env == '-')
+    goto invalid;
   value = strtoul (env, &end, 10);
   if (errno || end == env)
     goto invalid;
diff --git a/libgomp/loop.c b/libgomp/loop.c
index be85162bb1e..018b4e9a8bd 100644
--- a/libgomp/loop.c
+++ b/libgomp/loop.c
@@ -41,7 +41,7 @@ gomp_loop_init (struct gomp_work_share *ws, long start, long end, long incr,
 		enum gomp_schedule_type sched, long chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size = chunk_size;
+  ws->chunk_size = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 1;
   /* Canonicalize loops that have zero iterations to ->next == ->end.  */
   ws->end = ((incr > 0 && start > end) || (incr < 0 && start < end))
 	    ? start : end;
diff --git a/libgomp/testsuite/libgomp.c/loop-28.c b/libgomp/testsuite/libgomp.c/loop-28.c
new file mode 100644
index 000..e3f852046f4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/loop-28.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-timeout 10 } */
+
+void __attribute__((noinline))
+foo (int a[], int n, int chunk_size)
+{
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (int i = 0; i < n; i++)
+    a[i] = i;
+}
+
+int main (void)
+{
+  int a[100];
+  foo (a, 100, 0);
+  return 0;
+}
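Why chunk_size == 0 hangs, and why clamping it to 1 fixes it, can be seen with a simplified, single-threaded model of a dynamic-schedule work share. This is not libgomp's gomp_iter_dynamic_next (which also handles negative increments and atomic updates); it only mirrors the "hand out [next, next + chunk) until next reaches end" idea, and applies the same clamp that the patch adds for non-static schedules. All names here are hypothetical.

```c
#include <stdbool.h>

/* Simplified, single-threaded model of a dynamic-schedule work share.
   Not libgomp's code: it only covers positive increments of 1.  */
struct work_share
{
  long next;        /* first iteration not yet handed out */
  long end;         /* one past the last iteration */
  long chunk_size;  /* iterations per chunk */
};

/* Same clamp as the patch applies for non-static schedules.  */
static void
work_share_init (struct work_share *ws, long start, long end,
                 long chunk_size)
{
  ws->next = start;
  ws->end = end;
  ws->chunk_size = chunk_size > 1 ? chunk_size : 1;
}

/* Hand out the next chunk [*pstart, *pend); false once exhausted.
   Without the clamp, chunk_size == 0 would leave ws->next unchanged,
   so this would keep returning true with an empty chunk forever.  */
static bool
work_share_next (struct work_share *ws, long *pstart, long *pend)
{
  if (ws->next == ws->end)
    return false;
  long left = ws->end - ws->next;
  long chunk = ws->chunk_size < left ? ws->chunk_size : left;
  *pstart = ws->next;
  *pend = ws->next + chunk;
  ws->next = *pend;
  return true;
}

/* Number of chunks a loop of n iterations is split into.  */
static long
count_chunks (long n, long chunk_size)
{
  struct work_share ws;
  long s, e, chunks = 0;
  work_share_init (&ws, 0, n, chunk_size);
  while (work_share_next (&ws, &s, &e))
    chunks++;
  return chunks;
}
```

With the clamp, count_chunks (100, 0) terminates after 100 one-iteration chunks, which is what the loop-28.c test exercises via schedule (dynamic, 0).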
Re: [PATCH, OpenMP, v4] Implement uses_allocators clause for target regions
On 2022/6/9 8:22 PM, Jakub Jelinek wrote:
>> +     OpenMP 5.2:
>> +
>> +   uses_allocators ( modifier : allocator-list )
> Please drop the -list above.
>> +   uses_allocators ( modifier , modifier : allocator-list )
> and here too.

Thanks for catching.

>> +  struct item_tok
>> +  {
>> +    location_t loc;
>> +    tree id;
>> +    item_tok (void) : loc (UNKNOWN_LOCATION), id (NULL_TREE) {}
>> +  };
>> +  struct item { item_tok name, arg; };
>> +  auto_vec *modifiers = NULL, *allocators = NULL;
>> +  auto_vec *cur_list = new auto_vec (4);
> I was hoping you'd drop all this.
> See https://gcc.gnu.org/r13-1002 for implementation (both C and C++ FE)
> of something very similar, the only difference there is that in the
> case of linear clause, it is looking for
>   val
>   ref
>   uval
>   step ( whatever )
> followed by , or )
> (and ref and uval not in C FE),
> while you are looking for
>   memspace ( whatever )
>   traits ( whatever )
> followed by : or by , (in case of , repeat).
> But in both cases you can actually use the same parser APIs
> for raw token pre-parsing to just compute if it is the modifier
> syntax or not, set bool has_modifiers based on that (when you
> come over probably valid syntax followed by CPP_COLON).

The linear clause doesn't have the legacy 'allocator1(t1), allocator2(t2), ...' requirement, and c_parser_omp_variable_list doesn't seem to support this pattern.

Also, the way c_parser_omp_clause_linear is implemented doesn't support the requirement you mentioned earlier of allowing "memspace" and "traits" to be used as the allocator name when they are not actually modifiers.

I have merged the v4 patch, with the syntax comments updated as above, to devel/omp/gcc-11.

Thanks,
Chung-Lin
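The raw-token pre-parse discussed above (scan ahead without consuming tokens to decide whether the clause uses the 5.2 modifier syntax) can be sketched over a toy token array. The token kinds and the has_modifiers function are hypothetical; GCC itself would use its own peek APIs such as c_parser_peek_nth_token_raw. An allocator literally named traits or memspace in the old syntax is still classified correctly, because only a run of name(...) items followed by ':' counts as modifiers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy token kinds; not GCC's cpp_ttype.  Arrays end with T_EOF.  */
enum tok { T_NAME, T_OPEN, T_CLOSE, T_COMMA, T_COLON, T_OTHER, T_EOF };

/* Called with the tokens following "uses_allocators (".  Only peeks:
   accepts a run of NAME ( ... ) items separated by commas and reports
   the 5.2 modifier syntax only if that run is followed by ':'.  */
static bool
has_modifiers (const enum tok *t)
{
  size_t i = 0;
  for (;;)
    {
      if (t[i] != T_NAME || t[i + 1] != T_OPEN)
        return false;
      i += 2;
      /* Skip the balanced parenthesized argument.  */
      for (int depth = 1; depth > 0; i++)
        {
          if (t[i] == T_EOF)
            return false;
          if (t[i] == T_OPEN)
            depth++;
          else if (t[i] == T_CLOSE)
            depth--;
        }
      if (t[i] == T_COLON)
        return true;
      if (t[i] != T_COMMA)
        return false;
      i++;
    }
}
```

For example, "traits(t) : a )" is classified as modifier syntax, while "traits(t) )" (an allocator variable named traits with a traits array) and "memspace(m), omp_default_mem_alloc )" fall back to the old syntax.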
[PATCH, OpenMP, v4] Implement uses_allocators clause for target regions
Hi Jakub,
this is v4 of the uses_allocators patch.

On 2022/5/31 6:02 PM, Jakub Jelinek wrote:
> The response I got on omp-lang is that it is intentional that in the new
> syntax only a single allocator is allowed.
> So I'd suggest to implement:
> 1) if has_modifiers (i.e. certainly new syntax), only allow a single
>    enumerator / identifier for a variable and no ()s after it
> 2) if !has_modifiers and there is exactly one allocator without ()s,
>    treat it like new syntax
> 3) otherwise, it is the old (5.1) syntax, which allows a list and that
>    list can contain ()s for traits, but in the light of the 5.2 wording,
>    I'd even for that case avoid diagnosing missing traits for
>    non-predefined allocators
> 4) omp_null_allocator should be diagnosed as invalid,
>    private (omp_null_allocator) is rejected...

I've adjusted the checking to enforce these rules, and updated the testcases. Re-tested without regressions.

> 5) for C++, we should handle FIELD_DECLs, but it shouldn't be hard,
>    just look how it is handled for private too

As discussed in the other mail, private() for FIELD_DECLs on target constructs does not seem to work properly; I filed PR105861 for this. Currently uses_allocators (which also uses private) still emits a sorry() for FIELD_DECLs in this v4 patch. I will file another issue to track this after the patch is committed.
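The four rules quoted above can be condensed into a small decision function. This is only a sketch of the logic, with hypothetical names and inputs (a real parser would compute them while scanning the clause); it is not the patch's implementation.

```c
#include <stdbool.h>

enum ua_form { UA_NEW_SYNTAX, UA_OLD_SYNTAX, UA_INVALID };

/* Classify a uses_allocators clause from facts gathered while parsing:
   has_mods      - a "modifier :" prefix was seen
   n_allocators  - number of allocators listed
   any_parens    - some allocator is followed by "( traits )"
   uses_null     - omp_null_allocator appears in the list  */
static enum ua_form
classify_uses_allocators (bool has_mods, int n_allocators, bool any_parens,
                          bool uses_null)
{
  if (uses_null)
    return UA_INVALID;                      /* rule 4 */
  if (has_mods)                             /* rule 1 */
    return (n_allocators == 1 && !any_parens) ? UA_NEW_SYNTAX : UA_INVALID;
  if (n_allocators == 1 && !any_parens)     /* rule 2 */
    return UA_NEW_SYNTAX;
  return UA_OLD_SYNTAX;                     /* rule 3 */
}
```

Rule 3 deliberately does not reject a bare non-predefined allocator in the old-syntax list, matching the suggestion to avoid diagnosing missing traits.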
(ChangeLog should be the same as before, so omitted here)

Thanks,
Chung-Lin

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 3a7cecdf087..be3e6ff697e 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -283,6 +283,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT32_DFLOAT32, BT_DFLOAT32, BT_DFLOAT32)
 DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT64_DFLOAT64, BT_DFLOAT64, BT_DFLOAT64)
 DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRMODE, BT_VOID, BT_PTRMODE)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_CONST_PTR, BT_VOID, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
@@ -641,6 +642,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_PTR_SIZE_SIZE_PTRMODE,
 		     BT_PTR, BT_SIZE, BT_SIZE, BT_PTRMODE)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_UINT8_PTRMODE, BT_VOID, BT_PTR, BT_UINT8,
 		     BT_PTRMODE)
+DEF_FUNCTION_TYPE_3 (BT_FN_PTRMODE_PTRMODE_INT_PTR, BT_PTRMODE, BT_PTRMODE,
+		     BT_INT, BT_PTR)
 
 DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
 		     BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 66d17a2673d..50db6936728 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -1873,6 +1873,7 @@ c_omp_split_clauses (location_t loc, enum tree_code code,
 	case OMP_CLAUSE_HAS_DEVICE_ADDR:
 	case OMP_CLAUSE_DEFAULTMAP:
 	case OMP_CLAUSE_DEPEND:
+	case OMP_CLAUSE_USES_ALLOCATORS:
 	  s = C_OMP_CLAUSE_SPLIT_TARGET;
 	  break;
 	case OMP_CLAUSE_NUM_TEAMS:
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 54864c2ec41..7f8944f81d6 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -154,6 +154,7 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_UNTIED,
   PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR,
   PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR,
+  PRAGMA_OMP_CLAUSE_USES_ALLOCATORS,
 
   /* Clauses for OpenACC.  */
   PRAGMA_OACC_CLAUSE_ASYNC,
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 492d995a281..0fe5b7ac2e4 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -12922,6 +12922,8 @@ c_parser_omp_clause_name (c_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR;
 	  else if (!strcmp ("use_device_ptr", p))
 	    result = PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR;
+	  else if (!strcmp ("uses_allocators", p))
+	    result = PRAGMA_OMP_CLAUSE_USES_ALLOCATORS;
 	  break;
 	case 'v':
 	  if (!strcmp ("vector", p))
@@ -15651,6 +15653,213 @@ c_parser_omp_clause_allocate (c_parser *parser, tree list)
   return nl;
 }
 
+/* OpenMP 5.0:
+   uses_allocators ( allocator-list )
+
+   allocator-list:
+   allocator
+   allocator , allocator-list
+   allocator ( traits-array )
+   allocator ( traits-array ) , allocator-list
+
+   OpenMP 5.2:
+
+   uses_allocators ( modifier : allocator-list )
+   uses_allocators ( modifier , modifier : allocator-list )
+
+   modifier:
+   traits ( traits-array )
+   memspace ( mem-space-handle )  */
+
+static tree
+c_parser_omp_clause_uses_allocators (c_parser *parser, tree list)
+{
+  location_t clause_loc = c_parser_peek_token (parser)->location;
+  tree t = NULL_TREE, nl = list;
+  matching_parens parens;
+  if (!parens.require_open (parser))
+    return list;
+
+  tree memspace_expr = NULL_TREE;
+  tree traits_var = NULL_TREE;
+
+  struct item_tok
+  {
+    location_t loc;
+    tree id;
+    item_tok (void) : loc
Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions
On 2022/6/6 9:22 PM, Jakub Jelinek wrote:
> On Mon, Jun 06, 2022 at 09:19:18PM +0800, Chung-Lin Tang wrote:
>> On 2022/5/31 6:02 PM, Jakub Jelinek wrote:
>>> 5) for C++, we should handle FIELD_DECLs, but it shouldn't be hard,
>>>    just look how it is handled for private too
>>>
>>> Jakub
>>
>> About private() for non-static members, is it really working right now?
>
> Perhaps we have a bug that we should file in bugzilla and should fix.
> Can you try omp parallel or omp target in the test instead?

I see it works for omp parallel/task; gimplify results:

void C::foo (struct C * const this)
{
  omp_allocator_handle_t a [value-expr: ((struct C *) this)->a];

  #pragma omp parallel private(a)
    {
      a = 0;
    }
}

I'll file a bugzilla for the target construct.

That said, can we delay FIELD_DECL support for uses_allocators (which is target-construct only)? It appears to be non-trivial at the moment.

Thanks,
Chung-Lin

>> A simple test:
>>
>> struct C
>> {
>>   omp_allocator_handle_t a;
>>   void foo (void)
>>   {
>>     #pragma omp target private (a)
>>     a = (omp_allocator_handle_t) 0;
>>   }
>> };
>>
>> int main (void)
>> {
>>   C c;
>>   c.foo ();
>>   return 0;
>> }
>
> Jakub
Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions
On 2022/5/31 6:02 PM, Jakub Jelinek wrote:
> 5) for C++, we should handle FIELD_DECLs, but it shouldn't be hard,
>    just look how it is handled for private too
>
> Jakub

About private() for non-static members, is it really working right now? A simple test:

struct C
{
  omp_allocator_handle_t a;
  void foo (void)
  {
    #pragma omp target private (a)
    a = (omp_allocator_handle_t) 0;
  }
};

int main (void)
{
  C c;
  c.foo ();
  return 0;
}

After C++ front-end processing we get:

{
  omp_allocator_handle_t D.2823 [value-expr: ((struct C *) this)->a];

  #pragma omp target private(D.2823)
    {
      {
        <;
      }
    }
}

The OMP field privatization seems to be doing something here. However, gimplify turns this into:

void C::foo (struct C * const this)
{
  omp_allocator_handle_t a [value-expr: ((struct C *) this)->a];

  #pragma omp target num_teams(1) thread_limit(0) private(a) \
      map(alloc:MEM[(char *)this] [len: 0]) map(firstprivate:this [pointer assign, bias: 0])
    {
      this->a = 0;
    }
}

This doesn't look quite right for a private clause. I wouldn't expect a zero-length mapping of this[:0], nor the gimple reverting to "this->a" for a private copy.

Chung-Lin
Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions
Hi Jakub,
this is v3 of the uses_allocators patch.

On 2022/5/20 1:46 AM, Jakub Jelinek wrote:
> On Tue, May 10, 2022 at 07:29:23PM +0800, Chung-Lin Tang wrote:
>> @@ -15624,6 +15626,233 @@ c_parser_omp_clause_allocate (c_parser *parser, tree list)
>>    return nl;
>>  }
>>
>> +/* OpenMP 5.2:
>> +   uses_allocators ( allocator-list )
>
> As uses_allocators is a 5.0 feature already, the above should say
> /* OpenMP 5.0:
>
>> +
>> +   allocator-list:
>> +   allocator
>> +   allocator , allocator-list
>> +   allocator ( traits-array )
>> +   allocator ( traits-array ) , allocator-list
>> +
>
> And here it should add
>    OpenMP 5.2:

Done.

>> +  if (c_parser_next_token_is (parser, CPP_NAME))
>> +    {
>> +      c_token *tok = c_parser_peek_token (parser);
>> +      const char *p = IDENTIFIER_POINTER (tok->value);
>> +
>> +      if (strcmp ("traits", p) == 0 || strcmp ("memspace", p) == 0)
>> +        {
>> +          has_modifiers = true;
>> +          c_parser_consume_token (parser);
>> +          matching_parens parens2;;
>
> Double ;;, should be just ;
> But more importantly, it is more complex.
> When you see
>   uses_allocators(traits
> or
>   uses_allocators(memspace
> it is not given that it has modifiers. While the 5.0/5.1 syntax had
> a restriction that when allocator is not a predefined allocator (and
> traits or memspace aren't predefined allocators) it must use ()s with
> traits, so
>   uses_allocators(traits)
>   uses_allocators(memspace)
>   uses_allocators(traits,memspace)
> are all invalid,
>   omp_allocator_handle_t traits;
>   uses_allocators(traits(mytraits))
> or
>   omp_allocator_handle_t memspace;
>   uses_allocators(memspace(mytraits),omp_default_mem_alloc)
> are valid in the old syntax.
> So, I'm afraid to find out if the traits or memspace identifier
> seen after uses_allocators ( are modifiers or not we need to peek
> (for C with c_parser_peek_nth_token_raw) through all the modifiers
> whether we see a : and only in that case say they are modifiers
> rather than the old style syntax.

The parser parts have been rewritten to allow this kind of use now.
The new code essentially parses lists of "id(id), id(id), ...", possibly delimited by a ':' marking the modifier/allocator lists.

> I don't really like the modifiers handling not done in a loop.
> As I said above, there needs to be some check whether there are modifiers
> or not, but once we figure out there are modifiers, it should be done in
> a loop with say some mask var on which traits have been already handled
> to diagnose duplicates, we don't want to do the parsing code twice.

Now everything is done in loops. The new code should be considerably simpler now.

> This feels like you only accept a single allocator in the new syntax,
> but that isn't my reading of the spec, I'd understand it as:
>   uses_allocators (memspace(omp_high_bw_mem_space), traits(foo_traits) : bar, baz, qux)
> being valid too.

This patch now allows multiple allocators to be specified in the new syntax, although I have to note that the 5.2 specification of uses_allocators (page 181) specifically says "allocator: expression of allocator_handle_type" in the "Arguments" description, not a "list" like the allocate clause.

>> +        case OMP_CLAUSE_USES_ALLOCATORS:
>> +          t = OMP_CLAUSE_USES_ALLOCATORS_ALLOCATOR (c);
>> +          if (bitmap_bit_p (_head, DECL_UID (t))
>> +              || bitmap_bit_p (_head, DECL_UID (t))
>> +              || bitmap_bit_p (_head, DECL_UID (t))
>> +              || bitmap_bit_p (_head, DECL_UID (t)))
>
> You can't just use DECL_UID before you actually verify it is a variable.
> So IMHO this particular if should be moved down somewhat.

Guarded now.

>> +            {
>> +              error_at (OMP_CLAUSE_LOCATION (c),
>> +                        "%qE appears more than once in data clauses", t);
>> +              remove = true;
>> +            }
>> +          else
>> +            bitmap_set_bit (_head, DECL_UID (t));
>> +          if (TREE_CODE (TREE_TYPE (t)) != ENUMERAL_TYPE
>> +              || strcmp (IDENTIFIER_POINTER (TYPE_IDENTIFIER (TREE_TYPE (t))),
>> +                         "omp_allocator_handle_t") != 0)
>> +            {
>> +              error_at (OMP_CLAUSE_LOCATION (c),
>> +                        "allocator must be of % type");
>> +              remove = true;
>> +            }
>
> I'd add break; after remove = true;

Added some such breaks.
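The single-loop approach with a mask of already-seen modifiers, as requested in the review, looks roughly like this in isolation. Only the modifier names are checked (argument parsing is omitted), and the function and constant names are hypothetical rather than taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MOD_TRAITS = 1u << 0, MOD_MEMSPACE = 1u << 1 };

/* Walk the modifier names once, recording each in a bit mask so a
   repeated modifier (e.g. "traits(a), traits(b) :") is caught in the
   same loop.  Returns false for a duplicate or unknown modifier.  */
static bool
check_modifiers (const char *const names[], size_t n, unsigned *seen_out)
{
  unsigned seen = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned bit;
      if (strcmp (names[i], "traits") == 0)
        bit = MOD_TRAITS;
      else if (strcmp (names[i], "memspace") == 0)
        bit = MOD_MEMSPACE;
      else
        return false;  /* not a modifier name */
      if (seen & bit)
        return false;  /* duplicate: the parser would diagnose here */
      seen |= bit;
    }
  *seen_out = seen;
  return true;
}
```

Because duplicates are caught by the mask, a separate "at most two modifiers" counter becomes redundant, which is the point raised in the Fortran review as well.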
>> +          if (TREE_CODE (t) == CONST_DECL)
>> +            {
>> +              if (OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE (c)
>> +                  || OMP_CLAUSE_USES_ALLOCATORS_TRAITS (c))
>> +                error_at (OMP_CLAUSE_LOCATION (c),
>> +                          "modifiers cannot be used with pre-defined "
>> +                          "allocators");
>> +
>> +              /* Currently for pre-defined allocators in libgomp, we do not
>> +                 require additional init/fini inside target regions, so discard
>> +                 such clauses.  */
>> +              remove = true;
>> +            }
>
> It should be only removed if we emit the error (again with break; too).
> IMHO (see the other mail) we should
[PATCH, OpenMP, v2] Implement uses_allocators clause for target regions
On 2022/5/7 12:40 AM, Tobias Burnus wrote:
> Can you please also handle the new clause in Fortran's dump-parse-tree.cc?
>
> I did see some split handling in C, but not in Fortran; do you also need to
> update gfc_split_omp_clauses in Fortran's trans-openmp.cc?

Done.

> Actually, glancing at the testcases, no combined construct (like
> "omp target parallel") is used, I think that would be useful because of ↑.

Okay, added some to testcases.

>> +/* OpenMP 5.2:
>> +   uses_allocators ( allocator-list )
>
> That's not completely true: uses_allocators is OpenMP 5.1.
> However, 5.1 only supports (for non-predefined allocators):
>   uses_allocators( allocator(traits) )
> while OpenMP 5.2 added modifiers:
>   uses_allocators( traits(...), memspace(...) : allocator )
> and deprecated the 5.1 'allocator(traits)'. (Scheduled for removal in OMP 6.0)
>
> The advantage of 5.2 syntax is that a memory space can be defined.

I supported both syntaxes, that's why I designated it as "5.2".

> BTW: This makes uses_allocators the first OpenMP 5.2 feature which
> will make it into GCC :-)

:)

> gcc/fortran/openmp.cc:
>> +  if (gfc_get_symbol ("omp_allocator_handle_kind", NULL, )
>> +      || !sym->value
>> +      || sym->value->expr_type != EXPR_CONSTANT
>> +      || sym->value->ts.type != BT_INTEGER)
>> +    {
>> +      gfc_error ("OpenMP % constant not found by "
>> +                 "% clause at %C");
>> +      goto error;
>> +    }
>> +  allocator_handle_kind = sym;
>
> I think you rather want to use
>   gfc_find_symbol ("omp_...", NULL, true, )
>   || sym == NULL
> where true is for parent_flag to search also the parent namespace.
> (The function returns 1 if the symbol is ambiguous, 0 otherwise -
> including 0 + sym == NULL when the symbol could not be found.)
>   || sym->attr.flavor != FL_PARAMETER
>   || sym->ts.type != BT_INTEGER
>   || sym->attr.dimension
>
> Looks cleaner than to access sym->value. The attr.dimension is just to
> make sure the user did not smuggle an array into this. (Invalid as omp_...
> is a reserved namespace but users will still do this and some are good at
> finding ICEs as a hobby.)
Well, the intention here is to search for "omp_allocator_handle_kind" and "omp_memspace_handle_kind", and use their values to check whether the kinds are the same as the declared allocator handles and memspace constant, not to generally search for "omp_...". However, the sym->attr.dimension test seems useful; added in the new v2 patch.

> However, I fear that will fail for the following two examples (both untested):
>
>   use omp_lib, my_kind => omp_allocator_handle_kind
>   integer(my_kind) :: my_allocator
>
> as this gives 'my_kind' in the symtree->name (while symtree->n.sym->name
> is "omp_..."). Hence, by searching the symtree for 'omp_...' the symbol
> will not be found.
>
> It will likely also fail for the following more realistic example:
>
>   ...
>   subroutine foo
>     use m
>     use omp_lib, only: omp_alloctrait
>     ...
>     !$omp target uses_allocators(my_allocator(traits_array)) allocate(my_allocator:A) firstprivate(A)
>     ...
>     !$omp end target
>   end

If someone wants to use OpenMP allocators but intentionally imports only an insufficient set of standard symbols from omp_lib, then he/she is on their own :) The specification makes this quite clear: omp_allocator_handle_kind, omp_alloctrait, and omp_memspace_handle_kind are all part of the same package.

> In this case, omp_allocator_handle_kind is not in the namespace of 'foo'
> but the code should be still valid. Thus, an alternative would be to
> hard-code the value - as done for the depobj. As we have:
>
>   integer, parameter :: omp_allocator_handle_kind = c_intptr_t
>   integer, parameter :: omp_memspace_handle_kind = c_intptr_t
>
> that would be
>   sym->ts.type == BT_CHARACTER
>   sym->ts.kind == gfc_index_integer_kind
> for the allocator variable and the memspace kind.
>
> However, I grant that either example is not very typical. The second one
> is more natural – such a code will very likely be written in the real world.
> But not with uses_allocators but rather with "!$omp requires dynamic_allocators"
> and omp_init_allocator().
>
> Thoughts?

As above.
I mean, what is so hard about including "use omp_lib" where you need it? :D * * * gcc/fortran/openmp.cc + if (++i > 2) + { + gfc_error ("Only two modifiers are allowed on % " + "clause at %C"); + goto error; + } + Is this really needed? There is a check for multiple traits and multiple memspace modifiers. Thus, 'trait(),memspace(),trait()' is already handled, and 'trait(),something' gives a break and will lead to an error, as in that case a ':' and not ',something' is expected. I think it is worth reminding the user of that limitation instead of giving a generic error. + if (gfc_match_char ('(') == MATCH_YES) + { + if (memspace_seen || traits_seen) + { + gfc_error ("Modifiers cannot be used with legacy " + "array syntax at %C"); I wouldn't use the term 'array syntax' to denote uses_allocators(allocator (alloc_array) ) How about: error: "Using both
[PATCH, OpenMP] Implement uses_allocators clause for target regions
Hi Jakub, this patch implements the uses_allocators clause for OpenMP target regions. For user-defined allocator handles, this allows target regions to assign memory space and traits to allocators, and automatically calls omp_init/destroy_allocator() at the beginning/end of the target region. For pre-defined allocators (i.e. omp_..._mem_alloc names), this is a no-op and no such clauses are created. Aside from the front-end portions, the target region transforms are done in gimplify_omp_workshare. This patch also includes changes to enforce the "allocate allocator must be also in a uses_allocator clause" requirement, as mentioned in [1]. This is done during gimplify_scan_omp_clauses. [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594039.html Tested on mainline; please see if this is okay. Thanks, Chung-Lin 2022-05-06 Chung-Lin Tang gcc/c-family/ChangeLog: * c-omp.cc (c_omp_split_clauses): Add OMP_CLAUSE_USES_ALLOCATORS case. * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_name): Add case for uses_allocators clause. (c_parser_omp_clause_uses_allocators): New function. (c_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case. (OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask. * c-typeck.cc (c_finish_omp_clauses): Add case handling for OMP_CLAUSE_USES_ALLOCATORS. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_name): Add case for uses_allocators clause. (cp_parser_omp_clause_uses_allocators): New function. (cp_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case. (OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask. * semantics.cc (finish_omp_clauses): Add case handling for OMP_CLAUSE_USES_ALLOCATORS. fortran/ChangeLog: * gfortran.h (struct gfc_omp_namelist): Add memspace_sym, traits_sym fields. (OMP_LIST_USES_ALLOCATORS): New list enum. * openmp.cc (enum omp_mask2): Add OMP_CLAUSE_USES_ALLOCATORS.
(gfc_match_omp_clause_uses_allocators): New function. (gfc_match_omp_clauses): Add case to handle OMP_CLAUSE_USES_ALLOCATORS. (OMP_TARGET_CLAUSES): Add OMP_CLAUSE_USES_ALLOCATORS. (resolve_omp_clauses): Add "USES_ALLOCATORS" to clause_names[]. * trans-array.cc (gfc_conv_array_initializer): Adjust array index to always be a created tree expression instead of NULL_TREE when zero. * trans-openmp.cc (gfc_trans_omp_clauses): For ALLOCATE clause, handle using gfc_trans_omp_variable for EXPR_VARIABLE exprs. Add handling of OMP_LIST_USES_ALLOCATORS case. * types.def (BT_FN_VOID_PTRMODE): Define. (BT_FN_PTRMODE_PTRMODE_INT_PTR): Define. gcc/ChangeLog: * builtin-types.def (BT_FN_VOID_PTRMODE): Define. (BT_FN_PTRMODE_PTRMODE_INT_PTR): Define. * omp-builtins.def (BUILT_IN_OMP_INIT_ALLOCATOR): Define. (BUILT_IN_OMP_DESTROY_ALLOCATOR): Define. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_USES_ALLOCATORS. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_USES_ALLOCATORS. * tree.h (OMP_CLAUSE_USES_ALLOCATORS_ALLOCATOR): New macro. (OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE): New macro. (OMP_CLAUSE_USES_ALLOCATORS_TRAITS): New macro. * tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_USES_ALLOCATORS. (omp_clause_code_name): Add "uses_allocators". * gimplify.cc (gimplify_scan_omp_clauses): Add checking of OpenMP target region allocate clauses, to require a uses_allocators clause to exist for allocators. (gimplify_omp_workshare): Add handling of OMP_CLAUSE_USES_ALLOCATORS for OpenMP target regions; create calls of omp_init/destroy_allocator around target region body. gcc/testsuite/ChangeLog: * c-c++-common/gomp/uses_allocators-1.c: New test. * c-c++-common/gomp/uses_allocators-2.c: New test. * gfortran.dg/gomp/uses_allocators-1.f90: New test. * gfortran.dg/gomp/uses_allocators-2.f90: New test. * gfortran.dg/gomp/uses_allocators-3.f90: New test. 
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index 3a7cecdf087..be3e6ff697e 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -283,6 +283,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT32_DFLOAT32, BT_DFLOAT32, BT_DFLOAT32) DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT64_DFLOAT64, BT_DFLOAT64, BT_DFLOAT64) DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR) +DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRMODE, BT_VOID, BT_PTRMODE) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_CONST_PTR, BT_VOID, BT_CONST_PTR) DEF_FU
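The target-region transform described above can be pictured with a small C model: the allocator is initialized at region entry and destroyed at exit. This is a sketch under stated assumptions, not the gimplifier's output. `stub_init_allocator`, `stub_destroy_allocator`, `run_region` and `sample_body` are hypothetical names. Handles are modelled as `uintptr_t` to echo the pointer-mode builtin types `BT_FN_PTRMODE_PTRMODE_INT_PTR` and `BT_FN_VOID_PTRMODE` added by the patch. The stubs only count live allocators and do not call libgomp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t handle_t;   /* stand-in for a pointer-sized handle */

static int live_allocators;   /* initialized but not yet destroyed */

/* Hypothetical stubs shaped like omp_init_allocator (handle, int, pointer
   -> handle) and omp_destroy_allocator (handle -> void); they only count.  */
static handle_t
stub_init_allocator (handle_t memspace, int ntraits, const void *traits)
{
  (void) memspace;
  (void) ntraits;
  (void) traits;
  live_allocators++;
  return (handle_t) live_allocators;   /* any non-zero value */
}

static void
stub_destroy_allocator (handle_t allocator)
{
  assert (allocator != 0);
  live_allocators--;
}

/* Conceptual shape of a target region with uses_allocators: the handle is
   initialized before the region body runs and destroyed after it.  */
int
run_region (handle_t memspace, int ntraits, const void *traits,
	    int (*body) (handle_t))
{
  handle_t alloc = stub_init_allocator (memspace, ntraits, traits);
  int result = body (alloc);
  stub_destroy_allocator (alloc);
  return result;
}

/* Example body: reports how many allocators are live while it runs.  */
int
sample_body (handle_t alloc)
{
  (void) alloc;
  return live_allocators;
}

int
live_count (void)
{
  return live_allocators;
}
```

In this model, the body sees exactly one live allocator, and none remain after the region completes.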
[PATCH, OpenMP] Fix nested use_device_ptr
Hi Jakub, this patch fixes a bug in lower_omp_target where, for Fortran arrays, the expanded sender assignment wrongly uses the variable in the current ctx instead of the one looked up in the outer context. This causes use_device_ptr/addr to fail when used inside an omp-parallel (where the omp child_fn is split away from the original). Just a one-character change to fix this. The fix is inside omp-low.cc, though because the omp_array_data langhook is used only by Fortran, this is essentially Fortran-specific. Tested on x86_64-linux + nvptx offloading without regressions. This is probably not a regression, so I am seeking to commit it when stage1 opens. Thanks, Chung-Lin 2022-04-01 Chung-Lin Tang gcc/ChangeLog: * omp-low.cc (lower_omp_target): Use outer context looked-up 'var' as argument to lang_hooks.decls.omp_array_data, instead of 'ovar' from current clause. libgomp/ChangeLog: * testsuite/libgomp.fortran/use_device_ptr-4.f90: New testcase. diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index 392bb18..bf5779b 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -13405,7 +13405,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) type = TREE_TYPE (ovar); if (lang_hooks.decls.omp_array_data (ovar, true)) - var = lang_hooks.decls.omp_array_data (ovar, false); + var = lang_hooks.decls.omp_array_data (var, false); else if (((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_ADDR || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR) && !omp_privatize_by_reference (ovar) diff --git a/libgomp/testsuite/libgomp.fortran/use_device_ptr-4.f90 b/libgomp/testsuite/libgomp.fortran/use_device_ptr-4.f90 new file mode 100644 index 000..8c361d1 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/use_device_ptr-4.f90 @@ -0,0 +1,41 @@ +! { dg-do run } +! +! Test use_device_ptr nested within another parallel +! construct +!
+program test_nested_use_device_ptr + use iso_c_binding, only: c_loc, c_ptr + implicit none + real, allocatable, target :: arr(:,:) + integer :: width = 1024, height = 1024, i + type(c_ptr) :: devptr + + allocate(arr(width,height)) + + !$omp target enter data map(alloc: arr) + + !$omp target data use_device_ptr(arr) + devptr = c_loc(arr(1,1)) + !$omp end target data + + !$omp parallel default(none) shared(arr, devptr) + !$omp single + + !$omp target data use_device_ptr(arr) + call thing(c_loc(arr), devptr) + !$omp end target data + + !$omp end single + !$omp end parallel + !$omp target exit data map(delete: arr) + +contains + + subroutine thing(myarr, devptr) +use iso_c_binding, only: c_ptr, c_associated +implicit none +type(c_ptr) :: myarr, devptr +if (.not.c_associated(myarr, devptr)) stop 1 + end subroutine thing + +end program
[RFC][PATCH, OpenMP/OpenACC, libgomp] Allow base-pointers to be NULL
Hi all, while troubleshooting building/running SPEChpc 2021 with GCC OpenMP offloading, specifically 534.hpgmgfv_t, we encountered an issue: when the benchmark was initializing and creating its data environment on the GPU, it tried to map array sections whose base-pointer is actually NULL: ... for (block=0;block<3;++block) { #pragma omp target enter data map(to:level->restriction[shape].blocks[block][:length]) // level->restriction[shape].blocks[block] == NULL for some values of index 'block' ... The benchmark appears to assume that such NULL base-pointers are silently ignored and that the program just keeps running. (BTW, the above case needs this patch to compile: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590658.html which is still awaiting review :) ) What we currently do in libgomp, however, is issue an error and call gomp_fatal(): libgomp/target.c:gomp_attach_pointer(): ... if ((void *) target == NULL) { - gomp_mutex_unlock (>lock); - gomp_fatal ("attempt to attach null pointer"); + n->aux->attach_count[idx] = 0; // proposed change attached in patch + return; ... Some quick testing shows that clang/LLVM behaves mostly the same as GCC. OTOH, the NVIDIA HPC SDK (PGI) does appear to silently continue without bailing out. (I have not verified whether 534.hpgmgfv_t fully works with PGI; I just observed how their runtime handles NULL base-pointers.) I don't see any explicit description of this case in the OpenMP specifications. They simply say "The corresponding pointer variable becomes an attached pointer", without describing how a NULL pointer is to be handled. So what do you think? Should the libgomp behavior be adjusted here, or should the SPEC benchmark source be adjusted? (The attached patch to adjust libgomp attach behavior has been regtested without regressions, FWIW.) Thanks, Chung-Lin 2022-03-09 Chung-Lin Tang libgomp/ChangeLog: * target.c (gomp_attach_pointer): When pointer is NULL, return instead of calling gomp_fatal.
diff --git a/libgomp/target.c b/libgomp/target.c index 9017458885e..0e8bbd83c20 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -796,8 +796,8 @@ gomp_attach_pointer (struct gomp_device_descr *devicep, if ((void *) target == NULL) { - gomp_mutex_unlock (>lock); - gomp_fatal ("attempt to attach null pointer"); + n->aux->attach_count[idx] = 0; + return; } s.host_start = target + bias;
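To make the proposed semantics concrete, here is a toy C model of pointer attachment in which a NULL target is left unattached rather than treated as fatal. The `mapping` and `attach_slot` structures and `attach_pointer` are hypothetical simplifications: there is no splay tree, no bias and no locking. This is not libgomp's actual `gomp_attach_pointer`, and in this model an unmapped non-NULL target simply returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One mapped host range [host_start, host_end) and its device copy.  */
struct mapping
{
  uintptr_t host_start, host_end;
  uintptr_t dev_start;
};

/* State of one device-side pointer that may be attached.  */
struct attach_slot
{
  unsigned attach_count;
  uintptr_t dev_value;   /* device address written into the pointer */
};

/* Translate host pointer TARGET to device space and record it.  As in the
   proposed change, a NULL target is not an error: the slot is left
   unattached (count 0).  Returns 0 on success, -1 if a non-NULL target is
   outside the mapped range (toy behavior only).  */
int
attach_pointer (struct attach_slot *slot, const struct mapping *m,
		uintptr_t target)
{
  assert (slot != NULL && m != NULL);
  if (target == 0)
    {
      slot->attach_count = 0;
      return 0;
    }
  if (target < m->host_start || target >= m->host_end)
    return -1;
  slot->dev_value = m->dev_start + (target - m->host_start);
  slot->attach_count++;
  return 0;
}
```

With this behavior, a loop like the `blocks[block]` one above would simply skip the NULL entries instead of aborting.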
[PATCH, OpenMP, C++] Allow classes with static members to be mappable
Hi Jakub, in OpenMP 5.x, static members no longer prevent a class from being target-mapped. There is the related issue of actually providing access to static const/constexpr members on the GPU (probably a case of https://github.com/OpenMP/spec/issues/2158), but that is for later. This patch basically just removes the check for static members inside cp_omp_mappable_type_1, and adjusts a testcase. Not sure if more tests are needed. Tested on trunk without regressions, okay when stage1 reopens? Thanks, Chung-Lin 2022-03-09 Chung-Lin Tang gcc/cp/ChangeLog: * decl2.cc (cp_omp_mappable_type_1): Remove requirement that all members must be non-static; remove check for static fields. gcc/testsuite/ChangeLog: * g++.dg/gomp/unmappable-1.C: Adjust testcase.
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc index c53acf4546d..ace7783d9bd 100644 --- a/gcc/cp/decl2.cc +++ b/gcc/cp/decl2.cc @@ -1544,21 +1544,14 @@ cp_omp_mappable_type_1 (tree type, bool notes) /* Arrays have mappable type if the elements have mappable type. */ while (TREE_CODE (type) == ARRAY_TYPE) type = TREE_TYPE (type); - /* All data members must be non-static. */ + if (CLASS_TYPE_P (type)) { tree field; for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) - if (VAR_P (field)) - { - if (notes) - inform (DECL_SOURCE_LOCATION (field), - "static field %qD is not mappable", field); - result = false; - } /* All fields must have mappable types.
*/ - else if (TREE_CODE (field) == FIELD_DECL -&& !cp_omp_mappable_type_1 (TREE_TYPE (field), notes)) + if (TREE_CODE (field) == FIELD_DECL + && !cp_omp_mappable_type_1 (TREE_TYPE (field), notes)) result = false; } return result; diff --git a/gcc/testsuite/g++.dg/gomp/unmappable-1.C b/gcc/testsuite/g++.dg/gomp/unmappable-1.C index 364f884500c..1532b9c73f1 100644 --- a/gcc/testsuite/g++.dg/gomp/unmappable-1.C +++ b/gcc/testsuite/g++.dg/gomp/unmappable-1.C @@ -4,7 +4,7 @@ class C { public: - static int static_member; /* { dg-message "static field .C::static_member. is not mappable" } */ + static int static_member; virtual void f() {} };
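Here is a rough C model of the rule after this patch. Arrays are judged by their element type, and every non-static data member of a class must itself be mappable, while static members are ignored. Requiring the type to be complete is assumed here to mirror the completeness check at the top of the function. All names are hypothetical, and the model omits everything else `cp_omp_mappable_type_1` and its callers check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum type_kind { K_SCALAR, K_ARRAY, K_CLASS };

struct member;

struct type_desc
{
  enum type_kind kind;
  bool complete;                  /* models COMPLETE_TYPE_P */
  const struct type_desc *elem;   /* element type, for K_ARRAY */
  const struct member *members;   /* data members, for K_CLASS */
  size_t n_members;
};

struct member
{
  bool is_static;                 /* static data member (VAR_DECL) */
  const struct type_desc *type;
};

/* After the patch: static members are skipped; every non-static data
   member (FIELD_DECL) must itself have a mappable type.  */
bool
mappable_type (const struct type_desc *t)
{
  assert (t != NULL);
  bool result = t->complete;      /* incomplete types are not mappable */
  while (t->kind == K_ARRAY)      /* arrays: look at the element type */
    t = t->elem;
  if (t->kind == K_CLASS)
    for (size_t i = 0; i < t->n_members; i++)
      if (!t->members[i].is_static && !mappable_type (t->members[i].type))
	result = false;
  return result;
}
```

For example, a class with `static int static_member;` plus ordinary fields is now accepted, which matches the adjusted `unmappable-1.C` expectation.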
[PATCH, OpenMP, C/C++] Handle array reference base-pointers in array sections
Hi Jakub, as encountered in cases where a program constructs its own deep-copying for arrays-of-pointers, e.g: #pragma omp target enter data map(to:level->vectors[:N]) for (i = 0; i < N; i++) #pragma omp target enter data map(to:level->vectors[i][:N]) We need to treat the part of the array reference before the array section as a base-pointer (here 'level->vectors[i]'), providing pointer-attachment behavior. This patch adds this inside handle_omp_array_sections(), tracing the whole sequence of array dimensions, creating a whole base-pointer reference iteratively using build_array_ref(). The conditions are that each of the "absorbed" dimensions must be length==1, and the final reference must be of pointer-type (so that pointer attachment makes sense). There's also a little patch in gimplify_scan_omp_clauses(), to make sure the array-ref base-pointer goes down the right path. This case was encountered when working to make 534.hpgmgfv_t from SPEChpc 2021 properly compile. Tested without regressions on trunk. Okay to go in once stage1 opens? Thanks, Chung-Lin 2022-02-21 Chung-Lin Tang gcc/c/ChangeLog: * c-typeck.cc (handle_omp_array_sections): Add handling for creating array-reference base-pointer attachment clause. gcc/cp/ChangeLog: * semantics.cc (handle_omp_array_sections): Add handling for creating array-reference base-pointer attachment clause. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Add case for attach/detach map kind for ARRAY_REF of POINTER_TYPE. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-enter-data-1.c: Adjust testcase. 
libgomp/testsuite/ChangeLog: * libgomp.c-c++-common/ptr-attach-2.c: New test.
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index 3075c883548..4257e373557 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -13649,6 +13649,10 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) if (int_size_in_bytes (TREE_TYPE (first)) <= 0) maybe_zero_len = true; + struct dim { tree low_bound, length; }; + auto_vec dims (num); + dims.safe_grow (num); + for (i = num, t = OMP_CLAUSE_DECL (c); i > 0; t = TREE_CHAIN (t)) { @@ -13763,6 +13767,9 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) else size = size_binop (MULT_EXPR, size, l); } + + dim d = { low_bound, length }; + dims[i] = d; } if (side_effects) size = build2 (COMPOUND_EXPR, sizetype, side_effects, size); @@ -13802,6 +13809,23 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) OMP_CLAUSE_DECL (c) = t; return false; } + + tree aref = t; + for (i = 0; i < dims.length (); i++) + { + if (dims[i].length && integer_onep (dims[i].length)) + { + tree lb = dims[i].low_bound; + aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb); + } + else + { + if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE) + t = aref; + break; + } + } + first = c_fully_fold (first, false, NULL); OMP_CLAUSE_DECL (c) = first; if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR) @@ -13836,7 +13860,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) break; } tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - if (TREE_CODE (t) == COMPONENT_REF) + if (TREE_CODE (t) == COMPONENT_REF || TREE_CODE (t) == ARRAY_REF + || TREE_CODE (t) == INDIRECT_REF) OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH); else OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER); diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 0cb17a6a8ab..646f4883d66 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -5497,6 +5497,10 @@ handle_omp_array_sections (tree c, enum
c_omp_region_type ort) if (processing_template_decl && maybe_zero_len) return false; + struct dim { tree low_bound, length; }; + auto_vec dims (num); + dims.safe_grow (num); + for (i = num, t = OMP_CLAUSE_DECL (c); i > 0; t = TREE_CHAIN (t)) { @@ -5604,6 +5608,9 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) else size = size_binop (MULT_EXPR, size, l); } + + dim d = { low_bound, length }; + dims[i] = d; } if (!processing_template_decl) { @@ -5647,6 +5654,24 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) OMP_CLAUSE_DECL (c) = t; return false; } + + tree aref = t; + for (i = 0; i < dims.length (); i++) + { +
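The dimension-absorbing loop added to `handle_omp_array_sections` can be modelled in plain C. This is a sketch, not GCC code: `struct dim`, `absorbed_dims` and the `ptr_after` array are hypothetical. The `ptr_after` array stands in for the `TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE` test. As the patch description implies, in `level->vectors[i][:N]` the leading `[i]` counts as a length-1 dimension, so it is folded into the base pointer `level->vectors[i]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One section dimension, left to right; length 0 means "not given".  */
struct dim
{
  long low_bound;
  long length;
};

/* PTR_AFTER[k] says whether the reference built after absorbing the first
   K dimensions has pointer type.  Returns how many leading dimensions are
   folded into the attach base pointer (0 = base expression unchanged).  */
size_t
absorbed_dims (const struct dim *dims, size_t n, const bool *ptr_after)
{
  assert (dims != NULL && ptr_after != NULL);
  for (size_t i = 0; i < n; i++)
    if (dims[i].length != 1)          /* first dimension not of length 1 */
      return ptr_after[i] ? i : 0;    /* use it only if it is a pointer */
  return 0;                           /* all length 1: base left unchanged */
}
```

For `vectors[3][0:10]` where `vectors` is `T **`, one dimension is absorbed. If the partial reference were not a pointer (for example, an array of arrays), nothing is absorbed and no attach base is formed.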
Re: [PATCH, OpenMP] PR103642 - Fix omp-low ICE for indirect references based off component access
Ping. On 2022/1/3 10:15 PM, Chung-Lin Tang wrote: This issue was triggered after the patch extending syntax for component access in map clauses (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0ab29cf0bb68960c). In gimplify_scan_omp_clauses, the case for handling indirect accesses (which creates a firstprivate pointer and a zero-length array section map for such decls) was erroneously entered for non-pointer cases (here, the base struct decl), so I added the appropriate checks there. The newly added testcase is a compile-only test for the ICE. The original omptests t-partial-struct test should actually not execute correctly: for map(t.s->a[:N]), map(t.s[:1]) is not implicitly mapped, so the offloaded access does not work as written. (Fixing that omptests test is out of scope here.) Tested without regressions, okay for trunk? Thanks, Chung-Lin 2022-01-03 Chung-Lin Tang gcc/ChangeLog: PR middle-end/103642 * gimplify.c (gimplify_scan_omp_clauses): Do not do indir_p handling for non-pointer or non-reference-to-pointer cases. gcc/testsuite/ChangeLog: * c-c++-common/gomp/pr103642.c: New test.
Re: [PATCH, OpenMP, C/C++] Fix PR103705
I forgot to attach the patch; here it is :P On 2022/1/10 10:59 PM, Chung-Lin Tang wrote: For cases like: #pragma omp target update from(s[0].a[0:1]) The handling in [c_]finish_omp_clauses was only peeling off ARRAY_REF once before the loop handling COMPONENT_REF, and failed when the base of the component_ref is an array access. This adds the handling there for both the C and C++ front-ends. (The ICE started to happen after https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6c0399378e77d029, where the map/from/to clause syntax was relaxed to accept more kinds of expressions.) Tested without regressions, okay to commit? Thanks, Chung-Lin PR c++/103705 gcc/c/ChangeLog: * c-typeck.c (c_finish_omp_clauses): Also continue peeling off of outer node for ARRAY_REFs. gcc/cp/ChangeLog: * semantics.c (finish_omp_clauses): Also continue peeling off of outer node for ARRAY_REFs. gcc/testsuite/ChangeLog: * c-c++-common/gomp/pr103705.c: New test.
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 8b492cf5bed..ac6618eca5c 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -14929,7 +14929,8 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) t = TREE_OPERAND (t, 0); } } - while (TREE_CODE (t) == COMPONENT_REF); + while (TREE_CODE (t) == COMPONENT_REF +|| TREE_CODE (t) == ARRAY_REF); if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP && OMP_CLAUSE_MAP_IMPLICIT (c) diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 645654768e3..a7435ed1266 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -7931,7 +7931,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) t = TREE_OPERAND (t, 0); } } - while (TREE_CODE (t) == COMPONENT_REF); + while (TREE_CODE (t) == COMPONENT_REF +|| TREE_CODE (t) == ARRAY_REF); if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP && OMP_CLAUSE_MAP_IMPLICIT (c) diff --git a/gcc/testsuite/c-c++-common/gomp/pr103705.c b/gcc/testsuite/c-c++-common/gomp/pr103705.c new file mode 100644 index 000..bf4c7066d28 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/pr103705.c @@
-0,0 +1,14 @@ +/* PR c++/103705 */ +/* { dg-do compile } */ + +struct S +{ + int a[2]; +}; + +int main (void) +{ + struct S s[1]; + #pragma omp target update from(s[0].a[0:1]) + return 0; +}
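A toy C model of the peeling loop shows why the extra `ARRAY_REF` test matters for `s[0].a[0:1]`. The node structure, codes and `peel_to_base` are hypothetical. The model also omits the other operand handling visible in the diff context. It only shows that the old condition stops at `s[0]`, while the fixed one reaches the variable `s`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression nodes; codes mirror GCC tree codes by name only.  */
enum node_code { VAR_DECL_C, ARRAY_REF_C, COMPONENT_REF_C };

struct node
{
  enum node_code code;
  const struct node *op0;   /* base operand (NULL for a variable) */
  const char *name;
};

/* Starting from a COMPONENT_REF, strip outer references toward the base
   object.  PEEL_ARRAY_REFS false mimics the old loop condition
   (COMPONENT_REF only); true mimics the fixed one.  */
const struct node *
peel_to_base (const struct node *t, bool peel_array_refs)
{
  assert (t != NULL && t->code == COMPONENT_REF_C);
  do
    t = t->op0;
  while (t->code == COMPONENT_REF_C
	 || (peel_array_refs && t->code == ARRAY_REF_C));
  return t;
}
```

With the old condition, the loop ends on an `ARRAY_REF` rather than a declaration, which is the situation that led to the ICE.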
[PATCH, OpenMP, C/C++] Fix PR103705
For cases like: #pragma omp target update from(s[0].a[0:1]) The handling in [c_]finish_omp_clauses was only peeling off ARRAY_REF once before the loop handling COMPONENT_REF, and failed when the base of the component_ref is an array access. This adds the handling there for both the C and C++ front-ends. (The ICE started to happen after https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6c0399378e77d029, where the map/from/to clause syntax was relaxed to accept more kinds of expressions.) Tested without regressions, okay to commit? Thanks, Chung-Lin PR c++/103705 gcc/c/ChangeLog: * c-typeck.c (c_finish_omp_clauses): Also continue peeling off of outer node for ARRAY_REFs. gcc/cp/ChangeLog: * semantics.c (finish_omp_clauses): Also continue peeling off of outer node for ARRAY_REFs. gcc/testsuite/ChangeLog: * c-c++-common/gomp/pr103705.c: New test.
[PATCH, OpenMP, libgomp, committed] Fix GOMP_DEVICE_NUM_VAR stringification error
In the patch that implemented omp_get_device_num(), there was an error in the stringification of GOMP_DEVICE_NUM_VAR, the macro that expands to the actual symbol used. The libgomp offload image symbol search erroneously used the STRINGX() macro, which stringizes its argument without first expanding it, so the variable name string never went through the additional layer of macro expansion. This patch fixes this by using XSTRING(), also from include/symcat.h, instead. This change was fairly obvious, so I committed it directly. Thanks, Chung-Lin libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. From fbb592407c9dd244b4cea086cbb90d7bd0bf60bb Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Tue, 4 Jan 2022 17:26:23 +0800 Subject: [PATCH] libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load In the patch that implemented omp_get_device_num(), there was an error in the stringification of GOMP_DEVICE_NUM_VAR, the macro that expands to the actual symbol used. The libgomp offload image symbol search erroneously used the STRINGX() macro, which stringizes its argument without first expanding it, so the variable name string never went through the additional layer of macro expansion. This patch fixes this by using XSTRING(), also from include/symcat.h, instead. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
--- libgomp/plugin/plugin-gcn.c | 4 ++-- libgomp/plugin/plugin-nvptx.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 8ffd3d1a2cf..d0f05b28bf3 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3401,12 +3401,12 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, } } - GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR)); + GCN_DEBUG ("Looking for variable %s\n", XSTRING (GOMP_DEVICE_NUM_VAR)); hsa_status_t status; hsa_executable_symbol_t var_symbol; status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL, -STRINGX (GOMP_DEVICE_NUM_VAR), +XSTRING (GOMP_DEVICE_NUM_VAR), agent->id, 0, _symbol); if (status == HSA_STATUS_SUCCESS) { diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index f32276b0a18..b4f0a84d77a 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1353,7 +1353,7 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, size_t device_num_varsize; CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, _num_varptr, _num_varsize, module, - STRINGX (GOMP_DEVICE_NUM_VAR)); + XSTRING (GOMP_DEVICE_NUM_VAR)); if (r == CUDA_SUCCESS) { targ_tbl->start = (uintptr_t) device_num_varptr; -- 2.17.1
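The difference between the two macros is the usual two-level stringification idiom. The sketch below copies the ISO C definitions of `STRINGX` and `XSTRING` from `include/symcat.h` into a standalone file. `DEVICE_NUM_VAR` and its expansion `__gomp_device_num` are stand-ins assumed for illustration, not necessarily libgomp's exact definition of `GOMP_DEVICE_NUM_VAR`.

```c
#include <assert.h>
#include <string.h>

/* Local copies of the two helpers from include/symcat.h.  */
#define STRINGX(s) #s
#define XSTRING(s) STRINGX(s)

/* Hypothetical stand-in for GOMP_DEVICE_NUM_VAR.  */
#define DEVICE_NUM_VAR __gomp_device_num

/* '#' stringizes its operand as written, without macro expansion.  */
const char *unexpanded = STRINGX (DEVICE_NUM_VAR);
/* XSTRING adds one level of indirection, so the argument is expanded
   before it reaches '#'.  */
const char *expanded = XSTRING (DEVICE_NUM_VAR);
```

Here `unexpanded` is `"DEVICE_NUM_VAR"` and `expanded` is `"__gomp_device_num"`. Only the second form names a symbol that actually exists in the offload image.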
[PATCH, OpenMP, Fortran] PR103643: ICE in gimplify_omp_affinity
After the PR90030 patch, which removed the universal casting of all Fortran array pointers to 'c_char*', a Fortran descriptor-based array passed into an affinity() clause now looks like: - #pragma omp task private(i) shared(b) affinity(*(c_char *) a.data) + #pragma omp task private(i) shared(b) affinity(*(integer(kind=4)[0:] * restrict) a.data) The 'integer(kind=4)[0:]' incomplete type appears to cause an ICE in gimplify_expr() when gimplifying with 'is_gimple_val, fb_rvalue'. The ICE appears to be fixed simply by switching to 'is_gimple_lvalue, fb_lvalue'. Considering the purpose of the affinity() clause, which specifies the location of a particular object in memory, this probably makes sense. Tested without regressions; seeking approval for trunk. Thanks, Chung-Lin 2022-01-03 Chung-Lin Tang gcc/ChangeLog: PR middle-end/103643 * gimplify.c (gimplify_omp_affinity): Adjust gimplify_expr of entire OMP_CLAUSE_DECL to use 'is_gimple_lvalue, fb_lvalue' gcc/testsuite/ChangeLog: * gfortran.dg/gomp/pr103643.f90: New test.
diff --git a/gcc/gimplify.c b/gcc/gimplify.c index b118c72f62c..87cc01483dd 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8123,7 +8123,7 @@ gimplify_omp_affinity (tree *list_p, gimple_seq *pre_p) if (error_operand_p (OMP_CLAUSE_DECL (c))) return; if (gimplify_expr (_CLAUSE_DECL (c), pre_p, NULL, - is_gimple_val, fb_rvalue) == GS_ERROR) + is_gimple_lvalue, fb_lvalue) == GS_ERROR) return; gimplify_and_add (OMP_CLAUSE_DECL (c), pre_p); } diff --git a/gcc/testsuite/gfortran.dg/gomp/pr103643.f90 b/gcc/testsuite/gfortran.dg/gomp/pr103643.f90 new file mode 100644 index 000..3b409f5f858 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/pr103643.f90 @@ -0,0 +1,19 @@ +! PR middle-end/103643 +! { dg-do compile } + +program test_task_affinity + implicit none + integer i + integer, allocatable :: A(:) + + allocate (A(10)) + + !$omp target + !$omp task affinity(A) + do i = 1, 10 + A(i) = 0 + end do + !$omp end task + !$omp end target + +end program test_task_affinity
[PATCH, OpenMP] PR103642 - Fix omp-low ICE for indirect references based off component access
This issue was triggered after the patch extending syntax for component access in map clauses (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0ab29cf0bb68960c). In gimplify_scan_omp_clauses, the case for handling indirect accesses (which creates a firstprivate pointer and a zero-length array section map for such decls) was erroneously entered for non-pointer cases (here, the base struct decl), so I added the appropriate checks there. The newly added testcase is a compile-only test for the ICE. The original omptests t-partial-struct test should actually not execute correctly: for map(t.s->a[:N]), map(t.s[:1]) is not implicitly mapped, so the offloaded access does not work as written. (Fixing that omptests test is out of scope here.) Tested without regressions, okay for trunk? Thanks, Chung-Lin 2022-01-03 Chung-Lin Tang gcc/ChangeLog: PR middle-end/103642 * gimplify.c (gimplify_scan_omp_clauses): Do not do indir_p handling for non-pointer or non-reference-to-pointer cases. gcc/testsuite/ChangeLog: * c-c++-common/gomp/pr103642.c: New test.
diff --git a/gcc/gimplify.c b/gcc/gimplify.c index b118c72f62c..bdc8189c2a7 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -9543,7 +9543,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, == REFERENCE_TYPE)) decl = TREE_OPERAND (decl, 0); } - if (decl != orig_decl && DECL_P (decl) && indir_p) + if (decl != orig_decl && DECL_P (decl) && indir_p + && (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE + || (decl_ref + && TREE_CODE (TREE_TYPE (decl_ref)) == POINTER_TYPE))) { gomp_map_kind k = ((code == OACC_EXIT_DATA || code == OMP_TARGET_EXIT_DATA) diff --git a/gcc/testsuite/c-c++-common/gomp/pr103642.c b/gcc/testsuite/c-c++-common/gomp/pr103642.c new file mode 100644 index 000..c5451596b69 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/pr103642.c @@ -0,0 +1,31 @@ +/* PR middle-end/103642 */ +/* { dg-do compile } */ + +#include + +typedef struct +{ + int *a; +} S; + +typedef struct +{ + S *s; + int *ptr; +} T; + +#define N 10 + +int main (void) +{ + T t; + t.s = (S *) malloc (sizeof (S)); + t.s->a = (int *) malloc (sizeof(int) * N); + + #pragma omp target map(from: t.s->a[:N]) + { +t.s->a[0] = 1; + } + + return 0; +}
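The narrowed condition can be read as a small predicate. In this C sketch, the parameters are hypothetical stand-ins for the tree checks in the patch: `decl != orig_decl`, `DECL_P (decl)`, `indir_p`, and the type codes of `decl` and `decl_ref`. Treating `decl_ref` as the dereferenced C++ reference is an assumption based on the ChangeLog's "reference-to-pointer" wording.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TREE_CODE (TREE_TYPE (...)).  */
enum type_code { T_POINTER, T_REFERENCE, T_RECORD, T_INTEGER };

/* Toy model of the fixed condition guarding the indir_p handling.  */
bool
use_indir_handling (bool decl_changed, bool is_decl, bool indir_p,
		    enum type_code decl_type,
		    bool has_decl_ref, enum type_code decl_ref_type)
{
  if (!(decl_changed && is_decl && indir_p))
    return false;
  /* Added by the fix: only pointer bases, or references to pointers.  */
  return decl_type == T_POINTER
	 || (has_decl_ref && decl_ref_type == T_POINTER);
}
```

For `map(from: t.s->a[:N])`, the base decl `t` has struct type, so the predicate is false and the firstprivate-pointer path is no longer taken.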
Re: [PATCH, v5, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v5]
On 2021/12/4 12:47 AM, Jakub Jelinek wrote: On Tue, Nov 16, 2021 at 08:43:27PM +0800, Chung-Lin Tang wrote: 2021-11-16 Chung-Lin Tang PR middle-end/92120 gcc/cp/ChangeLog: ... + if (allow_zero_length_array_sections) + { + /* When allowing attachment to zero-length array sections, we +allow attaching to NULL pointers when the target region is not +mapped. */ + data = 0; + } No {}s around single statement if body. Otherwise LGTM. Jakub Thanks for the review and approval, Jakub. Thomas, I pushed another 2766448c5cc3efc4 commit to fix the non-offload config FAILs, just FYI. Chung-Lin
[PATCH, Fortran] Fix setting of array lower bound for named arrays
This patch, by Tobias, fixes a case of setting array lower bounds, found for particular uses of SOURCE=/MOLD=. For example: program A_M implicit none real, dimension (:), allocatable :: A, B allocate (A(0:5)) call Init (A) contains subroutine Init ( A ) real, dimension ( 0 : ), intent ( in ) :: A integer, dimension ( 1 ) :: lb_B allocate (B, mold = A) ... lb_B = lbound (B, dim=1) ! Error: lb_B is assigned 1 instead of 0, the lower bound of A. The Fortran standard, in "16.9.109 LBOUND (ARRAY [, DIM, KIND])", states: "If DIM is present, ARRAY is a whole array, and either ARRAY is an assumed-size array of rank DIM or dimension DIM of ARRAY has nonzero extent, the result has a value equal to the lower bound for subscript DIM of ARRAY. Otherwise, if DIM is present, the result value is 1." And on what a "whole array" is: "9.5.2 Whole arrays" "A whole array is a named array or a structure component ..." The attached patch adjusts the relevant part of gfc_trans_allocate() to set e3_has_nodescriptor only for non-named arrays. Tobias has tested this once, and I've also tested this patch on our complete set of testsuites (which we usually use for OpenMP-related work). Everything looks fine, with no regressions. Is this okay for trunk? Thanks, Chung-Lin 2021-11-29 Tobias Burnus gcc/fortran/ChangeLog: * trans-stmt.c (gfc_trans_allocate): Set e3_has_nodescriptor to true only for non-named arrays. gcc/testsuite/ChangeLog: * gfortran.dg/allocate_with_source_26.f90: Adjust testcase.
* gfortran.dg/allocate_with_mold_4.f90: New testcase.
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c index bdf7957..982e1e0 100644 --- a/gcc/fortran/trans-stmt.c +++ b/gcc/fortran/trans-stmt.c @@ -6660,16 +6660,13 @@ gfc_trans_allocate (gfc_code * code) else e3rhs = gfc_copy_expr (code->expr3); - // We need to propagate the bounds of the expr3 for source=/mold=; - // however, for nondescriptor arrays, we use internally a lower bound - // of zero instead of one, which needs to be corrected for the allocate obj - if (e3_is == E3_DESC) - { - symbol_attribute attr = gfc_expr_attr (code->expr3); - if (code->expr3->expr_type == EXPR_ARRAY || - (!attr.allocatable && !attr.pointer)) - e3_has_nodescriptor = true; - } + // We need to propagate the bounds of the expr3 for source=/mold=. + // However, for non-named arrays, the lbound has to be 1 and neither the + // bound used inside the called function even when returning an + // allocatable/pointer nor the zero used internally. + if (e3_is == E3_DESC + && code->expr3->expr_type != EXPR_VARIABLE) + e3_has_nodescriptor = true; } /* Loop over all objects to allocate.
*/ diff --git a/gcc/testsuite/gfortran.dg/allocate_with_mold_4.f90 b/gcc/testsuite/gfortran.dg/allocate_with_mold_4.f90 new file mode 100644 index 000..d545fe1 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/allocate_with_mold_4.f90 @@ -0,0 +1,24 @@ +program A_M + implicit none + real, parameter :: C(5:10) = 5.0 + real, dimension (:), allocatable :: A, B + allocate (A(6)) + call Init (A) +contains + subroutine Init ( A ) +real, dimension ( -1 : ), intent ( in ) :: A +integer, dimension ( 1 ) :: lb_B + +allocate (B, mold = A) +if (any (lbound (B) /= lbound (A))) stop 1 +if (any (ubound (B) /= ubound (A))) stop 2 +if (any (shape (B) /= shape (A))) stop 3 +if (size (B) /= size (A)) stop 4 +deallocate (B) +allocate (B, mold = C) +if (any (lbound (B) /= lbound (C))) stop 5 +if (any (ubound (B) /= ubound (C))) stop 6 +if (any (shape (B) /= shape (C))) stop 7 +if (size (B) /= size (C)) stop 8 +end +end diff --git a/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90 b/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90 index 28f24fc..323c8a3 100644 --- a/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90 +++ b/gcc/testsuite/gfortran.dg/allocate_with_source_26.f90 @@ -34,23 +34,23 @@ program p if (lbound(p1, 1) /= 3 .or. ubound(p1, 1) /= 4 & .or. lbound(p2, 1) /= 3 .or. ubound(p2, 1) /= 4 & .or. lbound(p3, 1) /= 1 .or. ubound(p3, 1) /= 2 & - .or. lbound(p4, 1) /= 7 .or. ubound(p4, 1) /= 8 & + .or. lbound(p4, 1) /= 1 .or. ubound(p4, 1) /= 2 & .or. p1(3)%i /= 43 .or. p1(4)%i /= 56 & .or. p2(3)%i /= 43 .or. p2(4)%i /= 56 & .or. p3(1)%i /= 43 .or. p3(2)%i /= 56 & - .or. p4(7)%i /= 11 .or. p4(8)%i /= 12) then + .or. p4(1)%i /= 11 .or. p4(2)%i /= 12) then call abort() endif !write(*,*) lbound(a,1), ubound(a,1) ! prints 1 3 !write(*,*) lbound(b,1), ubound(b,1) ! prints 1 3 - !write(*,*) lbound(c,1), ubound(c,1) ! prints 3 5 + !write(*,*) lbound(c,1), ubound(c,1) ! prints 1 3 !write(*,*) lbound(d,1), ubound(d,1) ! prints 1 5 !write(*,*) lbound(e,1), ubound(e,1) ! prints 1 6
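The LBOUND rule quoted above, and the patched condition, can be modelled in a few lines. The following is a minimal C sketch under stated assumptions: `struct dim_info`, `lbound_dim` and `allocate_mold_lbound` are hypothetical names, not gfortran internals, and the model assumes (as the standard specifies for SOURCE=/MOLD= without explicit bounds) that the allocated object takes its bounds as if from LBOUND of the source expression. Only a named array or structure component (an EXPR_VARIABLE in gfortran) counts as a whole array; any other expression yields a lower bound of 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-dimension summary of an array, for illustration only;
   this is not gfortran's array descriptor layout.  */
struct dim_info
{
  long lower;   /* declared lower bound of the dimension */
  long extent;  /* number of elements in the dimension */
};

/* LBOUND (ARRAY, DIM) per the quoted 16.9.109 text: the declared lower
   bound is returned only when ARRAY is a whole array and either DIM is
   the last dimension of an assumed-size array or the dimension has
   nonzero extent; otherwise the result is 1.  */
static long
lbound_dim (struct dim_info d, bool whole_array, bool assumed_size_last_dim)
{
  if (whole_array && (assumed_size_last_dim || d.extent != 0))
    return d.lower;
  return 1;
}

/* Lower bound given to B by ALLOCATE (B, MOLD=E), assuming bounds are
   taken as if by LBOUND (E).  This mirrors the patched condition: only a
   named array or structure component keeps its own bound; a function
   result or array constructor is not a whole array and yields 1.  */
static long
allocate_mold_lbound (struct dim_info src, bool expr_is_named_array)
{
  return lbound_dim (src, expr_is_named_array, false);
}
```

For the example program, the dummy A is a named array declared with lower bound 0 and extent 6, so B should also get lower bound 0; that is the property the new allocate_with_mold_4.f90 test checks with `lbound (B) /= lbound (A)`.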
Re: [PATCH, PR90030] Fortran OpenMP/OpenACC array mapping alignment fix
Ping.

On 2021/11/4 4:23 PM, Chung-Lin Tang wrote:
Hi Jakub,
As Thomas reported and submitted a patch a while ago:
https://gcc.gnu.org/pipermail/gcc-patches/2019-April/519932.html
https://gcc.gnu.org/pipermail/gcc-patches/2019-May/522738.html

There's an issue with the Fortran front-end when mapping arrays: when creating the data MEM_REF for the map clause, there's a convention of casting the referencing pointer to 'c_char *' by fold_convert (build_pointer_type (char_type_node), ptr).

This causes the alignment passed to the libgomp runtime for array data to be hardwired to '1', which causes alignment errors on the offload target (these do not always show up, but can be triggered by a slight change in clause ordering).

This patch is not exactly Thomas' patch from 2019, but does the same thing. The new libgomp tests are reused directly, though. A lot of scan test adjustments are also included in this patch.

The patch has been tested with no regressions for gfortran and libgomp. Is this okay for trunk?

Thanks,
Chung-Lin

Fortran: fix array alignment for OpenMP/OpenACC target mapping clauses [PR90030]

The Fortran front-end is creating maps of array data with a type of pointer to char_type_node, which, when eventually passed to libgomp at runtime, marks the passed array with an alignment of 1; this can cause mapping alignment errors on the offload target.

This patch removes the related fold_convert (build_pointer_type (char_type_node)) calls in fortran/trans-openmp.c, and adds gcc_asserts to ensure pointer type.

2021-11-04  Chung-Lin Tang
	    Thomas Schwinge

	PR fortran/90030

gcc/fortran/ChangeLog:

	* trans-openmp.c (gfc_omp_finish_clause): Remove fold_convert to pointer
	to char_type_node, add gcc_assert of POINTER_TYPE_P.
	(gfc_trans_omp_array_section): Likewise.
	(gfc_trans_omp_clauses): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/goacc/finalize-1.f: Adjust scan test.
	* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
	* gfortran.dg/gomp/affinity-clause-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
	* gfortran.dg/gomp/map-3.f90: Likewise.
	* gfortran.dg/gomp/pr78260-2.f90: Likewise.
	* gfortran.dg/gomp/pr78260-3.f90: Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-fortran/pr90030.f90: New test.
	* testsuite/libgomp.fortran/pr90030.f90: New test.
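Why a `char *` view loses alignment information can be shown with plain C11: the alignment a compiler may assume for `*p` comes from the pointed-to type, and `alignof (char)` is 1, so a `char *` promises nothing beyond byte alignment even when the underlying data is an array of `double`. The sketch below only illustrates that language rule; `is_aligned` is a hypothetical helper, and the sketch does not model how GCC actually derives the alignment it passes to libgomp.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if address P satisfies alignment ALIGN.
   ALIGN must be a power of two, as all C alignments are.  */
static int
is_aligned (const void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) == 0;
}
```

Any address is suitably aligned for `char`, while a byte offset into a `double` array generally is not aligned for `double`; a consumer that only sees the `char *` type therefore cannot rely on the stricter alignment of the real element type.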
[PATCH, v2, OpenMP 5.0] Remove array section base-pointer mapping semantics, and other front-end adjustments (mainline trunk)
Hi Jakub,
attached is a rebased version of this "OpenMP fixes/adjustments" patch.

This version removes some of the (ort == C_ORT_OMP || ort == C_ORT_ACC) checks that are not needed in handle_omp_array_sections_1 and [c_]finish_omp_clauses.

Note that this is meant to be applied on top of the recently posted C++ PR92120 v5 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584602.html

Again, tested without regressions (together with the PR92120 patch), awaiting review.

Thanks,
Chung-Lin

(ChangeLog updated below)

On 2021/5/25 9:36 PM, Chung-Lin Tang wrote:
This patch largely implements three pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list, standards-conforming behavior for mapping array sections should *NOT* also map the base-pointer, i.e. for this code:

  struct S { int *ptr; ... };
  struct S s;
  #pragma omp target enter data map(to: s.ptr[:100])

Currently we generate after gimplify:

  #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
                                map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify results are now adjusted to:

  #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

(the attach operation is still generated, and if s.ptr is already mapped beforehand, attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to use:

  #pragma omp target enter data map(to: s.ptr, s.ptr[:100])

This adjustment in behavior required a number of small adjustments here and there in gimplify, including changes to accommodate map sequences for C++ references.

There is also a small Fortran front-end patch involved (hence CCing Tobias and fortran@). The new gimplify processing changed behavior in handling GOMP_MAP_ALWAYS_POINTER maps such that libgomp.fortran/struct-elem-map-1.f90 regressed.
It appeared that the Fortran FE was generating a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and the pre-patch behavior was removing this map anyway. I have a small change in trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this case, and so far there are no bad test results.

(2) The second part (though somewhat related to the first above) consists of fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case. This behavior is also described in the 5.0 spec, but had not been properly implemented before.

(3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. This is mainly an effort to allow SPEC HPC to compile; although the entire map clause syntax parsing will probably be revamped in the long term, we are still adding this in for now. These changes are enabled for both OpenACC and OpenMP.

2021-11-19  Chung-Lin Tang

gcc/c/ChangeLog:

	* c-parser.c (struct omp_dim): New struct type for use inside
	c_parser_omp_variable_list.
	(c_parser_omp_variable_list): Allow multiple levels of array and
	component accesses in array section base-pointer expression.
	(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
	c_parser_omp_var_list_parens.
	(c_parser_omp_clause_from): Likewise.
	* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
	of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
	POINTER_PLUS_EXPR.
	(c_finish_omp_clauses): Extend allowed range of expressions
	involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:

	* parser.c (struct omp_dim): New struct type for use inside
	cp_parser_omp_var_list_no_open.
	(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
	component accesses in array section base-pointer expression.
	(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
	cp_parser_omp_var_list for to/from clauses.
	* semantics.c (handle_omp_array_sections_1): Extend allowed range
	of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
	POINTER_PLUS_EXPR.
	(handle_omp_array_sections): Adjust pointer map generation of
	references.
	(finish_omp_clauses): Extend allowed range of expressions
	involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/fortran/ChangeLog:

	* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
	GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.

gcc/ChangeLog:

	* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
	accommodate case where 'offset' return of get_inner_reference is
	non-NULL.
	(is_or_contains_p): Further robustify conditions.
	(omp_target_reorder_clauses): In alloc/to/from sortin
[PATCH, v5, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v5]
Hi Jakub, On 2021/6/24 9:15 PM, Jakub Jelinek wrote: On Fri, Jun 18, 2021 at 10:25:16PM +0800, Chung-Lin Tang wrote: Note, you'll need to rebase your patch, it clashes with r12-1768-g7619d33471c10fe3d149dcbb701d99ed3dd23528. Sorry for that. And sorry for patch review delay. --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -13104,6 +13104,12 @@ handle_omp_array_sections_1 (tree c, tree t, vec , return error_mark_node; } t = TREE_OPERAND (t, 0); + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) Map clauses never appear on declare simd, so (ort == C_ORT_ACC || ort == C_ORT_OMP) previously meant always and since the in_reduction change is incorrect (as C_ORT_OMP_TARGET is used for target construct but not for e.g. target data* or target update). + && TREE_CODE (t) == MEM_REF) Upon reviewing, it appears that most of these C_ORT_* tests are no longer needed, removed in new patch. So please just use if (TREE_CODE (t) == MEM_REF) or explain when it shouldn't trigger. @@ -14736,6 +14743,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) { while (TREE_CODE (t) == COMPONENT_REF) t = TREE_OPERAND (t, 0); + if (TREE_CODE (t) == MEM_REF) + { + t = TREE_OPERAND (t, 0); + STRIP_NOPS (t); + } This doesn't look correct. At least the parsing (and the spec AFAIK) doesn't ensure that if there is ->, it must come before all the dots. So, if one uses map (s->x.y) the above would work, but if map (s->x.y->z) or map (s.a->b->c->d->e) is used, it wouldn't. I'd expect a single while loop that looks through COMPONENT_REFs and MEM_REFs as they appear. Maybe the handle_omp_array_sections_1 MEM_REF case too? Or do you want to have it done incrementally, start with supporting only a single -> first before all the dots and later on add support for the rest? 
I think the 5.0 and especially 5.1 wording basically says that map clause operand is arbitrary lvalue expression that includes array section support too, so eventually we should just have somewhere in parsing scope a bool whether OpenMP array sections are allowed or not, add OMP_ARRAY_REF or similar tree code for those and after parsing the expression, ensure array sections appear only where they can appear and for a subset of the lvalue expressions where we have decl plus series of -> field or . field or [ index ] or [ array section stuff ] handle those specially. That arbitrary lvalue can certainly be done incrementally. map (foo(123)->a.b[3]->c.d[:7]) and the like. Indeed this kind of modification is sort of "as encountered", so there are probably many cases that are not completely handled yet; it's not just the front-end, but also changes in gimplify_scan_omp_clauses(). However, I had another patch that should've plowed a bit further on this: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html as well as those patch sets that Julian is working on. (our current plan is to have my sets go in first, and Julian's on top, to minimize clashing) if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP && OMP_CLAUSE_MAP_IMPLICIT (c) && (bitmap_bit_p (_head, DECL_UID (t)) @@ -14802,6 +14814,15 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) bias) to zero here, so it is not set erroneously to the pointer size later on in gimplify.c. */ OMP_CLAUSE_SIZE (c) = size_zero_node; + indir_component_ref_p = false; + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) Same comment about ort tests. 
+ && TREE_CODE (t) == COMPONENT_REF + && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF) + { + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + indir_component_ref_p = true; + STRIP_NOPS (t); + } Again, this can handle only a single -> @@ -42330,16 +42328,10 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = tc; } } - tree stmt = make_node (OMP_TARGET); - TREE_TYPE (stmt) = void_type_node; - OMP_TARGET_CLAUSES (stmt) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET]; - c_omp_adjust_map_clauses (OMP_TARGET_CLAUSES (stmt), true); - OMP_TARGET_BODY (stmt) = body; - OMP_TARGET_COMBINED (stmt) = 1; - SET_EXPR_LOCATION (stmt, pragma_tok->location); - add_stmt (stmt); - pc = &OMP_TARGET_CLAUSES (stmt); - goto check_clauses; + c_omp_adjust_map_clauses (cclauses[C_OMP_CLAUSE_SPLIT_TARGET], true); + finish_omp_target (pragma_tok->
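The review's suggestion, a single loop that looks through COMPONENT_REFs and MEM_REFs in whatever order they appear, can be sketched on a toy expression tree. This is a minimal illustration under stated assumptions: `struct mini_tree`, the `MINI_*` codes and both functions are hypothetical stand-ins, not GCC's `tree` API. `single_arrow_base` mimics the shape of the quoted hunk (strip all `.` accesses, then at most one `->`), while `base_decl` is the single-loop variant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of GCC trees, for illustration only.  */
enum mini_code { MINI_DECL, MINI_COMPONENT_REF, MINI_MEM_REF, MINI_NOP_EXPR };

struct mini_tree
{
  enum mini_code code;
  struct mini_tree *op0;  /* object (COMPONENT_REF), pointer (MEM_REF),
                             or converted operand (NOP_EXPR) */
  const char *name;       /* declaration or field name, if any */
};

/* Analogue of STRIP_NOPS: look through conversions.  */
static struct mini_tree *
strip_nops (struct mini_tree *t)
{
  assert (t != NULL);
  while (t->code == MINI_NOP_EXPR)
    t = t->op0;
  return t;
}

/* Shape of the quoted hunk: strip '.' accesses, then at most one '->'.  */
static struct mini_tree *
single_arrow_base (struct mini_tree *t)
{
  while (t->code == MINI_COMPONENT_REF)
    t = t->op0;
  if (t->code == MINI_MEM_REF)
    t = strip_nops (t->op0);
  return t;
}

/* Single loop: look through '.' (COMPONENT_REF) and '->' (MEM_REF of a
   pointer) in any order, so s->x.y->z also reaches the declaration s.  */
static struct mini_tree *
base_decl (struct mini_tree *t)
{
  for (t = strip_nops (t);
       t->code == MINI_COMPONENT_REF || t->code == MINI_MEM_REF;
       t = strip_nops (t->op0))
    ;
  return t;
}
```

For `s->x.y` both functions reach `s`; for `s->x.y->z` the single-arrow shape stops at `s->x.y`, which is exactly the case the review points out.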
[PATCH, v2, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)
Hi Jakub, On 2021/6/24 11:55 PM, Jakub Jelinek wrote: On Fri, May 14, 2021 at 09:20:25PM +0800, Chung-Lin Tang wrote: diff --git a/gcc/gimplify.c b/gcc/gimplify.c index e790f08b23f..69c4a8e0a0a 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -10374,6 +10374,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) gcc_unreachable (); } OMP_CLAUSE_SET_MAP_KIND (clause, kind); + OMP_CLAUSE_MAP_IMPLICIT_P (clause) = 1; if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST) { As Thomas mentioned, there is now also OMP_CLAUSE_MAP_IMPLICIT that means something different: /* Nonzero on map clauses added implicitly for reduction clauses on combined or composite constructs. They shall be removed if there is an explicit map clause. */ Having OMP_CLAUSE_MAP_IMPLICIT and OMP_CLAUSE_MAP_IMPLICIT_P would be too confusing. So either we need to use just one flag for both purposes or have two different flags and find a better name for one of them. The former would be possible if no OMP_CLAUSE_MAP clauses added by the FEs are implicit - then you could clear OMP_CLAUSE_MAP_IMPLICIT in gimplify_scan_omp_clauses. I wonder if it is the case though, e.g. doesn't your "Improve OpenMP target support for C++ [PR92120 v4]" patch add a lot of such implicit map clauses (e.g. the this[:1] and various others)? I have changed the name to OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P, to signal that this bit is to be passed to the runtime. Right now it's intended to be used by clauses created by the middle-end; front-end uses like those for C++ could be clarified later. Also, gimplify_adjust_omp_clauses_1 sometimes doesn't add just one map clause, but several, shouldn't those be marked implicit too? And similarly it calls lang_hooks.decls.omp_finish_clause which can add even further map clauses implicitly, shouldn't those be implicit too (in that case copy the flag from the clause it is called on to the extra clauses it adds)?
Also as Thomas mentioned, it should be restricted to non-OpenACC, it can check gimplify_omp_ctxp->region_type if it is OpenMP or OpenACC. Agreed, I've adjusted the patch to only do this implicit setting for OpenMP. This removes much of the scan test adjustment originally needed for existing OpenACC testcases. @@ -10971,9 +10972,15 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p, list_p = &OMP_CLAUSE_CHAIN (c); } - /* Add in any implicit data sharing. */ + /* Add in any implicit data sharing. Implicit clauses are added at the start Two spaces after dot in comments. Done. + of the clause list, but after any non-map clauses. */ struct gimplify_adjust_omp_clauses_data data; - data.list_p = list_p; + tree *implicit_add_list_p = orig_list_p; + while (*implicit_add_list_p +&& OMP_CLAUSE_CODE (*implicit_add_list_p) != OMP_CLAUSE_MAP) +implicit_add_list_p = &OMP_CLAUSE_CHAIN (*implicit_add_list_p); Why are the implicit map clauses added first and not last? As I also explained in the first submission email, due to the processing order, if implicit clauses are added last (and processed last), for example: #pragma omp target map(tofrom: var.ptr[:N]) map(tofrom: var[implicit]) { // access of var.ptr[] } The explicit var.ptr[:N] will not find anything to map, because the (implicit) map(var) has not been seen yet, and the assumed array section attachment behavior will fail. Only with an order like: map(tofrom: var[implicit]) map(tofrom: var.ptr[:N]) does the usual assumed behavior show. And yes, this depends on the new behavior implemented by patch [1], which I still need you to review. E.g. for map(var.ptr[:N]), the proper behavior should *only* map the array section but NOT the base-pointer.
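The insertion point chosen by the `implicit_add_list_p` loop can be sketched on a plain singly linked list. This is a minimal C sketch under stated assumptions: `struct clause`, `enum clause_kind` and both functions are hypothetical simplifications, not GCC's clause trees. It skips the leading non-map clauses, stops at the first map clause, and splices implicit clauses in there, so that for the example above the implicit `map(var)` is processed before the explicit `map(var.ptr[:N])`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified clause list; not GCC's OMP_CLAUSE trees.  */
enum clause_kind { CL_PRIVATE, CL_FIRSTPRIVATE, CL_MAP };

struct clause
{
  enum clause_kind kind;
  const char *decl;      /* e.g. "var" or "var.ptr[:N]" */
  int implicit;          /* nonzero for clauses added by the compiler */
  struct clause *chain;
};

/* Same walk as the implicit_add_list_p loop: skip the leading non-map
   clauses and stop at the first map clause (or the end of the list).  */
static struct clause **
implicit_insert_point (struct clause **list_p)
{
  while (*list_p && (*list_p)->kind != CL_MAP)
    list_p = &(*list_p)->chain;
  return list_p;
}

/* Splice C in at *POS and return the link after it, so several implicit
   clauses can be inserted in order, all ahead of the explicit maps.  */
static struct clause **
splice_clause (struct clause **pos, struct clause *c)
{
  c->chain = *pos;
  *pos = c;
  return &c->chain;
}
```

Starting from `private(i) map(var.ptr[:N])`, splicing in an implicit `map(var)` gives `private(i) map(var [implicit]) map(var.ptr[:N])`, the ordering the reply argues the attachment logic needs.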
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571195.html There is also the OpenMP 5.1 [352:17-22] case which basically says that the implicit mappings should be ignored if there are explicit ones on the same construct (though, do we really create implicit clauses in that case?). Implicit clauses do not appear to be created if there's an explicit clause already existing. +#define GOMP_MAP_IMPLICIT (GOMP_MAP_FLAG_SPECIAL_3 \ +| GOMP_MAP_FLAG_SPECIAL_4) +/* Mask for entire set of special map kind bits. */ +#define GOMP_MAP_FLAG_SPECIAL_BITS (GOMP_MAP_FLAG_SPECIAL_0 \ +| GOMP_MAP_FLAG_SPECIAL_1 \ +| GOMP_MAP_FLAG_SPECIAL_2 \ +| GOMP_MAP_FLAG_SPECIAL_3 \ +| GOMP_MAP_FLAG_SPECIAL_4) ... +#define GOMP_MAP_IMPLICIT_P(X) \ + (((X) & GOMP_MAP_FLAG_SPECIAL_BITS) == GOMP_MAP_IMPLICIT) I think here we need to decide with which GOMP_MAP* kinds the implicit bit will need to be combined wit
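The quoted GOMP_MAP_IMPLICIT_P test checks that the special bits equal exactly the implicit pattern. The C sketch below reproduces that macro shape; the bit positions and `KIND_PLAIN` are assumptions chosen for illustration only (the authoritative values are the GOMP_MAP_FLAG_SPECIAL_* macros in include/gomp-constants.h). It shows why the review asks which map kinds the implicit bit may be combined with: a kind that already uses any other special bit no longer reads as implicit.

```c
#include <assert.h>

/* Assumed bit positions, for illustration only.  */
#define FLAG_SPECIAL_0 (1 << 2)
#define FLAG_SPECIAL_1 (1 << 3)
#define FLAG_SPECIAL_2 (1 << 4)
#define FLAG_SPECIAL_3 (1 << 5)
#define FLAG_SPECIAL_4 (1 << 6)

/* Same shape as the proposed GOMP_MAP_IMPLICIT,
   GOMP_MAP_FLAG_SPECIAL_BITS and GOMP_MAP_IMPLICIT_P definitions.  */
#define MAP_IMPLICIT (FLAG_SPECIAL_3 | FLAG_SPECIAL_4)
#define MAP_FLAG_SPECIAL_BITS \
  (FLAG_SPECIAL_0 | FLAG_SPECIAL_1 | FLAG_SPECIAL_2 \
   | FLAG_SPECIAL_3 | FLAG_SPECIAL_4)
#define MAP_IMPLICIT_P(X) (((X) & MAP_FLAG_SPECIAL_BITS) == MAP_IMPLICIT)

/* Hypothetical base map kind that uses none of the special bits.  */
#define KIND_PLAIN 0x1
```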
[PATCH, PR90030] Fortran OpenMP/OpenACC array mapping alignment fix
Hi Jakub,
As Thomas reported and submitted a patch a while ago:
https://gcc.gnu.org/pipermail/gcc-patches/2019-April/519932.html
https://gcc.gnu.org/pipermail/gcc-patches/2019-May/522738.html

There's an issue with the Fortran front-end when mapping arrays: when creating the data MEM_REF for the map clause, there's a convention of casting the referencing pointer to 'c_char *' by fold_convert (build_pointer_type (char_type_node), ptr).

This causes the alignment passed to the libgomp runtime for array data to be hardwired to '1', which causes alignment errors on the offload target (these do not always show up, but can be triggered by a slight change in clause ordering).

This patch is not exactly Thomas' patch from 2019, but does the same thing. The new libgomp tests are reused directly, though. A lot of scan test adjustments are also included in this patch.

The patch has been tested with no regressions for gfortran and libgomp. Is this okay for trunk?

Thanks,
Chung-Lin

Fortran: fix array alignment for OpenMP/OpenACC target mapping clauses [PR90030]

The Fortran front-end is creating maps of array data with a type of pointer to char_type_node, which, when eventually passed to libgomp at runtime, marks the passed array with an alignment of 1; this can cause mapping alignment errors on the offload target.

This patch removes the related fold_convert (build_pointer_type (char_type_node)) calls in fortran/trans-openmp.c, and adds gcc_asserts to ensure pointer type.

2021-11-04  Chung-Lin Tang
	    Thomas Schwinge

	PR fortran/90030

gcc/fortran/ChangeLog:

	* trans-openmp.c (gfc_omp_finish_clause): Remove fold_convert to pointer
	to char_type_node, add gcc_assert of POINTER_TYPE_P.
	(gfc_trans_omp_array_section): Likewise.
	(gfc_trans_omp_clauses): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/goacc/finalize-1.f: Adjust scan test.
	* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
	* gfortran.dg/gomp/affinity-clause-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise. * gfortran.dg/gomp/defaultmap-6.f90: Likewise. * gfortran.dg/gomp/map-3.f90: Likewise. * gfortran.dg/gomp/pr78260-2.f90: Likewise. * gfortran.dg/gomp/pr78260-3.f90: Likewise. libgomp/ChangeLog: * testsuite/libgomp.oacc-fortran/pr90030.f90: New test. * testsuite/libgomp.fortran/pr90030.f90: New test.diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index e81c558..0ff90b7 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -1564,7 +1564,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) if (present) ptr = gfc_build_cond_assign_expr (, present, ptr, null_pointer_node); - ptr = fold_convert (build_pointer_type (char_type_node), ptr); + gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr))); ptr = build_fold_indirect_ref (ptr); OMP_CLAUSE_DECL (c) = ptr; c2 = build_omp_clause (input_location, OMP_CLAUSE_MAP); @@ -2381,7 +2381,7 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_omp_namelist *n, OMP_CLAUSE_SIZE (node), elemsz); } gcc_assert (se.post.head == NULL_TREE); - ptr = fold_convert (build_pointer_type (char_type_node), ptr); + gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr))); OMP_CLAUSE_DECL (node) = build_fold_indirect_ref (ptr); ptr = fold_convert (ptrdiff_type_node, ptr); @@ -2849,8 +2849,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl))) { decl = gfc_conv_descriptor_data_get (decl); - decl = fold_convert (build_pointer_type (char_type_node), - decl); + gcc_assert (POINTER_TYPE_P (TREE_TYPE (decl))); decl = build_fold_indirect_ref (decl); } else if (DECL_P (decl)) @@ -2873,8 +2872,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, } gfc_add_block_to_block (_block, ); gfc_add_block_to_block (_block, ); - ptr = fold_convert (build_pointer_type (char_type_node), - ptr); + gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr))); OMP_CLAUSE_DECL (node) = build_fold_indirect_ref (ptr); 
} if (list == OMP_LIST_DEPEND) @@ -3117,8 +3115,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, if (present) ptr = gfc_build_cond_assign_expr (block, present, ptr, null_pointer_node); - ptr
Re: [PATCH, v2, OpenMP 5.2, Fortran] Strictly-structured block support for OpenMP directives
On 2021/10/21 12:15 AM, Jakub Jelinek wrote: +program main + integer :: x, i, n + + !$omp parallel + block +x = x + 1 + end block I'd prefer not to use those x = j or x = x + 1 etc. as statements that do random work here whenever possible. While those are dg-do compile testcases, especially if it is without dg-errors I think it is preferrable not to show bad coding examples. E.g. the x = x + 1 above is wrong for 2 reasons, x is uninitialized before the parallel, and there is a data race, the threads, teams etc. can write to x concurrently. I think better would be to use something like call do_work which doesn't have to be defined anywhere and will just stand there as a black box for unspecified work. + !$omp workshare + block +x = x + 1 + end block There are exceptions though, e.g. workshare is such a case, because e.g. call do_work is not valid in workshare. So, it is ok to keep using x = x + 1 here if you initialize it first at the start of the program. + !$omp workshare + block +x = 1 +!$omp critical +block + x = 3 +end block + end block And then there are cases like the above, please just use different variables there (all initialized) or say an array and access different elements in the different spots. Jakub Thanks, attached is what I finally committed. Chung-Lin From 2e4659199e814b7ee0f6bd925fd2c0a7610da856 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Thu, 21 Oct 2021 14:56:20 +0800 Subject: [PATCH] openmp: Fortran strictly-structured blocks support This implements strictly-structured blocks support for Fortran, as specified in OpenMP 5.2. This now allows using a Fortran BLOCK construct as the body of most OpenMP constructs, with a "!$omp end ..." ending directive optional for that form. gcc/fortran/ChangeLog: * decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case together with COMP_BLOCK. 
* parse.c (parse_omp_structured_block): Change return type to 'gfc_statement', add handling for strictly-structured block case, adjust recursive calls to parse_omp_structured_block. (parse_executable): Adjust calls to parse_omp_structured_block. * parse.h (enum gfc_compile_state): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK. * trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case handling. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/cancel-1.f90: Adjust testcase. * gfortran.dg/gomp/nesting-3.f90: Adjust testcase. * gfortran.dg/gomp/strictly-structured-block-1.f90: New test. * gfortran.dg/gomp/strictly-structured-block-2.f90: New test. * gfortran.dg/gomp/strictly-structured-block-3.f90: New test. libgomp/ChangeLog: * libgomp.texi (Support of strictly structured blocks in Fortran): Adjust to 'Y'. * testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase. --- gcc/fortran/decl.c| 1 + gcc/fortran/parse.c | 69 +- gcc/fortran/parse.h | 2 +- gcc/fortran/trans-openmp.c| 6 +- gcc/testsuite/gfortran.dg/gomp/cancel-1.f90 | 3 + gcc/testsuite/gfortran.dg/gomp/nesting-3.f90 | 20 +- .../gomp/strictly-structured-block-1.f90 | 214 ++ .../gomp/strictly-structured-block-2.f90 | 139 .../gomp/strictly-structured-block-3.f90 | 52 + libgomp/libgomp.texi | 2 +- .../libgomp.fortran/task-reduction-16.f90 | 1 + 11 files changed, 484 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-3.f90 diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 6784b07ae9e..6043e100fbb 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -8429,6 +8429,7 @@ gfc_match_end (gfc_statement *st) break; case COMP_BLOCK: +case COMP_OMP_STRICTLY_STRUCTURED_BLOCK: *st = ST_END_BLOCK; target = " block"; eos_ok = 0; diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 
2a454be79b0..b1e73ee6801 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -5459,7 +5459,7 @@ parse_oacc_loop (gfc_statement acc_st) /* Parse the statements of an OpenMP structured block. */ -static void +static gfc_statement parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) { gfc_statement st, omp_end_st; @@ -5546,6 +5546,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) gcc_unreachable (); } + bool block_construct = false; + gfc_namespace *my_ns = NULL; + gfc_namespace *my_parent = NULL; + + st = next_statement (); + + if (st == ST_BLOCK) +
[PATCH, v2, OpenMP 5.2, Fortran] Strictly-structured block support for OpenMP directives
Hi Jakub,
this version adjusts the patch to let sections/parallel sections also use strictly-structured blocks, bringing it closer to 5.2.

Because of this change, some of the testcases using the sections construct need a bit of adjustment too, since "block; end block" at the start of the construct now means something different than before.

There are now three new testcases, with the non-dg-error and dg-error cases separated, and a third testcase containing a few cases listed in prior emails. I hope this is enough.

The implementation status entry in libgomp/libgomp.texi for strictly-structured blocks has also been changed to "Y" in this patch.

Tested without regressions, is this now okay for trunk?

Thanks,
Chung-Lin

2021-10-20  Chung-Lin Tang

gcc/fortran/ChangeLog:

	* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
	together with COMP_BLOCK.
	* parse.c (parse_omp_structured_block): Change return type to
	'gfc_statement', add handling for strictly-structured block case, adjust
	recursive calls to parse_omp_structured_block.
	(parse_executable): Adjust calls to parse_omp_structured_block.
	* parse.h (enum gfc_compile_state): Add
	COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
	* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
	handling.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/cancel-1.f90: Adjust testcase.
	* gfortran.dg/gomp/nesting-3.f90: Adjust testcase.
	* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-2.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-3.f90: New test.

libgomp/ChangeLog:

	* libgomp.texi (Support of strictly structured blocks in Fortran):
	Adjust to 'Y'.
	* testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase.
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index d6a22d13451..66489da12be 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -8449,6 +8449,7 @@ gfc_match_end (gfc_statement *st) break; case COMP_BLOCK: +case COMP_OMP_STRICTLY_STRUCTURED_BLOCK: *st = ST_END_BLOCK; target = " block"; eos_ok = 0; diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 7d765a0866d..2fb98844356 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -5451,7 +5451,7 @@ parse_oacc_loop (gfc_statement acc_st) /* Parse the statements of an OpenMP structured block. */ -static void +static gfc_statement parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) { gfc_statement st, omp_end_st; @@ -5538,6 +5538,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) gcc_unreachable (); } + bool block_construct = false; + gfc_namespace *my_ns = NULL; + gfc_namespace *my_parent = NULL; + + st = next_statement (); + + if (st == ST_BLOCK) +{ + /* Adjust state to a strictly-structured block, now that we found that +the body starts with a BLOCK construct. */ + s.state = COMP_OMP_STRICTLY_STRUCTURED_BLOCK; + + block_construct = true; + gfc_notify_std (GFC_STD_F2008, "BLOCK construct at %C"); + + my_ns = gfc_build_block_ns (gfc_current_ns); + gfc_current_ns = my_ns; + my_parent = my_ns->parent; + + new_st.op = EXEC_BLOCK; + new_st.ext.block.ns = my_ns; + new_st.ext.block.assoc = NULL; + accept_statement (ST_BLOCK); + st = parse_spec (ST_NONE); +} + do { if (workshare_stmts_only) @@ -5554,7 +5580,6 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) restrictions apply recursively. 
*/ bool cycle = true; - st = next_statement (); for (;;) { switch (st) @@ -5580,13 +5605,13 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) case ST_OMP_PARALLEL_MASKED: case ST_OMP_PARALLEL_MASTER: case ST_OMP_PARALLEL_SECTIONS: - parse_omp_structured_block (st, false); - break; + st = parse_omp_structured_block (st, false); + continue; case ST_OMP_PARALLEL_WORKSHARE: case ST_OMP_CRITICAL: - parse_omp_structured_block (st, true); - break; + st = parse_omp_structured_block (st, true); + continue; case ST_OMP_PARALLEL_DO: case ST_OMP_PARALLEL_DO_SIMD: @@ -5609,7 +5634,7 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only) } } else - st = parse_executable (ST_NONE); + st = parse_executable (st); if (st == ST_NONE) unexpected_eof (); else if (st == ST_OMP_SECTION @@ -56
[PATCH, v2, OpenMP, Fortran] Support in_reduction for Fortran
dg-do run } + +subroutine foo (x, y) ... + if (x .ne. 11) stop 1 + if (y .ne. 21) stop 2 + +end program main

Again, something that can be dealt with incrementally, but the testsuite coverage of https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573600.html was larger than this. It would be nice e.g. to cover both scalar vars and array sections/arrays, parameters passed by reference as in the above testcase, but also something that isn't a reference (either a local variable or a dummy parameter with VALUE, etc.).

Jakub

I have expanded target-in-reduction-1.f90 to cover local variables and parameters passed with VALUE. Array sections in reductions still appear to be unsupported by the Fortran FE in general (Tobias plans to work on that later).

I also added another testcase, target-in-reduction-2.f90, that tests the "orphaned" case in Fortran, where the task/target in_reduction is in a separate subroutine.

Tested without regressions on trunk, is this okay to commit?

Thanks,
Chung-Lin

2021-10-19  Chung-Lin Tang

gcc/fortran/ChangeLog:

	* openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default
	false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
	(gfc_match_omp_clauses): Add 'openmp_target' default false parameter,
	adjust call to gfc_match_omp_clause_reduction.
	(match_omp): Adjust call to gfc_match_omp_clauses.
	* trans-openmp.c (gfc_trans_omp_taskgroup): Add call to
	gfc_match_omp_clause, create and return block.

gcc/ChangeLog:

	* omp-low.c (omp_copy_decl_2): For !ctx, use record_vars to add new copy
	as local variable.
	(scan_sharing_clauses): Place copy of OMP_CLAUSE_IN_REDUCTION decl in
	ctx->outer instead of ctx.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/reduction4.f90: Adjust 'omp target in_reduction' scan
	pattern.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.
	* testsuite/libgomp.fortran/target-in-reduction-2.f90: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 6a4ca2868f8..210fb06dbec 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1138,7 +1138,7 @@ failed:
 
 static match
 gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc,
-                                bool allow_derived)
+                                bool allow_derived, bool openmp_target = false)
 {
   if (pc == 'r' && gfc_match ("reduction ( ") != MATCH_YES)
     return MATCH_NO;
@@ -1285,6 +1285,19 @@ gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc,
           n->u2.udr = gfc_get_omp_namelist_udr ();
           n->u2.udr->udr = udr;
         }
+      if (openmp_target && list_idx == OMP_LIST_IN_REDUCTION)
+        {
+          gfc_omp_namelist *p = gfc_get_omp_namelist (), **tl;
+          p->sym = n->sym;
+          p->where = p->where;
+          p->u.map_op = OMP_MAP_ALWAYS_TOFROM;
+
+          tl = &c->lists[OMP_LIST_MAP];
+          while (*tl)
+            tl = &((*tl)->next);
+          *tl = p;
+          p->next = NULL;
+        }
     }
   return MATCH_YES;
 }
@@ -1353,7 +1366,7 @@ gfc_match_dupl_atomic (bool not_dupl, const char *name)
 static match
 gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
                        bool first = true, bool needs_space = true,
-                       bool openacc = false)
+                       bool openacc = false, bool openmp_target = false)
 {
   bool error = false;
   gfc_omp_clauses *c = gfc_get_omp_clauses ();
@@ -2057,8 +2070,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
               goto error;
             }
           if ((mask & OMP_CLAUSE_IN_REDUCTION)
-              && gfc_match_omp_clause_reduction (pc, c, openacc,
-                                                 allow_derived) == MATCH_YES)
+              && gfc_match_omp_clause_reduction (pc, c, openacc, allow_derived,
+                                                 openmp_target) == MATCH_YES)
             continue;
           if ((mask & OMP_CLAUSE_INBRANCH)
               && (m = gfc_match_dupl_check (!c->inbranch && !c->notinbranch,
@@ -3512,7 +3525,8 @@ static match
 match_omp (gfc_exec_op op, const omp_mask mask)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, mask) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, mask, true, true, false,
+                             op == EXEC_OMP_TARGET) != MATCH_YES)
     return MATCH_ERROR;
   new_st.op = op;
   new_st.ext.omp_clauses = c;
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index d234d1b070f..56efe195257 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -6405,12 +6405,17 @@ gfc_trans_omp_task (gfc_code *code)
 
 static tree
 gfc_trans_omp_taskgroup (gfc
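The in_reduction hunk in gfc_match_omp_clause_reduction appends an 'always,tofrom' map entry to the tail of the clause's map list by walking a pointer-to-pointer. The following is a minimal C sketch of that idiom, assuming a simplified, hypothetical `struct namelist` in place of gfc_omp_namelist; the names and fields are illustrative only. The sketch copies the source location from the in_reduction entry; the quoted hunk's `p->where = p->where;` is a self-assignment and looks like it was meant to read from `n`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for gfc_omp_namelist.  */
enum map_op { MAP_TOFROM, MAP_ALWAYS_TOFROM };

struct namelist
{
  const char *sym;        /* symbol named in the clause */
  int where;              /* source location, simplified to a line number */
  enum map_op map_op;
  struct namelist *next;
};

/* Append an 'always, tofrom' map entry for IN_RED's symbol to the end of
   *LIST.  Walking a pointer-to-pointer means the empty-list case needs no
   special handling.  Returns the new node, or NULL if allocation fails.  */
static struct namelist *
append_always_tofrom_map (struct namelist **list,
                          const struct namelist *in_red)
{
  assert (list != NULL && in_red != NULL);

  struct namelist *p = calloc (1, sizeof *p);
  if (p == NULL)
    return NULL;
  p->sym = in_red->sym;
  p->where = in_red->where;     /* copy the location from the source entry */
  p->map_op = MAP_ALWAYS_TOFROM;

  struct namelist **tl = list;
  while (*tl)
    tl = &(*tl)->next;          /* advance to the terminating NULL link */
  *tl = p;
  p->next = NULL;
  return p;
}
```

In this sketch, appending at the tail keeps the implicit map entries after any map entries already collected, in the order the in_reduction items were parsed.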
Re: [PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives
On 2021/10/14 7:19 PM, Jakub Jelinek wrote:
On Thu, Oct 14, 2021 at 12:20:51PM +0200, Jakub Jelinek via Gcc-patches wrote:

Thinking more about the Fortran case for !$omp sections, there is an ambiguity.

!$omp sections
block
!$omp section
end block

is clear and !$omp end sections is optional, but

!$omp sections
block
end block

is ambiguous during parsing. It could be followed by !$omp section, and then the BLOCK would be the first section; or by !$omp end sections, and then it would clearly be the whole sections construct, with the first section being empty inside of the block; or, if it is followed by something else, it is ambiguous whether the block ... end block is part of the first section, followed by something, and then we should be looking later for either !$omp section or !$omp end section to prove that, or whether !$omp sections block end block was the whole sections construct and we shouldn't await anything further. I'm afraid it's back to the drawing board.

And I have to correct myself: there is no ambiguity in 5.2 here; the important fact is hidden in sections/parallel sections being block-associated constructs. That means the body of the whole construct has to be a structured-block, and by the 5.1+ definition of a Fortran structured block, it is either block ... end block or something that doesn't start with block. So,

!$omp sections
block
end block
a = 1

is only ambiguous in whether it is actually

!$omp sections
block
!$omp section
end block
a = 1

or

!$omp sections
!$omp section
block
end block
!$omp end sections
a = 1

but both actually do the same thing and work roughly as !$omp single. If one wants a block statement as the first statement in the structured-block-sequence of the first section, followed by either some further statements or by other sections, then one needs to write

!$omp sections
!$omp section
block
end block
a = 1
...
!$omp end sections

or

!$omp sections
block
block
end block
a = 1
...
end block

Your patch probably already handles it that way, but we again need testsuite coverage to prove it is handled the way it should be in all these cases (and that we diagnose what is invalid).

The patch currently does not allow a strictly-structured BLOCK for sections/parallel sections, since I was referencing the 5.1 spec while writing it, although that is trivially fixable. (I did sense it was a bit odd that those two constructs had to be treated specially in 5.1 anyway.)

The bigger issue is that, the way the patch is currently written, the statements inside a [parallel] sections construct are parsed automatically by parse_executable(), so enforcing the specified meaning of "structured-block-sequence" (i.e. a BLOCK, or a sequence of statements not starting with BLOCK) will probably be a bit harder to implement:

!$omp sections
block
!$omp section
block
x=0
end block
x=1 !! This is allowed now, though it should be wrong spec-wise
!$omp section
x=2
end block

Currently "!$omp section" acts essentially as a top-level separator within a sections construct, rather than as a structured directive. Though I would kind of argue this is actually better for the user (why prohibit what looks like the very apparent meaning of the program?).

So Jakub, my question is: is the current state okay, or must we implement the spec pedantically?

As for the other issues:

(1) BLOCK/END BLOCK is not generally handled in parse_omp_structured_block, so for workshare, it is only handled for the top-level construct, not within workshare. I think this is what you meant in the last mail.

(2) As for the dangling-!$omp_end issue Tobias raised: because we are basically using 1-statement lookahead, any "!$omp end <*>" is naturally bound to the adjacent BLOCK/END BLOCK, so we should be okay there.

Thanks,
Chung-Lin
[PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives
Hi all,

this patch adds support for "strictly-structured blocks", introduced in OpenMP 5.1, basically allowing BLOCK constructs to serve as the body for directives:

!$omp target
block
...
end block
[!$omp end target]  !! end directive is optional

!$omp parallel
block
...
end block
...
!$omp end parallel  !! error, considered as not matching the above parallel directive

The parsing loop in parse_omp_structured_block() has been modified to allow a BLOCK construct after the first statement has been detected to be ST_BLOCK. This is done by a hard modification of the state into (the new) COMP_OMP_STRICTLY_STRUCTURED_BLOCK after the statement is known. (I'm not sure if there's a way to 'peek' at the next statement/token in the Fortran FE; I'm open to suggestions on how to write this better.)

Tested with no regressions on trunk; is this okay to commit?

Thanks,
Chung-Lin

2021-10-07  Chung-Lin Tang

gcc/fortran/ChangeLog:

	* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case together with COMP_BLOCK.
	* parse.c (parse_omp_structured_block): Adjust declaration, add 'bool strictly_structured_block' default true parameter, add handling for strictly-structured block case, adjust recursive calls to parse_omp_structured_block.
	(parse_executable): Adjust calls to parse_omp_structured_block.
	* parse.h (enum gfc_compile_state): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
	* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case handling.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index b3c65b7175b..ff66d1f9475 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8445,6 +8445,7 @@ gfc_match_end (gfc_statement *st)
       break;
 
     case COMP_BLOCK:
+    case COMP_OMP_STRICTLY_STRUCTURED_BLOCK:
       *st = ST_END_BLOCK;
       target = " block";
       eos_ok = 0;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 7d765a0866d..d78bf9b8fa5 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -5451,8 +5451,9 @@ parse_oacc_loop (gfc_statement acc_st)
 
 /* Parse the statements of an OpenMP structured block.  */
 
-static void
-parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
+static gfc_statement
+parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only,
+                            bool strictly_structured_block = true)
 {
   gfc_statement st, omp_end_st;
   gfc_code *cp, *np;
@@ -5538,6 +5539,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
       gcc_unreachable ();
     }
 
+  bool block_construct = false;
+  gfc_namespace* my_ns = NULL;
+  gfc_namespace* my_parent = NULL;
+
+  st = next_statement ();
+
+  if (strictly_structured_block && st == ST_BLOCK)
+    {
+      /* Adjust state to a strictly-structured block, now that we found that
+         the body starts with a BLOCK construct.  */
+      s.state = COMP_OMP_STRICTLY_STRUCTURED_BLOCK;
+
+      block_construct = true;
+      gfc_notify_std (GFC_STD_F2008, "BLOCK construct at %C");
+
+      my_ns = gfc_build_block_ns (gfc_current_ns);
+      gfc_current_ns = my_ns;
+      my_parent = my_ns->parent;
+
+      new_st.op = EXEC_BLOCK;
+      new_st.ext.block.ns = my_ns;
+      new_st.ext.block.assoc = NULL;
+      accept_statement (ST_BLOCK);
+      st = parse_spec (ST_NONE);
+    }
+
   do
     {
       if (workshare_stmts_only)
@@ -5554,7 +5581,6 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
              restrictions apply recursively.  */
           bool cycle = true;
 
-          st = next_statement ();
           for (;;)
             {
               switch (st)
@@ -5576,17 +5602,20 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
                   parse_forall_block ();
                   break;
 
+                case ST_OMP_PARALLEL_SECTIONS:
+                  st = parse_omp_structured_block (st, false, false);
+                  continue;
+
                 case ST_OMP_PARALLEL:
                 case ST_OMP_PARALLEL_MASKED:
                 case ST_OMP_PARALLEL_MASTER:
-                case ST_OMP_PARALLEL_SECTIONS:
-                  parse_omp_structured_block (st, false);
-                  break;
+                  st = parse_omp_structured_block (st, false);
+                  continue;
 
                 case ST_OMP_PARALLEL_WORKSHARE:
                 case ST_OMP_CRITICAL:
-                  parse_omp_structured_block (st, true);
-                  break;
+                  st = parse_omp_structured_block (st, true);
+                  continue;
 
                 case ST_OMP_PARALLEL_DO:
                 case ST_OMP_PARALLEL_DO_SIMD:
@@ -5609,7 +5638,7 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
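The diff changes parse_omp_structured_block to return a gfc_statement, and its callers switch from `break` to `st = ...; continue;`. The reason is one-statement lookahead: after a strictly-structured BLOCK, the parser must read one more statement to see whether an optional end directive follows, so that statement has already been consumed and must be handed back. The following is a minimal C sketch of that pattern over a hypothetical token stream; the token kinds, `parse_region` and `count_top_level` are illustrative names, not gfortran's implementation, and nesting inside BLOCK bodies is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds, loosely modelled on gfc_statement values.  */
enum tok { T_STMT, T_OMP_BEGIN, T_BLOCK, T_END_BLOCK, T_OMP_END, T_EOF };

struct lexer
{
  const enum tok *toks;
  size_t pos, len;
};

static enum tok
next_tok (struct lexer *lx)
{
  return lx->pos < lx->len ? lx->toks[lx->pos++] : T_EOF;
}

static int regions_parsed;

/* Parse one directive body.  Returns the first token *after* the
   construct, which has already been read, so the caller must dispatch on
   it instead of calling next_tok again.  Returns T_EOF on bad input.  */
static enum tok
parse_region (struct lexer *lx)
{
  enum tok st = next_tok (lx);
  regions_parsed++;
  if (st == T_BLOCK)
    {
      /* Strictly-structured block: the body ends at END BLOCK and the
         end directive is optional.  */
      while ((st = next_tok (lx)) != T_END_BLOCK)
        if (st == T_EOF)
          return T_EOF;
      st = next_tok (lx);              /* one-token lookahead */
      return st == T_OMP_END ? next_tok (lx) : st;
    }
  /* Loosely structured: statements until the mandatory end directive.  */
  while (st != T_OMP_END)
    {
      if (st == T_EOF)
        return T_EOF;
      st = st == T_OMP_BEGIN ? parse_region (lx) : next_tok (lx);
    }
  return next_tok (lx);
}

/* Count top-level non-directive statements in a token stream.  */
static int
count_top_level (struct lexer *lx)
{
  int stmts = 0;
  enum tok st = next_tok (lx);
  for (;;)
    switch (st)
      {
      case T_OMP_BEGIN:
        st = parse_region (lx);        /* lookahead already consumed */
        continue;
      case T_EOF:
        return stmts;
      default:
        stmts++;
        st = next_tok (lx);
        break;
      }
}
```

Because the inner region consumes an immediately following end directive, an `!$omp end` after END BLOCK binds to the nearest BLOCK, which matches the behaviour described in the follow-up reply about the dangling end-directive issue.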
[PATCH, OpenMP, Fortran] Support in_reduction for Fortran
Hi Jakub, and Fortran folks,

this patch does the required adjustments to let 'in_reduction' work for Fortran. Not just for the target directive, actually: the task directive also works after this patch.

There is a little bit of adjustment in omp-low.c:scan_sharing_clauses: RTL expansion of the copy of the OMP_CLAUSE_IN_REDUCTION decl was failing for Fortran by-reference arguments, which seems to work after placing them under the outer ctx (when it exists). This also now requires checking the field_map for an existing field before inserting.

Tested without regressions on mainline trunk; is this okay? (Testing for devel/omp/gcc-11 is in progress.)

Thanks,
Chung-Lin

2021-09-17  Chung-Lin Tang

gcc/fortran/ChangeLog:

	* openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
	(gfc_match_omp_clauses): Add 'openmp_target' default false parameter, adjust call to gfc_match_omp_clause_reduction.
	(match_omp): Adjust call to gfc_match_omp_clauses.
	* trans-openmp.c (gfc_trans_omp_taskgroup): Add call to gfc_trans_omp_clauses, create and return block.

gcc/ChangeLog:

	* omp-low.c (scan_sharing_clauses): Place in_reduction copy of variable in outer ctx if it exists. Check if non-existent in field_map before installing OMP_CLAUSE_IN_REDUCTION decl.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/reduction4.f90: Adjust 'omp target in_reduction' scan pattern.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index a64b7f5aa10..8179b5aa8bc 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1138,7 +1138,7 @@ failed:
 
 static match
 gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc,
-                                bool allow_derived)
+                                bool allow_derived, bool openmp_target = false)
 {
   if (pc == 'r' && gfc_match ("reduction ( ") != MATCH_YES)
     return MATCH_NO;
@@ -1285,6 +1285,19 @@ gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc,
           n->u2.udr = gfc_get_omp_namelist_udr ();
           n->u2.udr->udr = udr;
         }
+      if (openmp_target && list_idx == OMP_LIST_IN_REDUCTION)
+        {
+          gfc_omp_namelist *p = gfc_get_omp_namelist (), **tl;
+          p->sym = n->sym;
+          p->where = p->where;
+          p->u.map_op = OMP_MAP_ALWAYS_TOFROM;
+
+          tl = &c->lists[OMP_LIST_MAP];
+          while (*tl)
+            tl = &((*tl)->next);
+          *tl = p;
+          p->next = NULL;
+        }
     }
   return MATCH_YES;
 }
@@ -1353,7 +1366,7 @@ gfc_match_dupl_atomic (bool not_dupl, const char *name)
 static match
 gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
                        bool first = true, bool needs_space = true,
-                       bool openacc = false)
+                       bool openacc = false, bool openmp_target = false)
 {
   bool error = false;
   gfc_omp_clauses *c = gfc_get_omp_clauses ();
@@ -2057,8 +2070,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
               goto error;
             }
           if ((mask & OMP_CLAUSE_IN_REDUCTION)
-              && gfc_match_omp_clause_reduction (pc, c, openacc,
-                                                 allow_derived) == MATCH_YES)
+              && gfc_match_omp_clause_reduction (pc, c, openacc, allow_derived,
+                                                 openmp_target) == MATCH_YES)
             continue;
           if ((mask & OMP_CLAUSE_INBRANCH)
               && (m = gfc_match_dupl_check (!c->inbranch && !c->notinbranch,
@@ -3496,7 +3509,8 @@ static match
 match_omp (gfc_exec_op op, const omp_mask mask)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, mask) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, mask, true, true, false,
+                             (op == EXEC_OMP_TARGET)) != MATCH_YES)
     return MATCH_ERROR;
   new_st.op = op;
   new_st.ext.omp_clauses = c;
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e55e0c81868..08483951066 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -6391,12 +6391,17 @@ gfc_trans_omp_task (gfc_code *code)
 
 static tree
 gfc_trans_omp_taskgroup (gfc_code *code)
 {
+  stmtblock_t block;
+  gfc_start_block (&block);
   tree body = gfc_trans_code (code->block->next);
   tree stmt = make_node (OMP_TASKGROUP);
   TREE_TYPE (stmt) = void_type_node;
   OMP_TASKGROUP_BODY (stmt) = body;
-  OMP_TASKGROUP_CLAUSES (stmt) = NULL_TREE;
-  return stmt;
+  OMP_TASKGROUP_CLAUSES (stmt) = gfc_trans_omp_clauses (&block,
+                                                         code->ext.omp_cla
[PATCH, OG11, OpenACC, committed] Fix ICE for non-contiguous arrays
Currently we ICE when non-decl base-pointers (like struct members) are used in OpenACC non-contiguous array sections. This patch is kind of a band-aid to reject such cases for now. We'll deal with the more elaborate middle-end work needed to fully support them later.

Committed to devel/omp/gcc-11 after testing. This is not for mainline.

Chung-Lin

From 4e34710679ac084d7ca15ccf387c1b6f1e64c2d1 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Thu, 19 Aug 2021 16:17:02 +0800
Subject: [PATCH] openacc: fix ICE for non-decl expression in non-contiguous array base-pointer

Currently, we do not support cases like struct-members as the base-pointer for an OpenACC non-contiguous array. Mark such cases as unsupported in the C/C++ front-ends, instead of ICEing on them.

gcc/c/ChangeLog:

	* c-typeck.c (handle_omp_array_sections_1): Robustify non-contiguous array check and reject non-DECL base-pointer cases as unsupported.

gcc/cp/ChangeLog:

	* semantics.c (handle_omp_array_sections_1): Robustify non-contiguous array check and reject non-DECL base-pointer cases as unsupported.
---
 gcc/c/c-typeck.c   | 35 +++
 gcc/cp/semantics.c | 39 ---
 2 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 9c4822bbf27..a8b54c676c0 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13431,25 +13431,36 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
           && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_AFFINITY
           && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
         {
-          if (ort == C_ORT_ACC)
-            /* Note that OpenACC does accept these kinds of non-contiguous
-               pointer based arrays.  */
-            non_contiguous = true;
-          else
+          /* If any prior dimension has a non-one length, then deem this
+             array section as non-contiguous.  */
+          for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST;
+               d = TREE_CHAIN (d))
             {
-              /* If any prior dimension has a non-one length, then deem this
-                 array section as non-contiguous.  */
-              for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST;
-                   d = TREE_CHAIN (d))
+              tree d_length = TREE_VALUE (d);
+              if (d_length == NULL_TREE || !integer_onep (d_length))
                 {
-                  tree d_length = TREE_VALUE (d);
-                  if (d_length == NULL_TREE || !integer_onep (d_length))
+                  if (ort == C_ORT_ACC)
                     {
+                      while (TREE_CODE (d) == TREE_LIST)
+                        d = TREE_CHAIN (d);
+                      if (DECL_P (d))
+                        {
+                          /* Note that OpenACC does accept these kinds of
+                             non-contiguous pointer based arrays.  */
+                          non_contiguous = true;
+                          break;
+                        }
                       error_at (OMP_CLAUSE_LOCATION (c),
-                                "array section is not contiguous in %qs clause",
+                                "base-pointer expression in %qs clause not "
+                                "supported for non-contiguous arrays",
                                 omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
                       return error_mark_node;
                     }
+
+                  error_at (OMP_CLAUSE_LOCATION (c),
+                            "array section is not contiguous in %qs clause",
+                            omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+                  return error_mark_node;
                 }
             }
         }
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index e56ad8aa1e1..ad62ad76ff9 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5292,32 +5292,41 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
           return error_mark_node;
         }
       /* If there is a pointer type anywhere but in the very first
-         array-section-subscript, the array section could be non-contiguous.
-         Note that OpenACC does accept these kinds of non-contiguous pointer
-         based arrays.  */
+         array-section-subscript, the array section could be non-contiguous.  */
       if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_AFFINITY
           && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
           && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
         {
-          if (ort == C_ORT_ACC)
-            /* Note that OpenACC does accept these kinds of non-contiguous
-               pointer based arrays.  */
-            non_contiguous = true;
-          else
+          /* If any prior dimension has a non-one length, then deem this
+             array section as non-contiguous.  */
+          for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST;
+               d
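The reworked check above boils down to a small decision: once a pointer type appears past the first subscript, any prior dimension whose length is unknown or not 1 makes the section non-contiguous; OpenMP then rejects it, while OpenACC accepts it only when the base-pointer is a plain declaration. The following is a minimal C sketch of that decision, assuming the prior dimension lengths have already been extracted into an array (with -1 standing for a missing length) and the DECL_P test reduced to a boolean; `classify_section` and its enum are hypothetical, not front-end code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum section_kind
{
  SECTION_CONTIGUOUS,      /* every prior dimension has length 1 */
  SECTION_NONCONTIG_OK,    /* OpenACC, base-pointer is a plain decl */
  SECTION_ERROR_NONCONTIG, /* "array section is not contiguous" */
  SECTION_ERROR_BASE_PTR   /* "base-pointer expression ... not supported" */
};

/* Classify a pointer-based array section.  OUTER_LENGTHS holds the
   lengths of the dimensions before the last one; -1 encodes a missing
   length, which is treated like any length other than 1.  */
static enum section_kind
classify_section (const long *outer_lengths, size_t n_outer,
                  bool is_openacc, bool base_is_decl)
{
  assert (n_outer == 0 || outer_lengths != NULL);
  for (size_t i = 0; i < n_outer; i++)
    if (outer_lengths[i] != 1)
      {
        if (!is_openacc)
          return SECTION_ERROR_NONCONTIG;
        return base_is_decl ? SECTION_NONCONTIG_OK : SECTION_ERROR_BASE_PTR;
      }
  return SECTION_CONTIGUOUS;
}
```

For example, a two-level pointer section whose outer length is 4 is rejected for OpenMP, accepted for OpenACC when the base is a declared variable, and reported as an unsupported base-pointer when the base is something like a struct member.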
[PATCH, libgomp, OpenMP 5.0, OG11, committed] Implement omp_get_device_num
The omp_get_device_num patch was merged to devel/omp/gcc-11 (OG11) after testing. The commit was 83177ca9f262b230c892e667ebf685f96a718ec8.

This commit also effectively reverts the one-liner patch by Cesar: https://gcc.gnu.org/pipermail/gcc-patches/2017-October/484844.html (which was still kept in OG11 at 59ef9fea377db72f198b2bd5a95d5aef58b3f9c4). That small patch is not on mainline, conflicts with the current merge, and, upon review and testing, no longer appears to be needed. Thus I took the liberty of overwriting it with the merge of this omp_get_device_num patch.

Chung-Lin

From 83177ca9f262b230c892e667ebf685f96a718ec8 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Mon, 9 Aug 2021 08:58:07 +0200
Subject: [PATCH] openmp: Implement omp_get_device_num routine

This patch implements the omp_get_device_num library routine, specified in OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol that defines the name of a "device number" variable. The variable is defined in the device-side libgomp, its address is returned to the host-side libgomp during device initialization, and the host libgomp then sets its value to the designated device number.

libgomp/ChangeLog:

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num, omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device number exists at end of returned entries from 'load_image_func' hook, copy the assigned device number over to the device variable.
	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries.
	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries.
	* testsuite/lib/libgomp.exp (check_effective_target_offload_target_intelmic): New function for testing for intelmic offloading.
	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.

(cherry picked from commit 0bac793ed6bad2c0c13cd1e93a1aa5808467afc8)
---
 libgomp/ChangeLog.omp                              | 42 +++---
 libgomp/config/gcn/icv-device.c                    | 11 ++
 libgomp/config/nvptx/icv-device.c                  | 11 ++
 libgomp/fortran.c                                  | 7
 libgomp/icv-device.c                               | 9 +
 libgomp/libgomp-plugin.h                           | 6
 libgomp/libgomp.map                                | 8 -
 libgomp/libgomp.texi                               | 29 +++
 libgomp/omp.h.in                                   | 1 +
 libgomp/omp_lib.f90.in                             | 6
 libgomp/omp_lib.h.in                               | 3 ++
 libgomp/plugin/plugin-gcn.c                        | 38 ++--
 libgomp/plugin/plugin-nvptx.c                      | 25 +++--
 libgomp/target.c                                   | 36 ++-
 libgomp/testsuite/lib/libgomp.exp                  | 5 +++
 libgomp/testsuite/libgomp.c-c++-common/target-45.c | 30
 libgomp/testsuite/libgomp.fortran/target10.f90     | 20 +++
 17 files changed, 276 insertions(+), 11 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-45.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/target10.f90

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 9467e90..3a3299b 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,15 +1,49 @@
-2021-06-30  Tobias Burnus
+2021-08-09  Tobias Burnus
 
 	Backported from master:
-	2021-06-29  Thomas Schwinge
+	2021-08-05  Chung-Lin Tang
+
+	* icv-device.c (omp_get_device_num): New API function, host side.
+	* fortran.c (omp_get_device_num_): New interface function.
+	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
+	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
+	omp_get_devic
[PATCH, v3, libgomp, OpenMP 5.0, committed] Implement omp_get_device_num
a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)

I suppose also add 'ialias (omp_get_device_num)' here, like...

Done, thanks for catching.

--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
+# Return 1 if compiling for offload target intelmic
+proc check_effective_target_offload_target_intelmic { } {
+    return [libgomp_check_effective_target_offload_target "*-intelmic"]
+}

--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-45.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { ! offload_target_intelmic } } } */

This means that the test case is skipped as soon as the compiler is configured for Intel MIC offloading -- even if that's not used during execution. From some older experiment of mine, I do have a 'check_effective_target_offload_device_intel_mic', which I'll propose as a follow-up, once this is in.

Great.

+  if (initial_device .and. host_device_num .ne. device_num) stop 2

That one matches 'libgomp.c-c++-common/target-45.c':

    if (initial_device && host_device_num != device_num)
      abort ();

..., but here:

+  if (initial_device .and. host_device_num .eq. device_num) stop 3

... shouldn't that be '.not.initial_device', like in:

    if (!initial_device && host_device_num == device_num)
      abort ();

Yeah, Tobias caught this as well :)

(Also, I'm not familiar with Fortran operator precedence rules, so I would probably put the individual expressions into parentheses. ;-) -- But I trust you know better than I do, of course.)

Done.

Attached is the final "v3" patch that I committed.
Thanks,
Chung-Lin

From 0bac793ed6bad2c0c13cd1e93a1aa5808467afc8 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Thu, 5 Aug 2021 23:29:03 +0800
Subject: [PATCH] openmp: Implement omp_get_device_num routine

This patch implements the omp_get_device_num library routine, specified in OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol that defines the name of a "device number" variable. The variable is defined in the device-side libgomp, its address is returned to the host-side libgomp during device initialization, and the host libgomp then sets its value to the designated device number.

libgomp/ChangeLog:

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num, omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device number exists at end of returned entries from 'load_image_func' hook, copy the assigned device number over to the device variable.
	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries.
	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries.
	* testsuite/lib/libgomp.exp (check_effective_target_offload_target_intelmic): New function for testing for intelmic offloading.
	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.
---
 libgomp/config/gcn/icv-device.c   | 11 ++
 libgomp/config/nvptx/icv-device.c | 11 ++
 libgomp/fortran.c                 | 7
 libgomp/icv-device.c              | 9 +
 libgomp/libgomp-plugin.h          | 6 +++
 libgomp/libgomp.map               | 8 +++-
 libgomp/libgomp.texi              | 29 ++
 libgomp/omp.h.in                  | 1 +
 libgomp/omp_lib.f90.in            | 6 +++
Re: [PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num
On 2021/8/3 8:22 PM, Thomas Schwinge wrote:
Hi Chung-Lin!

On 2021-08-02T21:10:57+0800, Chung-Lin Tang wrote:
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}

Missing 'ialias_redirect (omp_get_device_num)'?

Grüße
 Thomas

Thanks, will fix before committing.

Chung-Lin
[PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num
On 2021/7/23 6:39 PM, Jakub Jelinek wrote:
On Fri, Jul 23, 2021 at 06:21:41PM +0800, Chung-Lin Tang wrote:
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@ omp_is_initial_device (void)
   return 1;
 }
 
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+     on the host.  */
+  return omp_get_initial_device ();
+}
+

I think this won't work properly with the Intel MIC offload, where the host libgomp is used in the offloaded code. For omp_is_initial_device, the plugin solves it by liboffloadmic/plugin/offload_target_main.cpp overriding it:

/* Override the corresponding functions from libgomp.  */
extern "C" int
omp_is_initial_device (void) __GOMP_NOTHROW
{
  return 0;
}

extern "C" int32_t
omp_is_initial_device_ (void)
{
  return omp_is_initial_device ();
}

but I guess it will need slightly more work, because we need to copy the value to the offloading device too. It can be done incrementally though.

I guess this part of the intelmic functionality will just have to wait until later. There seem to be other parts of liboffloadmic that need rework, e.g. omp_get_num_devices() returns mic_engines_total, where it should actually return the number of all devices (not just intelmic), omp_get_initial_device() returns -1 (which I don't quite understand), etc.

I would really suggest reworking the intelmic support as an offload plugin inside libgomp, rather than having it float outside by itself.

--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -102,6 +102,12 @@ struct addr_pair
   uintptr_t end;
 };
 
+/* This symbol is to name a target side variable that holds the designated
+   'device number' of the target device.  The symbol needs to be available to
+   libgomp code and the offload plugin (which in the latter case must be
+   stringified).  */
+#define GOMP_DEVICE_NUM_VAR __gomp_device_num

For a single var it is acceptable (though please avoid the double space before "offload plugin" in the comment), but once we have more than one variable, I think we should simply have a struct which will contain all the parameters that need to be copied from the host to the offloading device at image load time (and eventually have another struct that holds parameters that we'll need to copy to the device on each kernel launch; I bet some ICVs will fall into one category and other ICVs into the other).

Actually, if you look at the 5.[01] specifications, omp_get_device_num() is not defined in terms of an ICV. Maybe it conceptually ought to be, but the current description, "the device number of the device on which the calling thread is executing", is not one of the defined ICVs.

It looks like there will eventually be some kind of ICV block handled in a similar way, but I think that the modifications will be straightforward then. For now, I think it's okay for GOMP_DEVICE_NUM_VAR to just be a normal global variable.

diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 8ea27b5565f..ffcb98ae99e 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -197,6 +197,8 @@ OMP_5.0.1 {
 	omp_get_supported_active_levels_;
 	omp_fulfill_event;
 	omp_fulfill_event_;
+	omp_get_device_num;
+	omp_get_device_num_;
 } OMP_5.0;

This is wrong. We've already released GCC 11.1 with the OMP_5.0.1 symbol version, so we must not add any further symbols to that symbol version. OpenMP 5.0 routines added in GCC 12 should use the OMP_5.0.2 symbol version.

I've adjusted this into 5.0.2, in between 5.0.1 and the new 5.1 added by the recent omp_display_env[_] routines. omp_get_device_num is an API function introduced in OpenMP 5.0, so I think this is the correct handling (instead of stashing it into 5.1).

There is a new function check_effective_target_offload_target_intelmic() in testsuite/lib/libgomp.exp, used to test for non-intelmic offloading situations.
Re-tested with no regressions; seeking approval for trunk.

Thanks,
Chung-Lin

2021-08-02  Chung-Lin Tang

libgomp/ChangeLog:

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num, omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device number exists at end of returned entries from 'load_image_func' hook, copy the assigned device number over to the device variable.
	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num):
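The libgomp-plugin.h comment notes that GOMP_DEVICE_NUM_VAR is used as an identifier in libgomp but must be stringified in the offload plugin. The following is a minimal C sketch of why that needs a two-level macro; `STRINGIFY_1`, `STRINGIFY` and `read_device_num` are illustrative names for this example only. GCC's include/symcat.h provides comparable `STRINGX`/`XSTRING` macros, which is presumably why the plugins gain a "symcat.h" include in the ChangeLog.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: the extra level lets the argument be
   macro-expanded before '#' is applied to it.  */
#define STRINGIFY_1(s) #s
#define STRINGIFY(s) STRINGIFY_1 (s)

#define GOMP_DEVICE_NUM_VAR __gomp_device_num

/* Device-side libgomp: the macro is used as an identifier.  */
static int GOMP_DEVICE_NUM_VAR;

static int
read_device_num (void)
{
  return GOMP_DEVICE_NUM_VAR;
}

/* Plugin side: the same macro as a string, e.g. for a symbol lookup.  */
static const char device_num_symbol[] = STRINGIFY (GOMP_DEVICE_NUM_VAR);

/* A single level stringifies the macro *name*, not its expansion.  */
static const char unexpanded[] = STRINGIFY_1 (GOMP_DEVICE_NUM_VAR);
```

Keeping the name in one macro means the device library and the plugin cannot drift apart on the symbol's spelling.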
Re: [PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num
On 2021/7/23 7:01 PM, Tobias Burnus wrote:

I personally prefer having:

  int initial_device;

and inside 'omp target' (with 'map(from:initial_device)'):

  initial_device = omp_is_initial_device();

Then the check would be:

  if (initial_device && host_device_num != device_num)
    abort();
  if (!initial_device && host_device_num == device_num)
    abort();

(Likewise for Fortran.)

Thanks, I've adjusted the new testcases to use this style.

And instead of restricting the target to nvptx/gcn, we could just add dg-xfail-run-if for *-intelmic-* and *-intelmicemul-*.

I've added an 'offload_target_intelmic' effective-target check for use in the new testcases.

Additionally, offload_target_nvptx/...amdgcn only check whether compilation support is available, not whether a device exists at run time. (The device availability is checked by target_offload_device, using omp_is_initial_device().)

I guess there is value in testing compilation as long as the compiler is properly configured, and leaving execution as an independent test. OTOH, I think the OpenMP execution tests are not properly forcing offload (or not) using the environment variables, unlike what we have for OpenACC.

Thanks, Chung-Lin
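As a sketch of the check described above, the rule can be written as a pure function so that it is testable without an offload device. `device_num_consistent` is a hypothetical helper name, not part of the patch; the comment shows, assuming an OpenMP-enabled build, where its inputs would come from in a testcase like target-45.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Consistency rule for omp_get_device_num, as a pure function.
   INITIAL_DEVICE:  omp_is_initial_device () as observed inside the target
                    region (brought back with map(from: ...)).
   HOST_DEVICE_NUM: omp_get_device_num () called on the host.
   DEVICE_NUM:      omp_get_device_num () called inside the target region.

   In an OpenMP-enabled testcase the inputs might be gathered like this:
     int host_device_num = omp_get_device_num ();
     int device_num, initial_device;
     #pragma omp target map(from: device_num, initial_device)
     {
       initial_device = omp_is_initial_device ();
       device_num = omp_get_device_num ();
     }
     if (!device_num_consistent (initial_device, host_device_num, device_num))
       abort ();  */
static bool
device_num_consistent (bool initial_device, int host_device_num,
                       int device_num)
{
  /* The region ran on the host (e.g. host fallback): numbers must match.  */
  if (initial_device && host_device_num != device_num)
    return false;
  /* The region ran on a non-host device: numbers must differ.  */
  if (!initial_device && host_device_num == device_num)
    return false;
  return true;
}
```

Keeping the rule in one function means the C and Fortran variants can share the same two conditions, and a host-fallback run exercises the first branch while a real offload run exercises the second.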
[PATCH, libgomp, OpenMP 5.0] Implement omp_get_device_num
Hi all, this patch implements the omp_get_device_num API function, which appears to be a missing piece in the library routines implementation.

The host-side implementation is simple: by specification, it is equivalent to omp_get_initial_device. Inside offloaded regions, the preferred approach is for the device to have this information initialized once, when the device is initialized, so that the function merely returns the stored value.

This implementation adds a convention for an additional entry (dubbed 'others' in the code) returned by the 'load_image' plugin hook. Basically, we define a variable name in libgomp-plugin.h, which the device libgomp defines and the offload plugin searches for; the plugin then returns the start/end device addresses of that variable for gomp_load_image_to_device to initialize. The device-side omp_get_device_num then just returns that value. This patch implements this for the gcn and nvptx offload targets.

The icv-device.c file is starting to look ready for consolidating away the target-specific versions, but that's for later. Basic libgomp tests were added for C/C++ and Fortran. Tested without regressions with offloading for amdgcn and nvptx on x86_64-linux host. Okay for trunk?

Thanks, Chung-Lin

2021-07-23 Chung-Lin Tang libgomp/ChangeLog * icv-device.c (omp_get_device_num): New API function, host side. * fortran.c (omp_get_device_num_): New interface function. * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol. * libgomp.map (OMP_5.0.1): Add omp_get_device_num, omp_get_device_num_. * libgomp.texi (omp_get_device_num): Add documentation for new API function. * omp.h.in (omp_get_device_num): Add declaration. * omp_lib.f90.in (omp_get_device_num): Likewise. * omp_lib.h.in (omp_get_device_num): Likewise. * target.c (gomp_load_image_to_device): If additional entry for device number exists at end of returned entries from 'load_image_func' hook, copy the assigned device number over to the device variable.
* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global. (omp_get_device_num): New API function, device side. * config/plugin/plugin-gcn.c ("symcat.h"): Add include. (GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries. * config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global. (omp_get_device_num): New API function, device side. * config/plugin/plugin-nvptx.c ("symcat.h"): Add include. (GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR at end of returned 'target_table' entries. * testsuite/libgomp.c-c++-common/target-45.c: New test. * testsuite/libgomp.fortran/target10.f90: New test.

diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index 72d4f7cff74..8f72028a6c8 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -70,6 +70,16 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 3b96890f338..e586da1d3a8 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -58,8 +58,19 @@ omp_is_initial_device (void)
   return 0;
 }
 
+/* This is set to the device number of current GPU during device initialization,
+   when the offload image containing this libgomp portion is loaded.  */
+static int GOMP_DEVICE_NUM_VAR;
+
+int
+omp_get_device_num (void)
+{
+  return GOMP_DEVICE_NUM_VAR;
+}
+
 ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
 ialias (omp_is_initial_device)
+ialias (omp_get_device_num)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 4ec39c4e61b..2360582e32e 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -598,6 +598,12 @@ omp_get_initial_device_ (void)
   return omp_get_initial_device ();
 }
 
+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}
+
 int32_t
 omp_get_max_task_priority_ (void)
 {
diff --git a/libgomp/icv-device.c b/libgomp/icv-device.c
index c1bedf46647..f11bdfa85c4 100644
--- a/libgomp/icv-device.c
+++ b/libgomp/icv-device.c
@@ -61,8 +61,17 @@ omp_is_initial_device (void)
   return 1;
 }
 
+int
+omp_get_device_num (void)
+{
+  /* By specification, this is equivalent to omp_get_initial_device
+     on the host.  */
+  return omp_get_initial_dev
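To make the load-time convention described in this patch concrete, here is a host-only C model. It is a minimal sketch under stated assumptions, not the libgomp or plugin code: `struct addr_range`, `model_load_image`, `model_init_device_num` and `model_get_device_num` are hypothetical names, and a plain `memcpy` stands in for the plugin's host-to-device copy. It only shows the idea that one extra trailing entry, beyond the expected functions and variables, locates the device-number variable that the host then fills in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a start/end device address pair.  */
struct addr_range
{
  uintptr_t start;
  uintptr_t end;
};

/* Hypothetical stand-in for the device-side variable.  On a real GPU this
   lives in device memory and is written through a host-to-device copy.  */
static int device_num_var;

/* Model of the plugin's load_image hook: fill N_EXPECTED ordinary entries
   (zeroed here) and append one extra "others" entry covering the
   device-number variable.  TABLE must have room for N_EXPECTED + 1 entries.
   Returns the number of entries written.  */
static int
model_load_image (struct addr_range *table, int n_expected)
{
  assert (n_expected >= 0);
  for (int i = 0; i < n_expected; i++)
    table[i].start = table[i].end = 0;
  table[n_expected].start = (uintptr_t) &device_num_var;
  table[n_expected].end = (uintptr_t) (&device_num_var + 1);
  return n_expected + 1;
}

/* Model of the host-side step: if an extra entry of the right size exists
   beyond the expected ones, copy the assigned device number into it.
   Returns 1 if the variable was initialized, 0 otherwise.  */
static int
model_init_device_num (const struct addr_range *table, int n_returned,
                       int n_expected, int device_num)
{
  if (n_returned <= n_expected)
    return 0;  /* Image without the variable: nothing to initialize.  */
  const struct addr_range *r = &table[n_expected];
  if (r->end - r->start != sizeof (int))
    return 0;
  memcpy ((void *) r->start, &device_num, sizeof (int));
  return 1;
}

/* Device-side accessor, mirroring the patch's omp_get_device_num.  */
static int
model_get_device_num (void)
{
  return device_num_var;
}
```

The "no extra entry" branch models a device image built without the variable; in that case nothing is written and the accessor keeps its previous value.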
[PATCH, libgomp, PR101114, committed] Fix struct-elem-5.c regression
The libgomp.c-c++-common/struct-elem-5.c test, which I added for the structure element mapping patch, does not properly "fail" for shared (unified) address space cases (like host fallback). This was already handled inside the struct-elem-[14].c testcases, but was missed for this one because of its dg-shouldfail nature. Fixed by adding "target offload_device_nonshared_as" to the dg-do run directive. This is small and obvious, so I committed it directly after testing.

Chung-Lin

libgomp/ChangeLog: PR testsuite/101114 * testsuite/libgomp.c-c++-common/struct-elem-5.c: Add "target offload_device_nonshared_as" condition for enabling test.

From e0672017370b9a9362fda52ecffe33d1c9c41829 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Sat, 26 Jun 2021 00:42:58 +0800
Subject: [PATCH] testsuite/101114: Adjust libgomp.c-c++-common/struct-elem-5.c testcase

The dg-shouldfail testcase libgomp.c-c++-common/struct-elem-5.c does not properly fail for shared address space execution (e.g. host fallback). Adjust the testcase to limit testing to "target offload_device_nonshared_as" only.

libgomp/ChangeLog: PR testsuite/101114 * testsuite/libgomp.c-c++-common/struct-elem-5.c: Add "target offload_device_nonshared_as" condition for enabling test.
---
 libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c
index 814c30120e5..31a2fa5e8cf 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target offload_device_nonshared_as } } */
 
 struct S
 {
-- 
2.17.1
[PATCH, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v4]
Hi Jakub, this patch is the "v4" version of my PR92120 patch; v3 was here: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570886.html (where I listed the various patches from the devel/omp/gcc-10 branch that were combined, which I won't repeat here). Basically, this v4 adds fixes for lambda capture, which were already pushed to devel/omp/gcc-11 yesterday: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572988.html I have attached both the combined v4 version and the v3-to-v4 diff. Tested on x86_64-linux with nvptx offloading; seeking approval for trunk. Thanks, Chung-Lin gcc/cp/ * cp-tree.h (finish_omp_target): New declaration. (finish_omp_target_clauses): Likewise. * parser.c (cp_parser_omp_clause_map): Adjust call to cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true. (cp_parser_omp_target): Factor out code, adjust into calls to new function finish_omp_target. * pt.c (tsubst_expr): Add call to finish_omp_target_clauses for OMP_TARGET case. * semantics.c (handle_omp_array_sections_1): Add handling to create 'this->member' from 'member' FIELD_DECL. (handle_omp_array_sections): Likewise. (finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP map clauses. Handle 'A->member' case in map clauses. (struct omp_target_walk_data): New struct for walking over target-directive tree body. (finish_omp_target_clauses_r): New function for tree walk. (finish_omp_target_clauses): New function. (finish_omp_target): New function. gcc/c/ * c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in call to c_parser_omp_variable_list to 'true'. * c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in array base handling. (c_finish_omp_clauses): Handle 'A->member' case in map clauses. gcc/ * gimplify.c ("tree-hash-traits.h"): Add include. (gimplify_scan_omp_clauses): Change struct_map_to_clause to type hash_map *. Adjust struct map handling to handle cases of *A and A->B expressions.
Under !DECL_P case of GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when handling component_ref_p case. Add unshare_expr and gimplification when created GOMP_MAP_STRUCT is not a DECL. Add code to add firstprivate pointer for *pointer-to-struct case. (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for exit data directives code to earlier position. * omp-low.c (lower_omp_target): Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. * tree-pretty-print.c (dump_omp_clause): Likewise. gcc/testsuite/ * gcc.dg/gomp/target-3.c: New testcase. * g++.dg/gomp/target-3.C: New testcase. * g++.dg/gomp/target-lambda-1.C: New testcase. * g++.dg/gomp/target-lambda-2.C: New testcase. * g++.dg/gomp/target-this-1.C: New testcase. * g++.dg/gomp/target-this-2.C: New testcase. * g++.dg/gomp/target-this-3.C: New testcase. * g++.dg/gomp/target-this-4.C: New testcase. * g++.dg/gomp/target-this-5.C: New testcase. * g++.dg/gomp/this-2.C: Adjust testcase. include/ * gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. (GOMP_MAP_POINTER_P): Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION. libgomp/ * libgomp.h (gomp_attach_pointer): Add bool parameter. * oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer. (goacc_enter_data_internal): Likewise. * target.c (gomp_map_vars_existing): Update assert condition to include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION. (gomp_map_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for mapping a pointer with NULL target. (gomp_attach_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for attaching a pointer with NULL target. 
(gomp_map_vars_internal): Update calls to gomp_map_pointer and gomp_attach_pointer, add handling for GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases. * testsuite/libgomp.c++/target-23.C: New testcase. * testsuite/libgomp.c++/target-lambda-1.C: New testcase. * testsuite/libgomp.c++/target-lambda-2.C: New testcase. * testsuite/libgomp.c++/target-this-1.C: New testcase. * testsuite/libgomp.c++/target-this-2.C: New testcase. * testsuite/libgomp.c++/target-this-3.C: New
[PATCH, C++, OpenMP 5.0, OG11] Fixes for lambda in offload regions
This patch contains:
(1) Some fixes for lambda capture by-reference to work inside offload regions.
(2) Lambda objects declared inside an offload region were mistakenly target-mapped on the enclosing target construct, causing a gimplify ICE (because the object is not yet bound at that point); checks were added to avoid this.

Added another testcase to check that lambdas work in these cases. Tested without regressions on devel/omp/gcc-11, pushed there.

Jakub, technically this is a further bug fix for the PR92120 v3 patch. I'll submit a v4 for mainline trunk later, or submit this patch independently in case the v3 patch has already been reviewed by then.

Thanks, Chung-Lin

gcc/cp/ChangeLog: * semantics.c (struct omp_target_walk_data): Add 'hash_set local_decls' member. (finish_omp_target_clauses_r): Handle BIND_EXPR case, fill in local_decls there. Adjust case to not add locally declared lambda objects to data->lambda_objects_accessed. (finish_omp_target_clauses): Peel away TARGET_EXPR for lambda objects. Adjust map kind to _TOFROM for reference fields in closures.

gcc/testsuite/ChangeLog: * g++.dg/gomp/target-lambda-2.C: New test.

libgomp/ChangeLog: * testsuite/libgomp.c++/target-lambda-2.C: New test.

From dbf5d72f4c077215330e5b06fbb9b3311b807c2a Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Thu, 17 Jun 2021 21:53:10 +0800
Subject: [PATCH] Fixes for lambda in offload regions

This patch contains:
(1) Some fixes for lambda capture by-reference to work inside offload regions.
(2) Lambda objects declared inside an offload region were mistakenly target-mapped on the enclosing target construct, causing a gimplify ICE (because the object is not yet bound at that point); checks were added to avoid this.

gcc/cp/ChangeLog: * semantics.c (struct omp_target_walk_data): Add 'hash_set local_decls' member. (finish_omp_target_clauses_r): Handle BIND_EXPR case, fill in local_decls there. Adjust case to not add locally declared lambda objects to data->lambda_objects_accessed.
(finish_omp_target_clauses): Peel away TARGET_EXPR for lambda objects. Adjust map kind to _TOFROM for reference fields in closures.

gcc/testsuite/ChangeLog: * g++.dg/gomp/target-lambda-2.C: New test.

libgomp/ChangeLog: * testsuite/libgomp.c++/target-lambda-2.C: New test.
---
 gcc/cp/semantics.c                            | 22 ++--
 gcc/testsuite/g++.dg/gomp/target-lambda-2.C   | 35 +++
 .../testsuite/libgomp.c++/target-lambda-2.C   | 30
 3 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/target-lambda-2.C
 create mode 100644 libgomp/testsuite/libgomp.c++/target-lambda-2.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 25fa6cb5305..1f7eacfe701 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -9145,6 +9145,8 @@ struct omp_target_walk_data
   tree current_closure;
 
   hash_set closure_vars_accessed;
+
+  hash_set local_decls;
 };
 
 static tree
@@ -9203,12 +9205,25 @@ finish_omp_target_clauses_r (tree *tp, int *walk_subtrees, void *ptr)
       return NULL_TREE;
     }
 
+  if (TREE_CODE (t) == BIND_EXPR)
+    {
+      tree block = BIND_EXPR_BLOCK (t);
+      for (tree var = BLOCK_VARS (block); var; var = DECL_CHAIN (var))
+        if (!data->local_decls.contains (var))
+          data->local_decls.add (var);
+      return NULL_TREE;
+    }
+
   if (TREE_TYPE(t) && LAMBDA_TYPE_P (TREE_TYPE (t)))
     {
       tree lt = TREE_TYPE (t);
       gcc_assert (CLASS_TYPE_P (lt));
 
-      if (!data->lambda_objects_accessed.contains (t))
+      if (!data->lambda_objects_accessed.contains (t)
+          /* Do not prepare to create target maps for locally declared
+             lambdas or anonymous ones.  */
+          && !data->local_decls.contains (t)
+          && TREE_CODE (t) != TARGET_EXPR)
         data->lambda_objects_accessed.add (t);
       *walk_subtrees = 0;
       return NULL_TREE;
@@ -9494,6 +9509,9 @@ finish_omp_target_clauses (location_t loc, tree body, tree *clauses_ptr)
        i != data.lambda_objects_accessed.end (); ++i)
     {
       tree lobj = *i;
+      if (TREE_CODE (lobj) == TARGET_EXPR)
+        lobj = TREE_OPERAND (lobj, 0);
+
       tree lt = TREE_TYPE (lobj);
       gcc_assert (LAMBDA_TYPE_P (lt) && CLASS_TYPE_P (lt));
 
@@ -9530,7 +9548,7 @@ finish_omp_target_clauses (location_t loc, tree body, tree *clauses_ptr)
             tree exp = build3 (COMPONENT_REF, TREE_TYPE (fld),
                                lobj, fld, NULL_TREE);
             tree c = build_omp_clause (loc, OMP_CLAUSE_MAP);
-            OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TO);
+            OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
             OMP_CLAUSE_DECL (c)
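The intent of the BIND_EXPR change can be modelled outside GCC as follows. This is a hypothetical, heavily simplified sketch (no GCC trees; `struct obj`, `struct walk_data`, `note_bind_vars` and `note_lambda_use` are made-up names): variables declared in a block inside the region are recorded first, and a lambda object is only queued for an implicit map if it was not declared locally and is not a temporary (the TARGET_EXPR case). It relies on the walk visiting a BIND_EXPR before the uses in its body, which is what the patch's BIND_EXPR case does by returning without clearing *walk_subtrees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lambda object or declared variable.  */
struct obj
{
  const char *name;
  bool is_temporary;  /* Models a TARGET_EXPR (anonymous lambda).  */
};

#define MAX_OBJS 16

struct walk_data
{
  const struct obj *local_decls[MAX_OBJS];  /* Declared inside the region.  */
  size_t n_local;
  const struct obj *to_map[MAX_OBJS];       /* Objects needing implicit maps.  */
  size_t n_to_map;
};

static bool
contains (const struct obj *const *set, size_t n, const struct obj *o)
{
  for (size_t i = 0; i < n; i++)
    if (set[i] == o)
      return true;
  return false;
}

/* Models the BIND_EXPR case: record every variable declared in a block.  */
static void
note_bind_vars (struct walk_data *d, const struct obj *vars, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!contains (d->local_decls, d->n_local, &vars[i]))
      {
        assert (d->n_local < MAX_OBJS);
        d->local_decls[d->n_local++] = &vars[i];
      }
}

/* Models the lambda-object case: queue an implicit map only once, and only
   for objects declared outside the region that are not temporaries.  */
static void
note_lambda_use (struct walk_data *d, const struct obj *o)
{
  if (!contains (d->to_map, d->n_to_map, o)
      && !contains (d->local_decls, d->n_local, o)
      && !o->is_temporary)
    {
      assert (d->n_to_map < MAX_OBJS);
      d->to_map[d->n_to_map++] = o;
    }
}
```

In this model only the lambda captured from the enclosing scope ends up in `to_map`; a lambda declared inside the region, or an anonymous temporary, never reaches the map-creation step, which is the situation that previously led to the gimplify ICE.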
Re: [PATCH, v3, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0
On 2021/6/10 9:04 PM, Jakub Jelinek wrote:

I know you had performance concerns in the last round, compared with your sorting approach. I'll try to look into that later. I'm posting the v3 patch before backporting to devel/omp/gcc-11.

But please have a look at this incrementally. I think the common case is just a couple of mappings (say < 10 or < 20 in 90%+ of cases) and a htab might be too expensive for that.

Thanks, I'll do that later.

+  if (!omp_target_is_present (, d))
+    abort ();
+  if (!omp_target_is_present ([0], d))
+    abort ();
+  if (!omp_target_is_present ([0], d))
+    abort ();
+
+  #pragma omp target exit data map (from:q[:1])
+
+  if (omp_target_is_present (, d))
+    abort ();

Has this been tested with offloading not configured? omp_target_is_present will return 1 for the initial device for all the pointers (everything is present). So I wonder if these 3 if (omp_target_is_present (..., d)) shouldn't be if (d != id && omp_target_is_present (..., d))

Yeah, you're right; in host fallback mode the test aborts. I've modified the testcases as you suggested. Attached is the final patch I pushed.

Thanks, Chung-Lin

From 275c736e732d29934e4d22e8f030d5aae8c12a52 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Thu, 17 Jun 2021 21:33:32 +0800
Subject: [PATCH] libgomp: Structure element mapping for OpenMP 5.0

This patch implements the OpenMP 5.0 requirement of incrementing/decrementing the reference count of a mapped structure at most once (across all elements) on a construct. This is implemented by pulling in libgomp/hashtab.h and using htab_t as a pointer set. Structure element list siblings also have pointers-to-refcounts linked together, to naturally achieve uniform increment/decrement without repetition. There are still some open questions on whether such an htab_t based set is faster or slower than a sorted pointer array based implementation; this remains to be investigated.
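The "at most once per construct" rule can be sketched with a small pointer set keyed on the address of the counter shared by structure siblings. This is a hypothetical model, not the libgomp implementation: it uses a linear array rather than htab_t (closer to the small-N case mentioned in the review), and `struct refcount_set`, `set_insert`, `increment_refcount` and `decrement_refcount` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small pointer set: a linear array, adequate for the few
   mappings typical of a single construct.  */
#define SET_MAX 32

struct refcount_set
{
  uintptr_t *items[SET_MAX];
  size_t n;
};

/* Returns 1 if P was newly inserted, 0 if it was already present.  */
static int
set_insert (struct refcount_set *s, uintptr_t *p)
{
  for (size_t i = 0; i < s->n; i++)
    if (s->items[i] == p)
      return 0;
  assert (s->n < SET_MAX);
  s->items[s->n++] = p;
  return 1;
}

/* Increment at most once per construct: all siblings of one structure share
   a single counter, so keying the set on its address deduplicates them.  */
static void
increment_refcount (uintptr_t *refcount, struct refcount_set *set)
{
  if (set_insert (set, refcount))
    (*refcount)++;
}

/* Decrement at most once per construct, never below zero.  */
static void
decrement_refcount (uintptr_t *refcount, struct refcount_set *set)
{
  if (*refcount > 0 && set_insert (set, refcount))
    (*refcount)--;
}
```

A linear scan costs O(n) per lookup, which is cheap for a handful of mappings; a hash table or a sorted array only starts to pay off as the number of mappings per construct grows.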
libgomp/ChangeLog: * hashtab.h (htab_clear): New function with initialization code factored out from... (htab_create): ...here, adjust to use htab_clear function. * libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of special refcount values, add comments. (REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL. (REFCOUNT_LINK): Likewise. (REFCOUNT_STRUCTELEM): New special refcount range for structure element siblings. (REFCOUNT_STRUCTELEM_P): Macro for testing for structure element sibling maps. (REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling. (REFCOUNT_STRUCTELEM_FLAG_LAST): Flag to indicate last sibling. (REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag. (REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag. (struct splay_tree_key_s): Add structelem_refcount and structelem_refcount_ptr fields into a union with dynamic_refcount. Add comments. (gomp_map_vars): Delete declaration. (gomp_map_vars_async): Likewise. (gomp_unmap_vars): Likewise. (gomp_unmap_vars_async): Likewise. (goacc_map_vars): New declaration. (goacc_unmap_vars): Likewise. * oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars. (goacc_enter_datum): Likewise. (goacc_enter_data_internal): Likewise. * oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars and goacc_unmap_vars. (GOACC_data_start): Adjust to use goacc_map_vars. (GOACC_data_end): Adjust to use goacc_unmap_vars. * target.c (hash_entry_type): New typedef. (htab_alloc): New function hook for hashtab.h. (htab_free): Likewise. (htab_hash): Likewise. (htab_eq): Likewise. (hashtab.h): Add file include. (gomp_increment_refcount): New function. (gomp_decrement_refcount): Likewise. (gomp_map_vars_existing): Add refcount_set parameter, adjust to use gomp_increment_refcount. (gomp_map_fields_existing): Add refcount_set parameter, adjust calls to gomp_map_vars_existing. 
(gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p variable to guard OpenMP specific paths, adjust calls to gomp_map_vars_existing, add structure element sibling splay_tree_key sequence creation code, adjust Fortran map case to avoid increment under OpenMP. (gomp_map_vars): Adjust to static, add refcount_set parameter, manage local refcount_set if caller passed in NULL, adjust call to gomp_map_vars_internal. (gomp_map_vars_async): Adjust and rename into... (goacc_map_vars): ...this new function, adjust call to gomp_map_vars_internal. (gomp_remove_splay_tree_key): New function with code factored out from gomp_remove_var_internal. (gomp_remove_var_internal): Add code to h
[PATCH, v3, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0
Hi Jakub, this is v3 of my OpenMP 5.0 structure element mapping patch; v2 was here: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561139.html This v3 adds a small bug fix: the initialization of the refcount didn't handle all cases, which is fixed by using gomp_increment_refcount there (which is also more consistent). I know you had performance concerns in the last round, compared with your sorting approach; I'll try to look into that later. I'm posting the v3 patch before backporting to devel/omp/gcc-11. Thanks, Chung-Lin libgomp/ * hashtab.h (htab_clear): New function with initialization code factored out from... (htab_create): ...here, adjust to use htab_clear function. * libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of special refcount values, add comments. (REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL. (REFCOUNT_LINK): Likewise. (REFCOUNT_STRUCTELEM): New special refcount range for structure element siblings. (REFCOUNT_STRUCTELEM_P): Macro for testing for structure element sibling maps. (REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling. (REFCOUNT_STRUCTELEM_FLAG_LAST): Flag to indicate last sibling. (REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag. (REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag. (struct splay_tree_key_s): Add structelem_refcount and structelem_refcount_ptr fields into a union with dynamic_refcount. Add comments. (gomp_map_vars): Delete declaration. (gomp_map_vars_async): Likewise. (gomp_unmap_vars): Likewise. (gomp_unmap_vars_async): Likewise. (goacc_map_vars): New declaration. (goacc_unmap_vars): Likewise. * oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars. (goacc_enter_datum): Likewise. (goacc_enter_data_internal): Likewise. * oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars and goacc_unmap_vars. (GOACC_data_start): Adjust to use goacc_map_vars. (GOACC_data_end): Adjust to use goacc_unmap_vars. * target.c (hash_entry_type): New typedef.
(htab_alloc): New function hook for hashtab.h. (htab_free): Likewise. (htab_hash): Likewise. (htab_eq): Likewise. (hashtab.h): Add file include. (gomp_increment_refcount): New function. (gomp_decrement_refcount): Likewise. (gomp_map_vars_existing): Add refcount_set parameter, adjust to use gomp_increment_refcount. (gomp_map_fields_existing): Add refcount_set parameter, adjust calls to gomp_map_vars_existing. (gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p variable to guard OpenMP specific paths, adjust calls to gomp_map_vars_existing, add structure element sibling splay_tree_key sequence creation code, adjust Fortran map case to avoid increment under OpenMP. (gomp_map_vars): Adjust to static, add refcount_set parameter, manage local refcount_set if caller passed in NULL, adjust call to gomp_map_vars_internal. (gomp_map_vars_async): Adjust and rename into... (goacc_map_vars): ...this new function, adjust call to gomp_map_vars_internal. (gomp_remove_splay_tree_key): New function with code factored out from gomp_remove_var_internal. (gomp_remove_var_internal): Add code to handle removing multiple splay_tree_key sequence for structure elements, adjust code to use gomp_remove_splay_tree_key for splay-tree key removal. (gomp_unmap_vars_internal): Add refcount_set parameter, adjust to use gomp_decrement_refcount. (gomp_unmap_vars): Adjust to static, add refcount_set parameter, manage local refcount_set if caller passed in NULL, adjust call to gomp_unmap_vars_internal. (gomp_unmap_vars_async): Adjust and rename into... (goacc_unmap_vars): ...this new function, adjust call to gomp_unmap_vars_internal. (GOMP_target): Manage refcount_set and adjust calls to gomp_map_vars and gomp_unmap_vars. (GOMP_target_ext): Likewise. (gomp_target_data_fallback): Adjust call to gomp_map_vars. (GOMP_target_data): Likewise. (GOMP_target_data_ext): Likewise. (GOMP_target_end_data): Adjust call to gomp_unmap_vars. 
(gomp_exit_data): Add refcount_set parameter, adjust to use gomp_decrement_refcount, adjust to queue splay-tree keys for removal after main loop. (GOMP_target_enter_exit_data): Manage refcount_set and adjust calls to gomp_map_vars and gomp_exit_data. (gomp_target_task_fn): Likewise. * testsuite/libgomp.c-c++-common/refcount-1.c: New testcase. *
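For the REFCOUNT_* entries in the ChangeLog above, one possible encoding is sketched below. The constants and the `struct key_model` layout are assumptions made for illustration and may not match libgomp.h exactly: values at the top of the uintptr_t range are reserved as special refcounts, one sub-range marks structure-element siblings with FIRST/LAST flag bits, and only the first sibling owns the counter while the others point at it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: the top 8 values of uintptr_t are "special".  */
#define REFCOUNT_SPECIAL (~(uintptr_t) 0x7)
#define REFCOUNT_INFINITY (REFCOUNT_SPECIAL | 0)
#define REFCOUNT_LINK (REFCOUNT_SPECIAL | 1)
/* Structure-element siblings: bit 2 set, bits 0-1 used as flags.  */
#define REFCOUNT_STRUCTELEM (REFCOUNT_SPECIAL | 4)
#define REFCOUNT_STRUCTELEM_FLAG_FIRST 1
#define REFCOUNT_STRUCTELEM_FLAG_LAST 2

#define REFCOUNT_STRUCTELEM_P(V) \
  (((V) & REFCOUNT_STRUCTELEM) == REFCOUNT_STRUCTELEM)
#define REFCOUNT_STRUCTELEM_FIRST_P(V) \
  (REFCOUNT_STRUCTELEM_P (V) && ((V) & REFCOUNT_STRUCTELEM_FLAG_FIRST))
#define REFCOUNT_STRUCTELEM_LAST_P(V) \
  (REFCOUNT_STRUCTELEM_P (V) && ((V) & REFCOUNT_STRUCTELEM_FLAG_LAST))

/* Hypothetical key layout: the first sibling stores the real count, the
   other siblings store a pointer to it (C11 anonymous union).  */
struct key_model
{
  uintptr_t refcount;
  union
  {
    uintptr_t structelem_refcount;       /* Valid in the FIRST sibling.  */
    uintptr_t *structelem_refcount_ptr;  /* Valid in the other siblings.  */
  };
};

/* Return the shared counter for any sibling of a structure mapping.  */
static uintptr_t *
structelem_counter (struct key_model *k)
{
  return REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)
         ? &k->structelem_refcount : k->structelem_refcount_ptr;
}
```

With this layout, incrementing or decrementing through `structelem_counter` touches the same counter for every sibling, and the LAST flag marks where a sibling sequence ends when it is removed from the map.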
[PATCH, OpenMP 5.0] Remove array section base-pointer mapping semantics, and other front-end adjustments (mainline trunk)
Hi Jakub, this is a version of this patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html for mainline trunk. This patch implements three main pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list, standards-conforming behavior for mapping array sections should *NOT* also map the base pointer, i.e. for this code:

  struct S { int *ptr; ... };
  struct S s;
  #pragma omp target enter data map(to: s.ptr[:100])

we currently generate, after gimplify:

  #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
                                map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify result is adjusted to:

  #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

(the attach operation is still generated, so if s.ptr has already been mapped, the attachment still happens). The correct way to also map the base pointer is to use:

  #pragma omp target enter data map(to: s.ptr, s.ptr[:100])

This change in behavior required a number of small adjustments here and there in gimplify, including to accommodate map sequences for C++ references.

There is also a small Fortran front-end patch involved (hence CCing Tobias and fortran@). The new gimplify processing changed how GOMP_MAP_ALWAYS_POINTER maps are handled, such that libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the Fortran FE was generating a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and the pre-patch behavior was removing this map anyway. I have a small change in trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this case, and so far there are no bad test results.

(2) The second part (somewhat related to the first) is a set of fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case.
This behavior is also described in the 5.0 spec, but was not properly implemented before.

(3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. This is mainly an effort to allow SPEC HPC to compile, so although the entire map clause parsing will probably be revamped in the long term, we're still adding this for now. These changes are enabled for both OpenACC and OpenMP.

Tested on x86_64-linux with nvptx offloading with no regressions. This patch was merged and tested on top of these previously submitted patches:
(a) https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570886.html "[PATCH, OpenMP 5.0] Improve OpenMP target support for C++ (includes PR92120 v3)"
(b) https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570365.html "[PATCH, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)"
so you may want to queue this one for review after those.

Thanks, Chung-Lin

2021-05-25 Chung-Lin Tang gcc/c/ChangeLog: * c-parser.c (struct omp_dim): New struct type for use inside c_parser_omp_variable_list. (c_parser_omp_variable_list): Allow multiple levels of array and component accesses in array section base-pointer expression. (c_parser_omp_clause_to): Set 'allow_deref' to true in call to c_parser_omp_var_list_parens. (c_parser_omp_clause_from): Likewise. * c-typeck.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (c_finish_omp_clauses): Extend allowed range of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. gcc/cp/ChangeLog: * parser.c (struct omp_dim): New struct type for use inside cp_parser_omp_var_list_no_open. (cp_parser_omp_var_list_no_open): Allow multiple levels of array and component accesses in array section base-pointer expression. (cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (handle_omp_array_sections): Adjust pointer map generation of references. (finish_omp_clauses): Extend allowed range of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. gcc/fortran/ChangeLog: * trans-openmp.c (gfc_trans_omp_array_section): Do not generate GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type. gcc/ChangeLog: * gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter, accommodate case where 'offset' return of get_inner_reference is non-NULL. (is_or_conta
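Point (2) of this patch, not overwriting attached pointers when copying data back, can be illustrated with a host-only model: copy a structure's bytes from the device image to the host but skip the byte ranges of pointer fields that were attached, so the host keeps its own pointer values instead of device addresses. `copy_back_skipping_attached` is a hypothetical helper, not the target.c code, and it assumes the attached offsets are sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy SIZE bytes from DEV to HOST, except the PTR_SIZE-byte ranges that
   start at each of the N_PTRS offsets in PTR_OFFSETS (attached pointers).
   Offsets must be sorted ascending and must not overlap.  */
static void
copy_back_skipping_attached (void *host, const void *dev, size_t size,
                             const size_t *ptr_offsets, size_t n_ptrs,
                             size_t ptr_size)
{
  size_t pos = 0;
  for (size_t i = 0; i < n_ptrs; i++)
    {
      size_t off = ptr_offsets[i];
      assert (off >= pos && off + ptr_size <= size);
      /* Copy the plain data preceding this attached pointer.  */
      memcpy ((char *) host + pos, (const char *) dev + pos, off - pos);
      /* Leave the host's pointer value untouched.  */
      pos = off + ptr_size;
    }
  /* Copy whatever follows the last attached pointer.  */
  memcpy ((char *) host + pos, (const char *) dev + pos, size - pos);
}
```

Skipping by byte range, rather than copying everything and restoring the pointers afterwards, avoids ever exposing a device address in host memory.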
[PATCH, OpenMP 5.0] Improve OpenMP target support for C++ (includes PR92120 v3)
Hi Jakub, the attached patch is a combination of the patches below, already pushed to devel/omp/gcc-10. Some are transient bug fixes, but I'm listing all of them for completeness:

aadfc984: [PATCH] Target mapping C++ members inside member functions
  https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562467.html
36a1ebdb: [PATCH] OpenMP 5.0: map this[:1] in C++ non-static member functions (PR 92120)
  https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558975.html
bf8605f1: [PATCH] Enable gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF (INDIRECT_REF ...)) map clauses.
  https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564976.html
da047f63: [PATCH] Fix regression of array members in OpenMP map clauses.
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566086.html
4e714eaa: [PATCH] Fix template case of non-static member access inside member functions
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566592.html
2ed80263: [PATCH] Lambda capturing of pointers and references in target directives
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566935.html
08caada8: Arrow operator handling for C front-end in OpenMP map clauses
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566419.html

To summarize, this patch set improves OpenMP target support for C++, including inside non-static member functions, for lambda objects, and for struct member dereference expressions. The corresponding modifications for the C front-end are also included.

This patch supersedes the prior versions of my PR92120 patch (implicit C++ map(this[:1])), so I'm calling this "v3" of the patch for that PR. Prior versions of the PR92120 patch were implemented by recording uses of 'this' in the parser, and then using the recorded uses during "finish" to create the implicit maps.
Supporting lambda objects required tree-walk style processing of the OMP_TARGET body, so it only made sense to merge the entire 'this' processing into it. A large part of the parser changes was therefore dropped, and the main processing now lives in semantics.c. Other parser changes to support '->' in map clauses are also included in this patch.

Tested without regressions on x86_64-linux with nvptx offloading, okay for trunk?

Thanks,
Chung-Lin

2021-05-20 Chung-Lin Tang

gcc/cp/
* cp-tree.h (finish_omp_target): New declaration.
(finish_omp_target_clauses): Likewise.
* parser.c (cp_parser_omp_clause_map): Adjust call to cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
(cp_parser_omp_target): Factor out code, adjust into calls to new function finish_omp_target.
* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for OMP_TARGET case.
* semantics.c (handle_omp_array_sections_1): Add handling to create 'this->member' from 'member' FIELD_DECL.
(handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP map clauses. Handle 'A->member' case in map clauses.
(struct omp_target_walk_data): New struct for walking over target-directive tree body.
(finish_omp_target_clauses_r): New function for tree walk.
(finish_omp_target_clauses): New function.
(finish_omp_target): New function.

gcc/c/
* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in call to c_parser_omp_variable_list to 'true'.
* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in array base handling.
(c_finish_omp_clauses): Handle 'A->member' case in map clauses.

gcc/
* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type hash_map *. Adjust struct map handling to handle cases of *A and A->B expressions. Under !DECL_P case of GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to struct_deref_set for map(*ptr_to_struct) cases.
Add MEM_REF case when handling component_ref_p case. Add unshare_expr and gimplification when created GOMP_MAP_STRUCT is not a DECL. Add code to add firstprivate pointer for *pointer-to-struct case. (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for exit data directives code to earlier position. * omp-low.c (lower_omp_target): Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. * tree-pretty-print.c (dump_omp_clause): Likewise. gcc/testsuite/ * gcc.dg/gomp/target-3.c: New testcase. * g++.dg/gomp/target-3.C: New testcase. * g++.dg/gomp/target-lambda-1.C: New testcase. * g++.dg/gomp/target-this-1.C: New testcase. * g++.dg/gomp/target-this-2.C: New testcase. * g++.dg/go
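As a rough illustration of the C side of this change (the '->' operator accepted in map clauses), here is a small sketch. The struct and function names are made up for this example. Compiled without -fopenmp, or run without an offload device, the target region simply executes on the host; how the pointer 'v' itself is made available on a real device depends on the GCC version, so treat this as a syntax sketch rather than a device-semantics test.

```c
#include <assert.h>

/* Hypothetical structure holding a pointer to a heap or static buffer.  */
struct vec
{
  int n;
  int *data;
};

/* Doubles every element of v->data inside a target region and returns
   the sum.  The 'v->data[0:n]' list item uses the '->' map-clause syntax
   described above.  The loop is sequential, so 'sum' needs no reduction.  */
static long
double_and_sum (struct vec *v)
{
  long sum = 0;
  int n = v->n;
#pragma omp target map(tofrom: v->data[0:n]) map(tofrom: sum)
  for (int i = 0; i < n; i++)
    {
      v->data[i] *= 2;
      sum += v->data[i];
    }
  return sum;
}
```

The function returns the sum of the doubled elements, so a four-element buffer {1, 2, 3, 4} yields 20 and is left holding {2, 4, 6, 8}.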
Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes
On 2021/5/17 10:26 PM, Julian Brown wrote:
OK, understood. But, I'm a bit concerned that we're ignoring some "hidden rules" with regards to OMP pointer clause ordering/grouping that certain code (at least the bit that creates GOMP_MAP_STRUCT node groups, and parts of omp-low.c) relies on. I believe those rules are as follows:

- an array slice is mapped using two or three pointers -- two for a normal (non-reference) base pointer, and three if we have a reference to a pointer (i.e. in C++) or an array descriptor (i.e. in Fortran). So we can have e.g.

  GOMP_MAP_TO
  GOMP_MAP_ALWAYS_POINTER

  GOMP_MAP_TO
  GOMP_MAP_.*_POINTER
  GOMP_MAP_ALWAYS_POINTER

  GOMP_MAP_TO
  GOMP_MAP_TO_PSET
  GOMP_MAP_ALWAYS_POINTER

- for OpenACC, we extend this to allow (up to and including gimplify.c) the GOMP_MAP_ATTACH_DETACH mapping. So we can have (for component refs):

  GOMP_MAP_TO
  GOMP_MAP_ATTACH_DETACH

  GOMP_MAP_TO
  GOMP_MAP_TO_PSET
  GOMP_MAP_ATTACH_DETACH

  GOMP_MAP_TO
  GOMP_MAP_.*_POINTER
  GOMP_MAP_ATTACH_DETACH

For the scanning in insert_struct_comp_map (as it is at present) to work right, these groups must stay intact. I think the current behaviour of omp_target_reorder_clauses on the og10 branch can break those groups apart though!

Originally this sorting was intended to enforce OpenMP 5.0 map ordering rules, although I did add some ATTACH_DETACH ordering code in the latest round of patching. May not be the best practice.

(The "prev_list_p" stuff in the loop in question in gimplify.c just keeps track of the first node in these groups.)

Such a brittle way of doing this; even the variable name does not make its intent obvious.

For OpenACC, the GOMP_MAP_ATTACH_DETACH code does *not* depend on the previous clause when lowering in omp-low.c. But GOMP_MAP_ALWAYS_POINTER does! And in one case ("update" directive), GOMP_MAP_ATTACH_DETACH is rewritten to GOMP_MAP_ALWAYS_POINTER, so for that case at least, the dependency on the preceding mapping node must stay intact.
Yes, I think there are some weird conventions here, stemming from the front-ends. I would think that _ALWAYS_POINTER should exist at a similar level to _ATTACH_DETACH: both are pointer operations, just with different details in runtime behavior, though its intended purpose for C++ references seems to skew some things here and there.

OpenACC also allows "bare" GOMP_MAP_ATTACH and GOMP_MAP_DETACH nodes (corresponding to the "attach" and "detach" clauses). Those are handled a bit differently to GOMP_MAP_ATTACH_DETACH in gimplify.c -- but GOMP_MAP_ATTACH_Z_L_A_S doesn't quite behave like that either, I don't think?

IIRC, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION was handled that way (just a single line in gimplify.c) due to idiosyncrasies with the surrounding generated maps from the C++ front-end (which ATM is the only user of this map-kind). So yeah, inside the compiler, it's not entirely the same as GOMP_MAP_ATTACH, but it is intended to live through for the runtime to see.

Anyway: I've not entirely understood what omp_target_reorder_clauses is doing, but I think it may need to try harder to keep the groups mentioned above together. What do you think?

As you know, attach operations don't really need to be glued to the prior operations; they just have to be ordered after the mapping of the pointer and the pointed-to data. There's already some book-keeping to move clauses together, but as you say, it might need more.

Overall, I think this re-organizing of the struct-group creation is a good thing, but, as you probably also observed, this insistence on "in-flight" tree chain manipulation is just hard to work with and modify. Maybe instead of directly working on clause expression chains at this point, we should be stashing all this information into a single clause tree node, e.g.
starting from the front-end, we can set 'OMP_CLAUSE_MAP_POINTER_KIND(c) = ALWAYS/ATTACH_DETACH/FIRSTPRIVATE/etc.' (instead of actually creating new, must-follow-in-order maps, which are causing all these conventions).

For struct-groups, at the start of gimplify_scan_omp_clauses(), we could work with map clause tree nodes with OMP_CLAUSE_MAP_STRUCT_LIST(c), which contains the entire TREE_LIST or VEC of elements. Then later, after scanning is complete, expand the list into the current form. Ordering is only created at this stage.

Just an idea, not sure if it will help understandability in general, but it should definitely help to simplify things when we're reordering due to other rules.

Chung-Lin
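To make the "keep the groups together" requirement concrete, here is a toy model in C of reordering clause groups rather than individual clauses. It is only a sketch: the enum values are hypothetical stand-ins for a few GOMP_MAP_* kinds, and the "alloc groups first" sort key is an arbitrary placeholder, not the actual criterion used by omp_target_reorder_clauses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a few GOMP_MAP_* kinds; values are arbitrary.  */
enum grp_kind { K_TO, K_ALLOC, K_TO_PSET, K_POINTER, K_ALWAYS_POINTER,
                K_ATTACH_DETACH };

struct grp_clause { enum grp_kind kind; int id; };

#define DEMO_MAX_CLAUSES 64

/* Kinds that must stay glued to the preceding data mapping node.  */
static bool
trails_group (enum grp_kind k)
{
  return (k == K_TO_PSET || k == K_POINTER || k == K_ALWAYS_POINTER
          || k == K_ATTACH_DETACH);
}

/* Copy IN to OUT, moving whole groups (a leader plus its trailing pointer
   nodes) so that groups led by K_ALLOC come first.  The two passes keep
   the relative order of groups within each class, i.e. the reordering is
   stable.  Returns the number of groups found.  */
static size_t
reorder_groups (const struct grp_clause *in, size_t n,
                struct grp_clause *out)
{
  size_t starts[DEMO_MAX_CLAUSES], lens[DEMO_MAX_CLAUSES], ngroups = 0;
  assert (n <= DEMO_MAX_CLAUSES);

  for (size_t i = 0; i < n; )
    {
      size_t j = i + 1;
      while (j < n && trails_group (in[j].kind))
        j++;
      starts[ngroups] = i;
      lens[ngroups++] = j - i;
      i = j;
    }

  size_t pos = 0;
  for (int pass = 0; pass < 2; pass++)
    for (size_t g = 0; g < ngroups; g++)
      if ((in[starts[g]].kind == K_ALLOC) == (pass == 0))
        {
          memcpy (out + pos, in + starts[g], lens[g] * sizeof *in);
          pos += lens[g];
        }
  return ngroups;
}
```

Because grouping happens before sorting, a trailing K_ATTACH_DETACH or K_ALWAYS_POINTER node can never be separated from the mapping it depends on, which is the property the discussion above says the current in-flight chain manipulation can violate.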
Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes
On 2021/5/11 4:57 PM, Julian Brown wrote:
This work-in-progress patch tries to get GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION to behave more like GOMP_MAP_ATTACH_DETACH -- in that the mapping is made to form groups to be processed by build_struct_group/build_struct_comp_map. I think that's important to integrate with how groups of mappings for array sections are handled in other cases.

This patch isn't sufficient by itself to fix a couple of broken test cases at present (libgomp.c++/target-lambda-1.C, libgomp.c++/target-this-4.C), though.

No, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION is supposed to be just a slightly different version of GOMP_MAP_ATTACH: it tolerates an unmapped pointer target and assigns NULL on the device, instead of just calling gomp_fatal() (see its handling in libgomp/target.c). If OpenACC can have the same zero-length array section behavior, we can just share one GOMP_MAP_ATTACH map kind. For now they are treated as separate cases.

Chung-Lin

2021-05-11 Julian Brown

gcc/
* gimplify.c (build_struct_comp_nodes): Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION handling.
(build_struct_group): Process GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION as part of pointer group.
(gimplify_scan_omp_clauses): Update prev_list_p such that GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION will form part of pointer group.
---
 gcc/gimplify.c | 16
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 6d204908c82..c5cb486aa23 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8298,7 +8298,9 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
   if (grp_mid
       && OMP_CLAUSE_CODE (grp_mid) == OMP_CLAUSE_MAP
       && (OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ALWAYS_POINTER
-          || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH))
+          || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH
+          || (OMP_CLAUSE_MAP_KIND (grp_mid)
+              == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
     {
       tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end),
                                   OMP_CLAUSE_MAP);
@@ -8774,12 +8776,14 @@ build_struct_group (struct gimplify_omp_ctx *ctx,
           ? splay_tree_lookup (ctx->variables, (splay_tree_key) decl)
           : NULL);
   bool ptr = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_POINTER);
-  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH);
+  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
+                        || (OMP_CLAUSE_MAP_KIND (c)
+                            == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION));
   bool attach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
                  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH);
   bool has_attachments = false;
   /* For OpenACC, pointers in structs should trigger an attach action.  */
-  if (attach_detach
+  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
       && ((region_type & (ORT_ACC | ORT_TARGET | ORT_TARGET_DATA))
           || code == OMP_TARGET_ENTER_DATA
           || code == OMP_TARGET_EXIT_DATA))
@@ -9784,6 +9788,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
           if (!remove
               && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_POINTER
               && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ATTACH_DETACH
+              && (OMP_CLAUSE_MAP_KIND (c)
+                  != GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)
               && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET
               && OMP_CLAUSE_CHAIN (c)
               && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (c)) == OMP_CLAUSE_MAP
@@ -9792,7 +9798,9 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
                   || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
                       == GOMP_MAP_ATTACH_DETACH)
                   || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
-                      == GOMP_MAP_TO_PSET)))
+                      == GOMP_MAP_TO_PSET)
+                  || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
+                      == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
             prev_list_p = list_p;
 
           break;
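The runtime difference between GOMP_MAP_ATTACH and GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION described above can be modeled as follows. This is a simplified sketch with a hypothetical lookup table, not the libgomp/target.c code: a plain attach whose pointer target is unmapped is reported as a failure (where libgomp would call gomp_fatal()), whereas the zero-length-array-section variant stores NULL in the device pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one host-to-device mapping: host addresses
   [host_start, host_end) live at dev_start on the device.  */
struct demo_mapping { uintptr_t host_start, host_end, dev_start; };

/* Compute the device value to store into an attached pointer whose host
   value is TARGET.  Returns false where the real runtime would abort (plain
   attach, target not mapped).  With ZLAS (zero-length array section
   semantics) an unmapped target instead yields a NULL device pointer.  */
static bool
demo_attach (const struct demo_mapping *tab, size_t n, uintptr_t target,
             bool zlas, uintptr_t *dev_ptr)
{
  for (size_t i = 0; i < n; i++)
    if (target >= tab[i].host_start && target < tab[i].host_end)
      {
        *dev_ptr = tab[i].dev_start + (target - tab[i].host_start);
        return true;
      }
  if (zlas)
    {
      *dev_ptr = 0;   /* Pointer target not mapped: assign NULL.  */
      return true;
    }
  return false;
}
```

Only the "not found" branch differs between the two kinds, which is why sharing a single attach map kind for both, as suggested above, is plausible.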
Re: [PATCH 5/5] Mapping of components of references to pointers to structs for OpenMP/OpenACC
Hi Julian,

On 2021/5/15 5:27 AM, Julian Brown wrote:
GCC currently raises a parse error for indirect accesses to struct members, where the base of the access is a reference to a pointer. This patch fixes that case.

gcc/cp/
* semantics.c (finish_omp_clauses): Handle components of references to pointers to structs.

libgomp/
* testsuite/libgomp.oacc-c++/deep-copy-17.C: Update test.

--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7670,7 +7670,12 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
           if ((ort == C_ORT_ACC || ort == C_ORT_OMP)
               && TREE_CODE (t) == COMPONENT_REF
               && TREE_CODE (TREE_OPERAND (t, 0)) == INDIRECT_REF)
-            t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+            {
+              t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+              /* References to pointers have a double indirection here.  */
+              if (TREE_CODE (t) == INDIRECT_REF)
+                t = TREE_OPERAND (t, 0);
+            }
           if (TREE_CODE (t) == COMPONENT_REF
               && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
                   || ort == C_ORT_ACC)

There is already a plethora of such modifications in this patch: "[PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments." https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html

I am in the process of taking that patch to mainline, so are you sure this is not already handled there?

diff --git a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
index dacbb520f3d..e038e9e3802 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
+++ b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
@@ -83,7 +83,7 @@ void strrp (void)
   a[0] = 8;
   c[0] = 10;
   e[0] = 12;
-  #pragma acc parallel copy(n->a[0:10], n->c[0:10], n->e[0:10])
+  #pragma acc parallel copy(n->a[0:10], n->b, n->c[0:10], n->d, n->e[0:10])
   {
     n->a[0] = n->c[0] + n->e[0];
   }

This testcase can be added.

Chung-Lin
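The double indirection handled by the semantics.c hunk above can be shown with a toy expression tree. The node type and codes below are hypothetical and only mirror GCC's COMPONENT_REF/INDIRECT_REF by name; the assumption is that, for 'rp->a' with 'rp' declared as a reference to a pointer, the C++ front end wraps the use of 'rp' in an extra INDIRECT_REF, so two levels must be peeled to reach the declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression tree; codes mirror GCC tree codes by name only.  */
enum demo_code { D_DECL, D_INDIRECT_REF, D_COMPONENT_REF };

struct demo_tree
{
  enum demo_code code;
  const struct demo_tree *op0;   /* First operand, or NULL for a decl.  */
};

/* For 'p->field', strip COMPONENT_REF (INDIRECT_REF (p)) down to p.  If p
   is itself an INDIRECT_REF (a reference to a pointer), peel one more
   level so the result is the underlying declaration.  */
static const struct demo_tree *
demo_base_of_component (const struct demo_tree *t)
{
  if (t->code == D_COMPONENT_REF && t->op0->code == D_INDIRECT_REF)
    {
      t = t->op0->op0;
      /* References to pointers have a double indirection here.  */
      if (t->code == D_INDIRECT_REF)
        t = t->op0;
    }
  return t;
}
```

Both the plain pointer form and the reference-to-pointer form resolve to the same declaration node, which is what the finish_omp_clauses check needs before it inspects the base.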
[PATCH, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)
Hi Jakub,

This is a version of patch https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569665.html for mainline trunk.

This patch implements relaxing the requirements when a map with the implicit attribute encounters an overlapping existing map. As the OpenMP 5.0 spec describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an implicit data-mapping attribute has corresponding storage in the device data environment prior to a task encountering the construct that is associated with the map clause, only that part of the original storage will have corresponding storage in the device data environment as a result of the map clause."

Also tracked in the OpenMP spec context as issue #1463: https://github.com/OpenMP/spec/issues/1463

The implementation inside the compiler is, of course, to tag the implicitly created maps with some indication of "implicit". I've done this with an OMP_CLAUSE_MAP_IMPLICIT_P macro, using 'base.deprecated_flag' underneath.

There is an encoding of this as GOMP_MAP_IMPLICIT == GOMP_MAP_FLAG_SPECIAL_3|GOMP_MAP_FLAG_SPECIAL_4 in include/gomp-constants.h for the runtime, but I've intentionally avoided exploding the entire gimplify/omp-low with a new set of GOMP_MAP_IMPLICIT_TO/FROM/etc. symbols, instead adding the new flag bits only at the final runtime call generation during omp-lowering.

The rest is libgomp mapping taking care of the implicit case: allowing the map to succeed if an existing map is a proper subset of the new map, when the new map is implicit. Straightforward enough I think.

There are also some additions to print the implicit attribute during tree pretty-printing; for that reason some scan tests were updated.

Also, another adjustment in this patch is how implicitly created clauses are added to the current clause list in gimplify_adjust_omp_clauses().
Instead of simply appending the new clauses to the end, this patch adds them at the position "after initial non-map clauses, but right before any existing map clauses".

The reason for this is: when combined with other map clauses, for example:

  #pragma omp target map(rec.ptr[:N])
  for (int i = 0; i < N; i++)
    rec.ptr[i] += 1;

there will be an implicit map created for map(rec), because of the access inside the target region. The expectation is that 'rec' is implicitly mapped, then the array section pointed to by 'rec.ptr' is mapped, and then the attachment to the 'rec.ptr' field of the mapped 'rec' is performed (in that order). If the implicit 'map(rec)' is appended to the end instead of placed before the other maps, the attachment operation will not find anything to attach to, and the entire region will fail.

Note: this touches a bit on another issue which I will be sending a patch for later: per the discussion on omp-lang, an array section list item should *not* map its base-pointer (although an attachment attempt should exist), while in current GCC behavior, for struct member pointers like 'rec.ptr' above, we do map it (which should be deemed incorrect).

This means that, as of right now, this modification of map order doesn't really exhibit the above-mentioned behavior yet. I have included it as part of this patch because the "[implicit]" tree printing already requires modifying many gimple scan tests, so including the test modifications together seems more manageable patch-wise.

Tested with no regressions on x86_64-linux with nvptx offloading. This was already pushed to devel/omp/gcc-10 a while ago; asking for approval for mainline trunk.

Chung-Lin

2021-05-14 Chung-Lin Tang

include/ChangeLog:
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro.
(GOMP_MAP_IMPLICIT): New special map kind bits value.
(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of special map kind bits.
(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.
gcc/ChangeLog: * tree.h (OMP_CLAUSE_MAP_IMPLICIT_P): New access macro for 'implicit' bit, using 'base.deprecated_flag' field of tree_node. * tree-pretty-print.c (dump_omp_clause): Add support for printing implicit attribute in tree dumping. * gimplify.c (gimplify_adjust_omp_clauses_1): Set OMP_CLAUSE_MAP_IMPLICIT_P to 1 if map clause is implicitly created. (gimplify_adjust_omp_clauses): Adjust place of adding implicitly created clauses, from simple append, to starting of list, after non-map clauses. * omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind values passed to libgomp for implicit maps. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-implicit-map-1.c: New test. * c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern. * c-c++-common/goacc/firstprivate-mappings-1.c: Likewise. * c-c++-common/goac
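The relaxed libgomp check described above (an implicit map succeeds when an existing mapping is a subset of it, and only that existing part is used) can be sketched as a pure range comparison. The function and enum names are hypothetical, and the real runtime works on splay-tree lookups with many more cases; this only shows the decision for a single existing mapping [es, ee) and a new map [ns, ne).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_result { DEMO_NEW_MAPPING, DEMO_REUSE_EXISTING, DEMO_ERROR };

/* Decide how a new map [ns, ne) interacts with one existing device
   mapping [es, ee).  IMPLICIT is true for compiler-generated maps.  */
static enum demo_result
demo_check_map (uintptr_t es, uintptr_t ee, uintptr_t ns, uintptr_t ne,
                bool implicit)
{
  if (ne <= es || ns >= ee)
    return DEMO_NEW_MAPPING;       /* No overlap: create a fresh mapping.  */
  if (ns >= es && ne <= ee)
    return DEMO_REUSE_EXISTING;    /* New range already fully mapped.  */
  if (implicit && es >= ns && ee <= ne)
    return DEMO_REUSE_EXISTING;    /* Relaxation: existing map is a subset
                                      of the implicit one; only that part
                                      keeps device storage.  */
  return DEMO_ERROR;               /* Partial overlap, or an explicit map
                                      larger than the existing one.  */
}
```

The only behavioral change relative to the explicit case is the third test: the same "bigger than what is already mapped" request that is an error for an explicit map becomes a reuse of the existing mapping when the map is implicit.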
Re: [PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments.
On 2021/5/11 11:15, Thomas Schwinge wrote:
Hi Chung-Lin!

On 2021-05-11T19:28:04+0800, Chung-Lin Tang wrote:
This patch largely implements three pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list, standards-conforming behavior for mapping array sections should *NOT* also map the base-pointer, i.e. for this code:

  struct S { int *ptr; ... };
  struct S s;
  #pragma omp target enter data map(to: s.ptr[:100])

Currently we generate after gimplify:

  #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
    map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify results are now adjusted to:

  #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

(the attach operation is still generated, and if s.ptr is already mapped prior, attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to use:

  #pragma omp target enter data map(to: s.ptr, s.ptr[:100])

This adjustment in behavior required a number of small adjustments here and there in gimplify, including to accommodate map sequences for C++ references.

I'm a bit confused by that -- this mandates the bulk of the testsuite changes that you've included, and these seem a step backwards in terms of user experience, but then, I'm not up to date on the exact OpenMP specification requirements, so you certainly may be right on that. (And also, as Julian mentioned, how this relates to OpenACC semantics, which I also haven't considered in detail -- but I note you didn't adjust any OpenACC testcases for that, so I suppose that's really conditionalized to OpenMP only.)

It is indeed a bit awkward to use, but that's what the omp-lang list seemed to decide. This change is OpenMP only. I took care to only handle OpenMP constructs like this in the middle-end, though of course this does not preclude some mistake in adjusting the shared code paths...
There is also a small Fortran front-end patch involved (hence CCing Tobias). The new gimplify processing changed behavior in handling GOMP_MAP_ALWAYS_POINTER maps such that libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the Fortran FE was generating a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and the pre-patch behavior was removing this map anyway. I have a small change in trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this case, and so far no bad test results.

Makes sense to argue that one separately, with testcases, for the master branch submission?

Maybe, although this part was needed to solve a regression caused by the above changes.

(2) The second part (though kind of related to the first above) consists of fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case. This behavior is also noted in the 5.0 spec, but was not yet properly implemented before.

Likewise, if that makes sense?

Some of the separation of base-pointer/array-section in map clauses seemed to step on this bug (e.g. if one mechanically updates "s.ptr[:N]" into "s.ptr, s.ptr[:N]", and a target-update overwrites the base-pointer). So it's arguably separate, but it can also cause some testsuite chaos if not included together.

(3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. This is mainly an effort to allow SPEC HPC to compile, so although the entire map clause syntax parsing is probably going to be revamped in the long term, we're still adding this in for now. These changes are enabled for both OpenACC and OpenMP.

Likewise, if that makes sense? ;-)

Yeah, this might be separated :P

Tested on x86_64-linux with nvptx offloading with no regressions.
I'm seeing a regression with 'libgomp.oacc-c-c++-common/noncontig_array-1.c' execution testing, both C and C++, for '-O2' (but not '-O0'), and only for about half of the invocations. But it seems to reproduce reliably in GDB:

  Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
  gomp_decrement_refcount (do_remove=, do_copy=, delete_p=false, refcount_set=0x0, k=0xc4d450) at [...]/source-gcc/libgomp/target.c:468
  468 uintptr_t orig_refcount = *refcount_ptr;
  (gdb) bt
  #0 gomp_decrement_refcount (do_remove=, do_copy=, delete_p=false, refcount_set=0x0, k=0xc4d450) at [...]/source-gcc/libgomp/target.c:468
  #1 gomp_unmap_vars_internal (aq=0x0, aq@entry=0x8223c0, refcount_set=0x0, do_copyfrom=, do_copyfrom@entry=true, tgt=tgt@entry=0xc696a0) at [...]/source-gcc/libgomp/target.c:2065
  #2 goacc_unmap_vars (tgt=tgt@entry=0xc696a0, do_copyfrom=do_copyfrom@entry=true, aq=aq@entry=0x0) at [...]/source-gcc/libgomp/target.c:2118
  #3 0x77daa41c in GOACC_pa
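Part (2) of the patch, not overwriting attached pointers during device-to-host copies, can be pictured with the following sketch. It assumes a hypothetical, sorted list of byte offsets at which attached pointers live inside the copied block; those pointer-sized slots hold device addresses, so they are skipped and the host's own pointer values survive the copy. This is an illustration of the rule, not the libgomp/target.c implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy LEN bytes from the device image DEV back to HOST, except for the
   pointer-sized slots starting at each offset in ATTACHED_OFFS.  Offsets
   are assumed sorted, non-overlapping, and to fit entirely within LEN.  */
static void
demo_copy_back (unsigned char *host, const unsigned char *dev, size_t len,
                const size_t *attached_offs, size_t nattached)
{
  size_t pos = 0;
  for (size_t i = 0; i < nattached; i++)
    {
      /* Copy the plain data preceding this attached pointer.  */
      memcpy (host + pos, dev + pos, attached_offs[i] - pos);
      /* Skip the slot holding a device address.  */
      pos = attached_offs[i] + sizeof (void *);
    }
  memcpy (host + pos, dev + pos, len - pos);
}
```

With a single attached pointer at offset 8 in a 32-byte block, bytes 0-7 and everything after the pointer slot are refreshed from the device, while the slot itself keeps its host value.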
[PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments.
This patch largely implements three pieces of functionality:

(1) Per discussion and clarification on the omp-lang mailing list, standards-conforming behavior for mapping array sections should *NOT* also map the base-pointer, i.e. for this code:

  struct S { int *ptr; ... };
  struct S s;
  #pragma omp target enter data map(to: s.ptr[:100])

Currently we generate after gimplify:

  #pragma omp target enter data map(struct:s [len: 1]) map(alloc:s.ptr [len: 8]) \
    map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

which is deemed incorrect. After this patch, the gimplify results are now adjusted to:

  #pragma omp target enter data map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

(the attach operation is still generated, and if s.ptr is already mapped prior, attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to use:

  #pragma omp target enter data map(to: s.ptr, s.ptr[:100])

This adjustment in behavior required a number of small adjustments here and there in gimplify, including to accommodate map sequences for C++ references.

There is also a small Fortran front-end patch involved (hence CCing Tobias). The new gimplify processing changed behavior in handling GOMP_MAP_ALWAYS_POINTER maps such that libgomp.fortran/struct-elem-map-1.f90 regressed. It appeared that the Fortran FE was generating a GOMP_MAP_ALWAYS_POINTER for array types, which didn't seem quite correct, and the pre-patch behavior was removing this map anyway. I have a small change in trans-openmp.c:gfc_trans_omp_array_section to not generate the map in this case, and so far no bad test results.

(2) The second part (though kind of related to the first above) consists of fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case. This behavior is also noted in the 5.0 spec, but was not yet properly implemented before.
(3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. This is mainly an effort to allow SPEC HPC to compile, so although the entire map clause syntax parsing is probably going to be revamped in the long term, we're still adding this in for now. These changes are enabled for both OpenACC and OpenMP.

Tested on x86_64-linux with nvptx offloading with no regressions. Pushed to devel/omp/gcc-10, will send mainline version of patch later.

Chung-Lin

2021-05-11 Chung-Lin Tang

gcc/c/ChangeLog:
* c-parser.c (struct omp_dim): New struct type for use inside c_parser_omp_variable_list.
(c_parser_omp_variable_list): Allow multiple levels of array and component accesses in array section base-pointer expression.
(c_parser_omp_clause_to): Set 'allow_deref' to true in call to c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
(c_finish_omp_clauses): Extend allowed range of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:
* parser.c (struct omp_dim): New struct type for use inside cp_parser_omp_var_list_no_open.
(cp_parser_omp_var_list_no_open): Allow multiple levels of array and component accesses in array section base-pointer expression.
(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
(handle_omp_array_sections): Adjust pointer map generation of references.
(finish_omp_clauses): Extend allowed range of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_array_section): Do not generate GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.

gcc/ChangeLog:
* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter, accommodate case where 'offset' return of get_inner_reference is non-NULL.
(is_or_contains_p): Further robustify conditions.
(omp_target_reorder_clauses): In alloc/to/from sorting phase, also move following GOMP_MAP_ALWAYS_POINTER maps along. Add new sorting phase where we make sure pointers with an attach/detach map are ordered correctly.
(gimplify_scan_omp_clauses): Add modifications to avoid creating GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.

gcc/testsuite/ChangeLog:
* c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
* c-c++-common/gomp/targe
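The two clause forms from point (1) above can be written out in a small C program. This is only a sketch with made-up function names: without an offload device (or without -fopenmp) the directives fall back to the host or are ignored, so it demonstrates the syntax rather than the device-side difference in what gets mapped. Per the patch description, the first form maps only the 100 ints and attempts an attach, while the second additionally maps the base pointer s.ptr.

```c
#include <assert.h>

struct S { int *ptr; };

static int buf[100];
static struct S s = { buf };

/* Maps only the array section; s.ptr itself is not mapped, but an attach
   is attempted (and succeeds only if s.ptr is already present).  */
static void
section_only_round_trip (void)
{
#pragma omp target enter data map(to: s.ptr[:100])
#pragma omp target exit data map(from: s.ptr[:100])
}

/* Explicitly requests the base pointer as well, reproducing the old
   "base pointer also mapped" behavior.  */
static void
section_and_pointer_round_trip (void)
{
#pragma omp target enter data map(to: s.ptr, s.ptr[:100])
#pragma omp target exit data map(from: s.ptr[:100]) map(release: s.ptr)
}
```

Since no target region modifies the data, both round trips leave the host array and the host value of s.ptr unchanged.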
Re: [PATCH, OG10, OpenMP 5.0, committed] Implement relaxation of implicit map vs. existing device mappings
On 2021/5/7 8:35 PM, Thomas Schwinge wrote:
On 2021-05-05T23:17:25+0800, Chung-Lin Tang via Gcc-patches wrote:
This patch implements relaxing the requirements when a map with the implicit attribute encounters an overlapping existing map. [...]

Oh, oh, these data mapping interfaces/semantics are getting more and more "convoluted"... %-\ (Not your fault, of course.)

I haven't looked at the patch/implementation in too much detail (I'm not very well-versed in the exact OpenMP semantics anyway), but I suppose we should do similar things for OpenACC, too. I think we even currently have a gimplification-level "hack" to replicate data clauses' array bounds for implicit data clauses on compute constructs, if the default "complete" mapping is going to clash with a "limited" mapping that's specified in an outer OpenACC 'data' directive. (That, of course, doesn't work for the general case of non-lexical scoping, or dynamic OpenACC 'enter data', etc., I suppose.) I suppose your method could easily replace and improve that; we shall look into that later.

That said, in your patch, is this current implementation (explicitly) meant or not meant to be active for OpenACC, too, or just OpenMP (I couldn't quickly tell), and/or is it (implicitly?) a no-op for OpenACC?

It appears that I have inadvertently enabled it for OpenACC as well! But everything was tested together, so I assume it works okay for that mode as well. The entire set of implicit-specific actions is enabled by the setting of 'OMP_CLAUSE_MAP_IMPLICIT_P (clause) = 1' inside gimplify.c:gimplify_adjust_omp_clauses_1, so in case you want to disable it for OpenACC again, that's where you need to add the guard condition.

Also, another adjustment in this patch is how implicitly created clauses are added to the current clause list in gimplify_adjust_omp_clauses().
Instead of simply appending the new clauses to the end, this patch adds them at the position "after initial non-map clauses, but right before any existing map clauses".

Probably you haven't been testing such a configuration; I've just pushed "Fix up 'c-c++-common/goacc/firstprivate-mappings-1.c' for C, non-LP64" to devel/omp/gcc-10 branch in commit c51cc3b96f0b562deaffcfbcc51043aed216801a, see attached.

Thanks, I was relying on eyeballing to know where to fix testcases like this; I did fix another similar case, but missed this one.

The reason for this is: when combined with other map clauses, for example:

  #pragma omp target map(rec.ptr[:N])
  for (int i = 0; i < N; i++)
    rec.ptr[i] += 1;

there will be an implicit map created for map(rec), because of the access inside the target region. The expectation is that 'rec' is implicitly mapped, then the array section pointed to by 'rec.ptr' is mapped, and then the attachment to the 'rec.ptr' field of the mapped 'rec' is performed (in that order). If the implicit 'map(rec)' is appended to the end instead of placed before the other maps, the attachment operation will not find anything to attach to, and the entire region will fail.

But that doesn't (negatively) affect user-visible semantics (OpenMP, and also OpenACC, if applicable), in that more/bigger objects then get mapped than were before? (I suppose not?)

It probably won't affect user-level semantics, although we should watch out for whether this change in convention exposes some other bugs.

Chung-Lin
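The insertion position discussed above ("after initial non-map clauses, but right before any existing map clauses") amounts to a simple splice into the clause chain. The sketch below uses a hypothetical singly linked list in place of OMP_CLAUSE_CHAIN; it is not the gimplify.c code, only an illustration of where the implicit maps end up relative to explicit ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clause node; NEXT plays the role of OMP_CLAUSE_CHAIN.  */
struct chain_clause
{
  bool is_map;
  int id;
  struct chain_clause *next;
};

/* Splice the chain NEW_FIRST..NEW_LAST (implicitly created maps) into
   *LIST after the leading run of non-map clauses, i.e. immediately before
   the first existing map clause (or at the end if there is none).  */
static void
demo_insert_implicit (struct chain_clause **list,
                      struct chain_clause *new_first,
                      struct chain_clause *new_last)
{
  struct chain_clause **pos = list;
  while (*pos && !(*pos)->is_map)
    pos = &(*pos)->next;
  new_last->next = *pos;
  *pos = new_first;
}
```

For a list "private-like clause, explicit map(rec.ptr[:N])", inserting the implicit map(rec) yields "private-like clause, map(rec), map(rec.ptr[:N])", so 'rec' is present by the time the attach for 'rec.ptr' is processed.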
[PATCH, OG10, OpenMP 5.0, committed] Implement relaxation of implicit map vs. existing device mappings
This patch implements relaxing the requirements when a map with the implicit attribute encounters an overlapping existing map. As the OpenMP 5.0 spec describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an implicit data-mapping attribute has corresponding storage in the device data environment prior to a task encountering the construct that is associated with the map clause, only that part of the original storage will have corresponding storage in the device data environment as a result of the map clause."

Also tracked in the OpenMP spec context as issue #1463: https://github.com/OpenMP/spec/issues/1463

The implementation inside the compiler is, of course, to tag the implicitly created maps with some indication of "implicit". I've done this with an OMP_CLAUSE_MAP_IMPLICIT_P macro, using 'base.deprecated_flag' underneath.

There is an encoding of this as GOMP_MAP_IMPLICIT == GOMP_MAP_FLAG_SPECIAL_3|GOMP_MAP_FLAG_SPECIAL_4 in include/gomp-constants.h for the runtime, but I've intentionally avoided exploding the entire gimplify/omp-low with a new set of GOMP_MAP_IMPLICIT_TO/FROM/etc. symbols, instead adding the new flag bits only at the final runtime call generation during omp-lowering.

The rest is libgomp mapping taking care of the implicit case: allowing the map to succeed if an existing map is a proper subset of the new map, when the new map is implicit. Straightforward enough I think.

There are also some additions to print the implicit attribute during tree pretty-printing; for that reason some scan tests were updated.

Also, another adjustment in this patch is how implicitly created clauses are added to the current clause list in gimplify_adjust_omp_clauses(). Instead of simply appending the new clauses to the end, this patch adds them at the position "after initial non-map clauses, but right before any existing map clauses".
The reason for this is the interaction with other map clauses, for example:
    #pragma omp target map(rec.ptr[:N])
      for (int i = 0; i < N; i++)
        rec.ptr[i] += 1;
Here an implicit map(rec) is created because of the access inside the target region. The expectation is that 'rec' is implicitly mapped first, then the array section pointed to by 'rec.ptr' is mapped, and then that section is attached to the 'rec.ptr' field of the mapped 'rec' (in that order). If the implicit 'map(rec)' is appended to the end instead of being placed before the other maps, the attachment operation will not find anything to attach to, and the entire region will fail.
Note: this touches on another issue, which I will send a patch for later: per the discussion on omp-lang, an array section list item should *not* map its base pointer (although an attachment attempt should exist), whereas current GCC does map it for struct member pointers like 'rec.ptr' above (which should be deemed incorrect). This means that, as of right now, the change in map order does not yet make a visible difference for the case above. I have included it in this patch because the "[implicit]" tree printing already requires modifying many gimple scan tests, so including all the test modifications together seems more manageable patch-wise.
Tested with no regressions, and pushed to devel/omp/gcc-10. I will submit a mainline trunk version later. Chung-Lin
2021-05-05 Chung-Lin Tang include/ChangeLog: * gomp-constants.h (GOMP_MAP_IMPLICIT): New special map kind bits value. (GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of special map kind bits. (GOMP_MAP_NONCONTIG_ARRAY_P): Adjust test for non-contiguous array map kind bits to be more specific. (GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds. gcc/ChangeLog: * tree.h (OMP_CLAUSE_MAP_IMPLICIT_P): New access macro for 'implicit' bit, using 'base.deprecated_flag' field of tree_node. 
* tree-pretty-print.c (dump_omp_clause): Add support for printing implicit attribute in tree dumping. * gimplify.c (gimplify_adjust_omp_clauses_1): Set OMP_CLAUSE_MAP_IMPLICIT_P to 1 if map clause is implicitly created. (gimplify_adjust_omp_clauses): Adjust place of adding implicitly created clauses, from simple append, to starting of list, after non-map clauses. * omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind values passed to libgomp for implicit maps. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-implicit-map-1.c: New test. * c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern. * c-c++-common/goacc/firstprivate-mappings-1.c: Likewise. * c-c++-common/goacc/mdc-1.c: Likewise. * c-c++-common/goacc/reduction-1.c: Likewise. * c-c++-common/goacc/redu
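The two behaviors described in this patch can be sketched in plain C. This is a minimal illustration under stated assumptions, not the libgomp or gimplify code: 'struct range', 'map_request_ok', 'struct clause' and 'insert_implicit_maps' are hypothetical names. 'map_request_ok' follows the rule as stated in the patch description (an implicit map may succeed when the existing mapping is contained in the new one), and 'insert_implicit_maps' splices newly created implicit clauses in after the leading non-map clauses, ahead of the first existing map clause, so that map(rec) precedes map(rec.ptr[:N]).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open host address range [start, end).  */
struct range
{
  uintptr_t start, end;
};

/* Decide whether a new map request can reuse an existing device mapping.
   An explicit request must lie entirely inside the existing mapping.
   An implicit request is also accepted when the existing mapping lies
   inside it; only the already-mapped part is then used.  This sketch
   rejects partial overlaps.  */
static bool
map_request_ok (struct range existing, struct range request, bool implicit_p)
{
  bool request_inside = (request.start >= existing.start
                         && request.end <= existing.end);
  bool existing_inside = (existing.start >= request.start
                          && existing.end <= request.end);
  return request_inside || (implicit_p && existing_inside);
}

/* Hypothetical simplified clause list.  */
enum clause_kind { CLAUSE_OTHER, CLAUSE_MAP };

struct clause
{
  enum clause_kind kind;
  const char *text;
  struct clause *next;
};

/* Splice the chain NEW_HEAD..NEW_TAIL into *LIST_P after the leading
   run of non-map clauses, i.e. right before the first map clause
   (or at the end of the list if there is no map clause).  */
static void
insert_implicit_maps (struct clause **list_p, struct clause *new_head,
                      struct clause *new_tail)
{
  struct clause **pos = list_p;
  while (*pos != NULL && (*pos)->kind != CLAUSE_MAP)
    pos = &(*pos)->next;
  new_tail->next = *pos;
  *pos = new_head;
}
```

With the clause list 'private(i) map(rec.ptr[:N])', inserting 'map(rec) [implicit]' yields 'private(i) map(rec) [implicit] map(rec.ptr[:N])', which is the order the attach step needs.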
[PATCH, OG10, C++, OpenMP 5.0] Support lambda capturing of pointers and references in target directives
This patch adds proper lambda capturing of pointer and reference variables, as specified in OpenMP 5.0. We map the entire closure object as a to-map, attach pointers to zero-length array sections, and perform mapping of references. The main implementation approach is a tree walk when finishing the processing of target directives. Because of this, it seemed only natural to combine this processing with all of the this[:1] map creation handling, which makes this patch also a partial rewrite of PR92120, though things seem to look better in the new form. (And yes, the submitted PR92120 patch for mainline is in need of a "v3" rework.) Since this tree walk is now applied in the non-template case and during/after template instantiation, a prior patch that relaxed finish_omp_clauses() so that the this[:1] changes would work is no longer needed, and is thus reverted in this patch. Tested without regressions on x86_64-linux with nvptx offloading, and pushed to devel/omp/gcc-10. 2021-03-18 Chung-Lin Tang gcc/cp/ChangeLog: * cp-tree.h (set_omp_target_this_expr): Delete. (finish_omp_target_clauses): New prototype. * lambda.c (lambda_expr_this_capture): Remove call to set_omp_target_this_expr. * parser.c (cp_parser_omp_target): Likewise. * pt.c (tsubst_expr): Add call to finish_omp_target_clauses for target directives. * semantics.c (omp_target_this_expr): Delete. (omp_target_ptr_members_accessed): Delete. (finish_non_static_data_member): Remove call to set_omp_target_this_expr. Remove use of omp_target_ptr_members_accessed. (finish_this_expr): Remove call to set_omp_target_this_expr. (struct omp_target_walk_data): New struct for walking over target-directive tree body. (finish_omp_target_clauses_r): New function for tree walk. (finish_omp_target_clauses): New function, with code factored out from finish_omp_target. Add lambda object handling case. (finish_omp_target): Factor code out and adjust to use finish_omp_target_clauses. 
(finish_omp_clauses): Revert prior "Adjustments to allow '*ptr' and 'ptr->member' cases in map clausess.", since not needed with new organization of target-directive clause processing. gcc/testsuite/ChangeLog: * g++.dg/gomp/target-lambda-1.C: New test. libgomp/testsuite/ChangeLog: * libgomp.c++/target-lambda-1.C: New test. diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index b77bdc380a0..247a3bb1ec3 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7316,7 +7316,7 @@ extern void finish_lambda_scope (void); extern tree start_lambda_function (tree fn, tree lambda_expr); extern void finish_lambda_function (tree body); extern tree finish_omp_target (location_t, tree, tree, bool); -extern void set_omp_target_this_expr (tree); +extern void finish_omp_target_clauses (location_t, tree, tree *); /* in tree.c */ extern int cp_tree_operand_length (const_tree); diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index 9ecf0dbed0c..b55c2f85d27 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -842,9 +842,6 @@ lambda_expr_this_capture (tree lambda, int add_capture_p) type cast (_expr.cast_ 5.4) to the type of 'this'. [ The cast ensures that the transformed expression is an rvalue. ] */ result = rvalue (result); - - /* Acknowledge to OpenMP target that 'this' was referenced. 
*/ - set_omp_target_this_expr (result); } return result; diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 1af233690a2..9fc2a9b05eb 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -40786,7 +40786,6 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, keep_next_level (true); tree sb = begin_omp_structured_block (), ret; unsigned save = cp_parser_begin_omp_structured_block (parser); - set_omp_target_this_expr (NULL_TREE); switch (ccode) { case OMP_TEAMS: @@ -40881,7 +40880,6 @@ cp_parser_omp_target (cp_parser *parser, cp_token *pragma_tok, "#pragma omp target", pragma_tok); c_omp_adjust_map_clauses (clauses, true); keep_next_level (true); - set_omp_target_this_expr (NULL_TREE); tree body = cp_parser_omp_structured_block (parser, if_p); finish_omp_target (pragma_tok->location, clauses, body, false); diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 90cee31bb5a..139d1075986 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -18631,6 +18631,11 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl, t = copy_node (t); OMP_BODY (t) = stmt; OMP_CLAUSES (t) = tmp; + + if (TREE_CODE (t) == OMP_TARGET) + finish_omp_target_clauses (EXPR_LOCATION (t), OMP_BODY (t), +
[PATCH, OG10, C++, committed] Fix non-static member mapping in templates
There was a case of the implicit non-static pointer member mapping not working properly with templates. What happened was that the code in finish_omp_target() created the map clauses (it normally runs after finish_omp_clauses), but since this was a template class, the clauses were put through all the tsubst_* processing and, at the end, passed into finish_omp_clauses a second time. Because finish_omp_clauses didn't handle some of the implicitly created map clauses, things didn't work. This patch makes small fixes to many of the cases handled in these parts, plus some adjustments in gimplify.c. Tested without regressions, and pushed to devel/omp/gcc-10. Chung-Lin
From 4e714eaad985f68533f267b8df2026e5c14d084a Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Thu, 11 Mar 2021 00:31:08 -0800 Subject: [PATCH] Fix template case of non-static member access inside member functions Prior patches for C++ non-static member access had problems under template classes, due to finish_omp_clauses being called again after finish_omp_target had created the required implicit maps, which were not of a form allowed by finish_omp_clauses. This patch solves this by slightly relaxing the allowed expressions in finish_omp_clauses. 2021-03-11 Chung-Lin Tang gcc/cp/ChangeLog: * semantics.c (finish_omp_clauses): Adjustments to allow '*ptr' and 'ptr->member' cases in map clauses. (finish_omp_target): Use INDIRECT_REF instead of MEM_REF in created clauses, add processing_template_decl handling. gcc/ChangeLog: * gimplify.c (gimplify_scan_omp_clauses): Under !DECL_P case of GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to struct_deref_set for map(*ptr_to_struct) cases. gcc/testsuite/ChangeLog: * g++.dg/gomp/target-this-3.C: Adjust scan test. * g++.dg/gomp/target-this-4.C: Likewise. * g++.dg/gomp/target-this-5.C: New test. libgomp/ChangeLog: * testsuite/libgomp.c++/target-this-5.C: New test. 
--- gcc/cp/semantics.c| 45 +-- gcc/gimplify.c| 19 +++ gcc/testsuite/g++.dg/gomp/target-this-3.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-4.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-5.C | 34 libgomp/testsuite/libgomp.c++/target-this-5.C | 30 ++ 6 files changed, 120 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/g++.dg/gomp/target-this-5.C create mode 100644 libgomp/testsuite/libgomp.c++/target-this-5.C diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 55a5983..5b62fa3 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -6407,6 +6407,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) bool order_seen = false; bool schedule_seen = false; bool oacc_async = false; + bool indirect_ref_p = false; bool indir_component_ref_p = false; tree last_iterators = NULL_TREE; bool last_iterators_remove = false; @@ -7516,6 +7517,14 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) indir_component_ref_p = true; STRIP_NOPS (t); } + indirect_ref_p = false; + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) + && INDIRECT_REF_P (t)) + { + t = TREE_OPERAND (t, 0); + indirect_ref_p = true; + STRIP_NOPS (t); + } if (TREE_CODE (t) == COMPONENT_REF && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP || ort == C_ORT_ACC) @@ -7551,6 +7560,12 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) break; } t = TREE_OPERAND (t, 0); + if (INDIRECT_REF_P (t)) + { + t = TREE_OPERAND (t, 0); + indir_component_ref_p = true; + STRIP_NOPS (t); + } } if (remove) break; @@ -7614,6 +7629,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) || (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER)) && !indir_component_ref_p + && !indirect_ref_p && !cxx_mark_addressable (t)) remove = true; else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP @@ -7698,7 +7714,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) } else { - bitmap_set_bit (_head, DECL_UID (t)); + if (!indirect_ref_p && !indir_component_ref_p) + 
bitmap_set_bit (_head, DECL_UID (t)); if (t != OMP_CLAUSE_DECL (c) && TREE_CODE (OMP_CLAUSE_DECL (c)) == COMPONENT_REF) bitmap_set_bit (_field_head, DECL_UID
[PATCH, OG10, OpenMP, committed] Support A->B expressions in map clause (C front-end)
This patch merges parts of https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562467.html and devel/omp/gcc-10 commit 36a1eb (itself a modified merge of https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558975.html) to provide the equivalent front-end changes supporting "map(A->B)" clauses in the C front-end (previously only the C++ front-end had received such changes). Some associated middle-end changes are also included in this patch. Tested without regressions, and pushed to devel/omp/gcc-10. Chung-Lin
From 08caada8efd8f35db634647bbda6091fb667b00d Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Mon, 8 Mar 2021 15:56:52 +0800 Subject: [PATCH] Arrow operator handling for C front-end in OpenMP map clauses This patch merges into the C front-end some of the equivalent changes already made for the C++ front-end. 2021-03-08 Chung-Lin Tang gcc/c/ChangeLog: * c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in call to c_parser_omp_variable_list to 'true'. * c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in array base handling. (c_finish_omp_clauses): Handle 'A->member' case in map clauses. gcc/ChangeLog: * gimplify.c (gimplify_scan_omp_clauses): Add MEM_REF case when handling component_ref_p case. Add unshare_expr and gimplification when created GOMP_MAP_STRUCT is not a DECL. Add code to add firstprivate pointer for *pointer-to-struct case. gcc/testsuite/ChangeLog: * gcc.dg/gomp/target-3.c: New test. 
--- gcc/c/c-parser.c | 3 +- gcc/c/c-typeck.c | 22 +++ gcc/gimplify.c | 41 ++-- gcc/testsuite/gcc.dg/gomp/target-3.c | 16 +++ 4 files changed, 79 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index fae597128e9..0a6aee439f6 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -15700,7 +15700,8 @@ c_parser_omp_clause_map (c_parser *parser, tree list) } } - nl = c_parser_omp_variable_list (parser, clause_loc, OMP_CLAUSE_MAP, list); + nl = c_parser_omp_variable_list (parser, clause_loc, OMP_CLAUSE_MAP, list, + C_ORT_OMP, true); for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c)) OMP_CLAUSE_SET_MAP_KIND (c, kind); diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 6af19766324..7c887a80ce9 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -12917,6 +12917,12 @@ handle_omp_array_sections_1 (tree c, tree t, vec , return error_mark_node; } t = TREE_OPERAND (t, 0); + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) + && TREE_CODE (t) == MEM_REF) + { + t = TREE_OPERAND (t, 0); + STRIP_NOPS (t); + } if (ort == C_ORT_ACC && TREE_CODE (t) == MEM_REF) { if (maybe_ne (mem_ref_offset (t), 0)) @@ -13778,6 +13784,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) tree ordered_clause = NULL_TREE; tree schedule_clause = NULL_TREE; bool oacc_async = false; + bool indir_component_ref_p = false; tree last_iterators = NULL_TREE; bool last_iterators_remove = false; tree *nogroup_seen = NULL; @@ -14505,6 +14512,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) { while (TREE_CODE (t) == COMPONENT_REF) t = TREE_OPERAND (t, 0); + if (TREE_CODE (t) == MEM_REF) + { + t = TREE_OPERAND (t, 0); + STRIP_NOPS (t); + } if (bitmap_bit_p (_field_head, DECL_UID (t))) break; if (bitmap_bit_p (_head, DECL_UID (t))) @@ -14561,6 +14573,15 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) bias) to zero here, so it is not set erroneously to the pointer size later on 
in gimplify.c. */ OMP_CLAUSE_SIZE (c) = size_zero_node; + indir_component_ref_p = false; + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) + && TREE_CODE (t) == COMPONENT_REF + && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF) + { + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + indir_component_ref_p = true; + STRIP_NOPS (t); + } if (TREE_CODE (t) == COMPONENT_REF && OMP_CLAUSE_CODE (c) != OMP_CLAUSE__CACHE_) { @@ -14633,6 +14654,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) else if ((OMP_CLAUSE_CODE (c) != OMP_C
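The clause-checking changes in the C++ ('*ptr', 'ptr->member') and C ('A->member') front-end patches above come down to walking from the clause expression to its base variable while peeling component accesses and a dereference. A toy sketch of that walk follows. 'struct toy_expr', the TOY_* codes, 'strip_nops' and 'find_map_base' are hypothetical stand-ins, not GCC's tree API; TOY_INDIRECT_REF models both INDIRECT_REF (C++) and MEM_REF (C), and unlike the real code this loop peels any number of dereferences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy expression codes; not GCC's tree codes.  */
enum toy_code
{
  TOY_VAR_DECL,
  TOY_COMPONENT_REF,  /* op0.name  */
  TOY_INDIRECT_REF,   /* *op0 (stands in for INDIRECT_REF and MEM_REF)  */
  TOY_NOP_EXPR        /* no-op conversion of op0  */
};

struct toy_expr
{
  enum toy_code code;
  const char *name;      /* Variable or field name, if any.  */
  struct toy_expr *op0;  /* Operand for non-decl codes.  */
};

/* Analogue of STRIP_NOPS: skip over no-op conversions.  */
static struct toy_expr *
strip_nops (struct toy_expr *t)
{
  while (t->code == TOY_NOP_EXPR)
    t = t->op0;
  return t;
}

/* Walk from a map clause expression such as 'a->b.c', i.e.
   COMPONENT_REF (COMPONENT_REF (INDIRECT_REF (a), b), c), down to the
   base variable 'a'.  *INDIRECT_P records whether a dereference was
   crossed on the way.  */
static struct toy_expr *
find_map_base (struct toy_expr *t, bool *indirect_p)
{
  *indirect_p = false;
  for (;;)
    {
      t = strip_nops (t);
      if (t->code == TOY_COMPONENT_REF)
        t = t->op0;
      else if (t->code == TOY_INDIRECT_REF)
        {
          *indirect_p = true;
          t = t->op0;
        }
      else
        return t;
    }
}
```

In the real front-end code, flags like indirect_p are what let a clause such as map(*ptr) skip steps that only make sense for a directly mapped variable (the C++ diff above, for example, guards cxx_mark_addressable with !indirect_ref_p).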
[PATCH, C++, OG10, OpenACC/OpenMP, committed] Allow static constexpr fields in mappable types
On 2020/1/21 12:49 AM, Jakub Jelinek wrote: The OpenMP 4.5 definition of mappable type for C++ is that - All data members must be non-static. among other requirements. In OpenMP 5.0 that has been removed. So, if we follow the 4.5 definition, it shouldn't change, if we follow 5.0 definition, the whole loop should be dropped, but in no case shall static constexpr data members be treated any differently from any other static data members.
We have merged the patch as is (only static constexprs) to devel/omp/gcc-10 for now. It's possible that the entire checking loop should eventually be removed to allow the full 5.0 range, but I wonder whether things like (automatic) accessibility of the static members within target regions are an issue to resolve. Re-tested on OG10, and committed with an additional testcase (the same test for OpenMP). Chung-Lin
cp/ * decl2.c (cp_omp_mappable_type_1): Allow fields with DECL_DECLARED_CONSTEXPR_P to be mapped. testsuite/ * g++.dg/goacc/static-constexpr-1.C: New test. * g++.dg/gomp/static-constexpr-1.C: New test.
From 1c3f38b30c1db0aef5ccbf6d20fb5fd13785d482 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Wed, 3 Mar 2021 22:39:10 +0800 Subject: [PATCH] Allow static constexpr fields in mappable types for C++ This patch is a merge of: https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg01246.html Static members in general disqualify a C++ class from being target mappable, but static constexpr members are folded away at compile time, so they should not interfere. OpenMP 5.0 in general lifts the static member limitation, so this patch will probably be further adjusted later. 2021-03-03 Chung-Lin Tang gcc/cp/ChangeLog: * decl2.c (cp_omp_mappable_type_1): Allow fields with DECL_DECLARED_CONSTEXPR_P to be mapped. gcc/testsuite/ChangeLog: * g++.dg/goacc/static-constexpr-1.C: New test. * g++.dg/gomp/static-constexpr-1.C: New test. 
--- gcc/cp/decl2.c | 5 - gcc/testsuite/g++.dg/goacc/static-constexpr-1.C | 17 + gcc/testsuite/g++.dg/gomp/static-constexpr-1.C | 17 + 3 files changed, 38 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/goacc/static-constexpr-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/static-constexpr-1.C diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index 5343ea3b068..872122fe83c 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -1460,7 +1460,10 @@ cp_omp_mappable_type_1 (tree type, bool notes) { tree field; for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) - if (VAR_P (field)) + if (VAR_P (field) + /* Fields that are 'static constexpr' can be folded away at compile + time, thus does not interfere with mapping. */ + && !DECL_DECLARED_CONSTEXPR_P (field)) { if (notes) inform (DECL_SOURCE_LOCATION (field), diff --git a/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C b/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C new file mode 100644 index 000..edf5f1a7628 --- /dev/null +++ b/gcc/testsuite/g++.dg/goacc/static-constexpr-1.C @@ -0,0 +1,17 @@ +// { dg-do compile } +// { dg-require-effective-target c++11 } + +/* Test that static constexpr members do not interfere with offloading. */ +struct rec +{ + static constexpr int x = 1; + int y, z; +}; + +void foo (rec& r) +{ + #pragma acc parallel copy(r) + { +r.y = r.y = r.x; + } +} diff --git a/gcc/testsuite/g++.dg/gomp/static-constexpr-1.C b/gcc/testsuite/g++.dg/gomp/static-constexpr-1.C new file mode 100644 index 000..39eee92 --- /dev/null +++ b/gcc/testsuite/g++.dg/gomp/static-constexpr-1.C @@ -0,0 +1,17 @@ +// { dg-do compile } +// { dg-require-effective-target c++11 } + +/* Test that static constexpr members do not interfere with offloading. */ +struct rec +{ + static constexpr int x = 1; + int y, z; +}; + +void foo (rec& r) +{ + #pragma omp target map(r) + { +r.y = r.y = r.x; + } +} -- 2.17.1
[PATCH, OG10, OpenMP, committed] Fix array members in OpenMP map clauses
Previous patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564976.html was reverted by Catherine while I was away, due to regressions in mapping array members. The fix appears to be moving the finish_non_static_data_member() call to an earlier position inside handle_omp_array_sections(). Tested and committed to devel/omp/gcc-10; the above patch was also re-committed. Chung-Lin
From da047f63c601118ad875d13929453094acc6c6c9 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Fri, 26 Feb 2021 20:13:29 +0800 Subject: [PATCH] Fix regression of array members in OpenMP map clauses. Fixed a regression of array members not working in OpenMP map clauses after commit bf8605f14ec33ea31233a3567f3184fee667b695. This patch itself probably should be considered a fix for commit aadfc9843. 2021-02-26 Chung-Lin Tang gcc/cp/ChangeLog: * semantics.c (handle_omp_array_sections): Adjust position of making COMPONENT_REF from FIELD_DECL to earlier position. --- gcc/cp/semantics.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 370d5831091..55a5983528e 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -5386,6 +5386,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) } OMP_CLAUSE_DECL (c) = first; OMP_CLAUSE_SIZE (c) = size; + if (TREE_CODE (t) == FIELD_DECL) + t = finish_non_static_data_member (t, NULL_TREE, NULL_TREE); if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP || (TREE_CODE (t) == COMPONENT_REF && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)) @@ -5414,8 +5416,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) } tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - if (TREE_CODE (t) == FIELD_DECL) - t = finish_non_static_data_member (t, NULL_TREE, NULL_TREE); if ((ort & C_ORT_OMP_DECLARE_SIMD) != C_ORT_OMP && ort != C_ORT_ACC) OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_POINTER); else if (TREE_CODE (t) == COMPONENT_REF) -- 2.17.1
[PATCH, OG10, committed] Support A->B expressions in map clause
This patch aims to allow map(A->ptr) expressions to be handled the same way as map(B.ptr) expressions. map(struct:*A) clauses are now produced during gimplification. Julian, I'm CCing you since IIRC you were the author of this area of code. I'd appreciate it if you could take a look when you have time, though I've already gone ahead and pushed to OG10 after the testing results looked okay. Thanks, Chung-Lin
gcc/ChangeLog: * gimplify.c ("tree-hash-traits.h"): Add include. (gimplify_scan_omp_clauses): Change struct_map_to_clause to type hash_map *. Adjust struct map handling to handle cases of *A and A->B expressions. (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for exit data directives code to earlier position. gcc/testsuite/ChangeLog: * g++.dg/gomp/target-3.C: Adjust testcase gimple scanning. * g++.dg/gomp/target-this-2.C: Likewise. * g++.dg/gomp/target-this-3.C: Likewise. * g++.dg/gomp/target-this-4.C: Likewise. libgomp/ChangeLog: * testsuite/libgomp.c++/target-23.C: New testcase.
From bf8605f14ec33ea31233a3567f3184fee667b695 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang Date: Mon, 8 Feb 2021 07:53:55 -0800 Subject: [PATCH] Enable gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF (INDIRECT_REF ...)) map clauses. This patch tries to allow map(A->ptr) to be properly handled the same way as map(B.ptr) expressions. map(struct:*A) clauses are now produced during gimplify. This patch, as of time of commit, is only pushed to devel/omp/gcc-10, not yet submitted as mainline patch to upstream. 2021-02-08 Chung-Lin Tang gcc/ChangeLog: * gimplify.c ("tree-hash-traits.h"): Add include. (gimplify_scan_omp_clauses): Change struct_map_to_clause to type hash_map *. Adjust struct map handling to handle cases of *A and A->B expressions. (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for exit data directives code to earlier position. gcc/testsuite/ChangeLog: * g++.dg/gomp/target-3.C: Adjust testcase gimple scanning. * g++.dg/gomp/target-this-2.C: Likewise. 
* g++.dg/gomp/target-this-3.C: Likewise. * g++.dg/gomp/target-this-4.C: Likewise. libgomp/ChangeLog: * testsuite/libgomp.c++/target-23.C: New testcase. --- gcc/gimplify.c| 51 +++ gcc/testsuite/g++.dg/gomp/target-3.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-2.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-3.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-4.C | 4 +-- libgomp/testsuite/libgomp.c++/target-23.C | 34 + 6 files changed, 78 insertions(+), 17 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c++/target-23.C diff --git a/gcc/gimplify.c b/gcc/gimplify.c index b90ba5b..ba19017 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see #include "langhooks.h" #include "tree-cfg.h" #include "tree-ssa.h" +#include "tree-hash-traits.h" #include "omp-general.h" #include "omp-low.h" #include "gimple-low.h" @@ -8514,7 +8515,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, { struct gimplify_omp_ctx *ctx, *outer_ctx; tree c; - hash_map *struct_map_to_clause = NULL; + hash_map *struct_map_to_clause = NULL; hash_set *struct_deref_set = NULL; tree *prev_list_p = NULL, *orig_list_p = list_p; int handled_depend_iterators = -1; @@ -9082,12 +9083,15 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, && TREE_CODE (decl) == INDIRECT_REF && TREE_CODE (TREE_OPERAND (decl, 0)) == COMPONENT_REF && (TREE_CODE (TREE_TYPE (TREE_OPERAND (decl, 0))) - == REFERENCE_TYPE)) + == REFERENCE_TYPE) + && (OMP_CLAUSE_MAP_KIND (c) + != GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION)) { pd = _OPERAND (decl, 0); decl = TREE_OPERAND (decl, 0); } bool indir_p = false; + bool component_ref_p = false; tree orig_decl = decl; tree decl_ref = NULL_TREE; if ((region_type & (ORT_ACC | ORT_TARGET | ORT_TARGET_DATA)) != 0 @@ -9098,6 +9102,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, while (TREE_CODE (decl) == COMPONENT_REF) { decl = TREE_OPERAND (decl, 0); + component_ref_p = true; if 
(((TREE_CODE (decl) == MEM_REF && integer_zerop (TREE_OPERAND (
Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0
On 2021/1/16 5:45 PM, Jakub Jelinek wrote:
+/* Unified reference count for structure element siblings, this is used + when REFCOUNT_STRUCTELEM_FIRST_P(k->refcount) == true, the first sibling + in a structure element sibling list item sequence. */ +uintptr_t structelem_refcount; + +/* When REFCOUNT_STRUCTELEM_P (k->refcount) == true, this field points
REFCOUNT_STRUCTELEM_P (k->refcount) is true even for REFCOUNT_STRUCTELEM_FIRST_P(k->refcount), so shouldn't the description say that structelem_refcount_ptr is only used if REFCOUNT_STRUCTELEM_P (k->refcount) && !REFCOUNT_STRUCTELEM_FIRST_P (k->refcount) ?
Sure, I'll revise the comments a bit.
+ into the (above) structelem_refcount field of the _FIRST splay_tree_key, + the first key in the created sequence. All structure element siblings + share a single refcount in this manner. Since these two fields won't be + used at the same time, they are stashed in a union. */ +uintptr_t *structelem_refcount_ptr; + }; struct splay_tree_aux *aux; }; /* The comparison function. */
Anyway, most of the patch looks good, but I'd like to understand the rationale for choosing a htab over what I've been trying to suggest, which was essentially instead of incrementing or decrementing refcounts push them into a vector for later incrementing/decrementing, then qsort the vector (by the pointers to refcounts) and increment what the elements point to unless the same address has been incremented/decremented already. Jakub
Essentially, the requirement is to increment/decrement each refcount only once per construct, so using a pointer set (implemented by htab_t here) to track the processing status seemed more intuitive in code, and I think it is probably faster than sorting a vector (at least in most cases). Chung-Lin
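To make the trade-off in this exchange concrete, here is a small C sketch of both ways to bump each refcount only once per construct: a pointer set (the approach the patch takes, with a hypothetical fixed-size open-addressing table standing in for libgomp's htab_t) and the reviewer's suggestion of collecting the pointers, sorting them with qsort, and skipping adjacent duplicates. Names such as 'increment_once_set' and 'increment_once_sort' are illustrative only, not libgomp functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Approach 1 (the patch's idea, simplified): a pointer set.  A
   hypothetical fixed-capacity open-addressing table replaces htab_t;
   it assumes fewer than SET_CAP distinct pointers per construct.  */
#define SET_CAP 64

struct ptr_set
{
  uintptr_t *slots[SET_CAP];
};

/* Insert P into S; return true if P was not already present.  */
static bool
ptr_set_insert (struct ptr_set *s, uintptr_t *p)
{
  size_t i = (size_t) (((uintptr_t) p >> 3) % SET_CAP);
  while (s->slots[i] != NULL)
    {
      if (s->slots[i] == p)
        return false;
      i = (i + 1) % SET_CAP;  /* Linear probing.  */
    }
  s->slots[i] = p;
  return true;
}

/* Increment each distinct refcount in REFS[0..N) exactly once.  */
static void
increment_once_set (uintptr_t **refs, size_t n)
{
  struct ptr_set set = { { NULL } };
  for (size_t i = 0; i < n; i++)
    if (ptr_set_insert (&set, refs[i]))
      (*refs[i])++;
}

/* Approach 2 (the suggestion in the review): sort the collected
   pointers, then increment each one unless it equals its predecessor.  */
static int
cmp_refcount_ptr (const void *a, const void *b)
{
  uintptr_t pa = (uintptr_t) *(uintptr_t *const *) a;
  uintptr_t pb = (uintptr_t) *(uintptr_t *const *) b;
  return (pa > pb) - (pa < pb);
}

static void
increment_once_sort (uintptr_t **refs, size_t n)
{
  qsort (refs, n, sizeof *refs, cmp_refcount_ptr);  /* Reorders REFS.  */
  for (size_t i = 0; i < n; i++)
    if (i == 0 || refs[i] != refs[i - 1])
      (*refs[i])++;
}
```

The set version preserves the processing order and does one expected-constant-time probe per key; the sort version needs no auxiliary table but costs O(n log n) and reorders the caller's array.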
Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0
Ping x2. Hi Jakub, I would like this part of OpenMP 5.0 to be considered for GCC 11. Thanks, Chung-Lin
On 2020/12/14 6:32 PM, Chung-Lin Tang wrote: Ping.
On 2020/12/4 10:15 PM, Chung-Lin Tang wrote: Hi Jakub, this is a new version of the structure element mapping patch for the OpenMP 5.0 requirement changes. This one uses the approach you outlined in your concept patch [1]: basically, using more special REFCOUNT_* values to mark them, and linking subsequent structure element splay_tree_keys back to the first key's refcount. [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557622.html
Implementation notes of the attached patch:
(1) This patch solves the 5.0 requirement of "not already incremented/decremented because of the effect of a map clause on the construct" by pulling in libgomp/hashtab.h and using htab_t as a pointer set. An "htab_t *refcount_set" is added to the map/unmap routines to track the processing status of the uintptr_t* addresses of refcount fields in splay_tree_keys.
* Currently this patch uses the same htab_create/htab_free routines as task.c. I toyed with creating an 'htab_alloca' macro (allocating a fixed-size htab) to speed things up further, but decided to play it safer for the current patch.
(2) Because pointers to refcounts are used as the basis, and structure element siblings all share the same refcount, uniform increment/decrement without repetition is also naturally achieved.
(3) Because whole structure element sibling sequences need to be removed out of context, it appears we need to mark the first/last element of such a sequence. You'll see that the special REFCOUNT_* values have been expanded a bit beyond your concept patch (at some point we should think about no longer abusing it and adding a proper flags word).
(4) The new increment/decrement routines combine most of the new refcount_set lookup code with the refcount adjustment.
For the decrement routine, "copy" and "removal" are now separate return values, since for structure element sequences, even when signalling "removal", you may still need to finish the "copy" work of the following target_var_descs.
(5) There are some reorganizing changes to oacc-parallel.c and oacc-mem.c, but most of the code that matters is in target.c.
(6) New testcases have been added to reflect the cases discussed on the omp-lang list.
This patch has been tested for libgomp with no regressions on x86_64-linux with nvptx offloading. Since I submitted the first "v1" patch long ago, could this be considered for commit now, once approved? Thanks, Chung-Lin
2020-12-04 Chung-Lin Tang libgomp/ * hashtab.h (htab_clear): New function with initialization code factored out from... (htab_create): ...here, adjust to use htab_clear function. * libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of special refcount values, add comments. (REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL. (REFCOUNT_LINK): Likewise. (REFCOUNT_STRUCTELEM): New special refcount range for structure element siblings. (REFCOUNT_STRUCTELEM_P): Macro for testing for structure element sibling maps. (REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling. (REFCOUNT_STRUCTELEM_FLAG_LAST): Flag to indicate last sibling. (REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag. (REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag. (struct splay_tree_key_s): Add structelem_refcount and structelem_refcount_ptr fields into a union with dynamic_refcount. Add comments. (gomp_map_vars): Delete declaration. (gomp_map_vars_async): Likewise. (gomp_unmap_vars): Likewise. (gomp_unmap_vars_async): Likewise. (goacc_map_vars): New declaration. (goacc_unmap_vars): Likewise. * oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars. (goacc_enter_datum): Likewise. (goacc_enter_data_internal): Likewise. * oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars and goacc_unmap_vars. 
(GOACC_data_start): Adjust to use goacc_map_vars. (GOACC_data_end): Adjust to use goacc_unmap_vars. * target.c (hash_entry_type): New typedef. (htab_alloc): New function hook for hashtab.h. (htab_free): Likewise. (htab_hash): Likewise. (htab_eq): Likewise. (hashtab.h): Add file include. (gomp_increment_refcount): New function. (gomp_decrement_refcount): Likewise. (gomp_map_vars_existing): Add refcount_set parameter, adjust to use gomp_increment_refcount. (gomp_map_fields_existing): Add refcount_set parameter, adjust calls to gomp_map_vars_existing. (gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p variable to guard OpenMP specific paths, adjust calls to gomp_map_vars_existing, add structure element sibli
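The shared-refcount scheme from implementation notes (2)-(4) can be sketched as follows. The RC_* encoding, 'struct key', and the helper functions are hypothetical simplifications; the real REFCOUNT_* values, the splay_tree_key_s layout, and gomp_increment_refcount/gomp_decrement_refcount in libgomp differ in detail. In this sketch, the first sibling stores the count, later siblings point at it, and a per-construct set ensures the shared count is adjusted only once. Decrement reports "copy" and "remove" separately, so siblings processed after the shared count has reached zero still get their copy-back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: values from RC_SPECIAL upwards are markers,
   not counts.  libgomp's real REFCOUNT_* values differ.  */
#define RC_SPECIAL         (~(uintptr_t) 0xff)
#define RC_STRUCTELEM      (RC_SPECIAL | 0x10)
#define RC_FLAG_FIRST      ((uintptr_t) 0x1)
#define RC_FLAG_LAST       ((uintptr_t) 0x2)
#define RC_STRUCTELEM_P(r) (((r) & ~(uintptr_t) 0x3) == RC_STRUCTELEM)
#define RC_FIRST_P(r)      (RC_STRUCTELEM_P (r) && ((r) & RC_FLAG_FIRST))
#define RC_LAST_P(r)       (RC_STRUCTELEM_P (r) && ((r) & RC_FLAG_LAST))

/* Simplified stand-in for splay_tree_key_s.  */
struct key
{
  uintptr_t refcount;  /* Ordinary count, or an RC_STRUCTELEM marker.  */
  union
  {
    uintptr_t dynamic_refcount;
    uintptr_t structelem_refcount;       /* Used by the _FIRST sibling.  */
    uintptr_t *structelem_refcount_ptr;  /* Used by the other siblings.  */
  } u;
};

/* Address of the count that actually governs K.  */
static uintptr_t *
governing_refcount (struct key *k)
{
  if (!RC_STRUCTELEM_P (k->refcount))
    return &k->refcount;
  if (RC_FIRST_P (k->refcount))
    return &k->u.structelem_refcount;
  return k->u.structelem_refcount_ptr;
}

/* Tiny per-construct pointer set (no overflow check: sketch only).  */
struct seen
{
  uintptr_t *p[16];
  int n;
};

/* Add P to S; return true if it was not already there.  */
static bool
seen_add (struct seen *s, uintptr_t *p)
{
  for (int i = 0; i < s->n; i++)
    if (s->p[i] == p)
      return false;
  s->p[s->n++] = p;
  return true;
}

/* Increment K's governing count at most once per construct.  */
static void
increment_refcount (struct key *k, struct seen *s)
{
  uintptr_t *rc = governing_refcount (k);
  if (seen_add (s, rc))
    (*rc)++;
}

/* Decrement K's governing count at most once per construct.  A sibling
   whose shared count already reached zero earlier in this construct
   still needs its data copied back, but only the call that actually
   dropped the count to zero triggers removal.  */
static void
decrement_refcount (struct key *k, struct seen *s,
                    bool *do_copy, bool *do_remove)
{
  uintptr_t *rc = governing_refcount (k);
  bool first_time = seen_add (s, rc);
  bool set_to_zero = false;
  if (first_time && *rc > 0)
    {
      (*rc)--;
      set_to_zero = (*rc == 0);
    }
  *do_copy = set_to_zero || (!first_time && *rc == 0);
  *do_remove = set_to_zero;
}
```

Keying the "already processed" check on the address of the governing count, rather than on the key itself, is what makes all siblings of one structure behave as a single unit.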