Re: [PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-06-12 Thread Tobias Burnus

Andrew Stubbs wrote:

Compared to the previous v4 (1/5) posting of this patch:
- The enumeration of the ompx allocators have been moved (again) to 200
   (as 100 is already in use by another toolchain vendor and this seems
   like a possible source of confusion).
- The "ompx" has also been changed to "ompx_gnu" to highlight that these
   are specifically GNU extensions.
- The failure mode of the testcases had been modified, including adding
   an abort in CHECK_SIZE and skipping the test on unsupported platforms.
- The OMP_ALLOCATE environment variable now supports the new allocator.
- The Fortran frontend allows use of the new allocator in "allocator"
   clauses.

---

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.  One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.


The patch LGTM.

Thanks!

Tobias

gcc/fortran/ChangeLog:

* openmp.cc (is_predefined_allocator): Update valid ranges to
  incorporate ompx_gnu_pinned_mem_alloc.

libgomp/ChangeLog:

* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_alligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge
---
  gcc/fortran/openmp.cc |  11 +-
  .../gfortran.dg/gomp/allocate-pinned-1.f90|  16 +++
  libgomp/allocator.c   | 115 +-
  libgomp/env.c |   1 +
  libgomp/libgomp.texi  |   7 +-
  libgomp/omp.h.in  |   1 +
  libgomp/omp_lib.f90.in|   2 +
  libgomp/omp_lib.h.in  |   2 +
  libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 100 +++
  libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 102 
  .../libgomp.fortran/alloc-pinned-1.f90|  16 +++
  11 files changed, 336 insertions(+), 37 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
  create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
  create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
  create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90


Re: [PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode

2024-06-12 Thread Tobias Burnus

Andrew Stubbs wrote:

The feature doesn't work on non-Linux hosts, at present, so skip the tests
entirely.

On Linux systems that have insufficient lockable memory configured we still
need to fail or else the feature won't be getting tested when we think it is,
but now there's a message to explain why.

libgomp/ChangeLog:

* testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to
dg-skip-if.
Correct spelling mistake.
Abort on insufficient lockable memory.
Use #error on non-linux hosts.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.


LGTM. Thanks!

Tobias



Re: [Patch, PR Fortran/90072] Polymorphic Dispatch to Polymophic Return Type Memory Leak

2024-06-08 Thread Tobias Burnus

Andre Vehreschild wrote:

PS That's good news about the funding. Maybe we will get to see "built in"
coarrays soon?

You hopefully will see Nikolas work on the shared memory coarray support, if
that is what you mean by "built in" coarrays. I will be working on the
distributed memory coarray support esp. fixing the module issues and some other
team related things.


Cool! (Both of it.)

I assume "distributed memory coarray support" is still based on Open
Coarrays?

* * *

I am asking because there is coarray API being defined: Parallel Runtime
Interface for Fortran (PRIF), https://go.lbl.gov/prif

with an implementation called Caffeine – CoArray Fortran Framework of
Efficient Interfaces to Network Environments,
https://crd.lbl.gov/caffeine which uses GASNet or POSIX processes.

Well, the among the implementers is (unsurprising?) Damian – and the
idea seems to be that LLVM's FLANG will use the API.

Tobias

PS: I think it might be useful in the long run to support both
PRIF/Caffeine and OpenCoarrays.

I have attached my hello-world patch for -fcoarray=prif that I wrote
after ISC-HPC; it only handles this_image() / num_images() + init/stop.
I got confirmation by the PRIF developers that the next revision will
permit calling __prif_MOD_prif_init multiple times such that one can use
it in the constructor for static coarrays, which won't work otherwise.
gcc/ChangeLog:

	* flag-types.h (enum gfc_fcoarray):

gcc/fortran/ChangeLog:

	* invoke.texi:
	* lang.opt:
	* trans-decl.cc (gfc_build_builtin_function_decls):
	(create_main_function):
	* trans-intrinsic.cc (trans_this_image):
	(trans_num_images):
	* trans.h (GTY):

 gcc/flag-types.h   |  3 ++-
 gcc/fortran/invoke.texi|  7 +-
 gcc/fortran/lang.opt   |  5 +++-
 gcc/fortran/trans-decl.cc  | 56 --
 gcc/fortran/trans-intrinsic.cc | 42 +++
 gcc/fortran/trans.h|  5 
 6 files changed, 108 insertions(+), 10 deletions(-)

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 5a2b461fa75..babd747c01d 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -427,7 +427,8 @@ enum gfc_fcoarray
 {
   GFC_FCOARRAY_NONE = 0,
   GFC_FCOARRAY_SINGLE,
-  GFC_FCOARRAY_LIB
+  GFC_FCOARRAY_LIB,
+  GFC_FCOARRAY_PRIF
 };
 
 
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 40e8e4a7cdd..331a40d31db 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1753,7 +1753,12 @@ Single-image mode, i.e. @code{num_images()} is always one.
 
 @item @samp{lib}
 Library-based coarray parallelization; a suitable GNU Fortran coarray
-library needs to be linked.
+library needs to be linked such as @url{http://opencoarrays.org}.
+
+@item @samp{prif}
+Using the Parallel Runtime Interface for Fortran (PRIF),
+@url{https://go.lbl.gov/@/prif}; for instance, via Caffeine,
+@url{https://go.lbl.gov/@/caffeine}.
 @end table
 
 
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 5efd4a0129a..9ba957d5571 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -786,7 +786,7 @@ Copy array sections into a contiguous block on procedure entry.
 
 fcoarray=
 Fortran RejectNegative Joined Enum(gfc_fcoarray) Var(flag_coarray) Init(GFC_FCOARRAY_NONE)
--fcoarray=	Specify which coarray parallelization should be used.
+-fcoarray=	Specify which coarray parallelization should be used.
 
 Enum
 Name(gfc_fcoarray) Type(enum gfc_fcoarray) UnknownError(Unrecognized option: %qs)
@@ -800,6 +800,9 @@ Enum(gfc_fcoarray) String(single) Value(GFC_FCOARRAY_SINGLE)
 EnumValue
 Enum(gfc_fcoarray) String(lib) Value(GFC_FCOARRAY_LIB)
 
+EnumValue
+Enum(gfc_fcoarray) String(prif) Value(GFC_FCOARRAY_PRIF)
+
 fcheck=
 Fortran RejectNegative JoinedOrMissing
 -fcheck=[...]	Specify which runtime checks are to be performed.
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index dca7779528b..d1c0e2ee997 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -170,6 +170,10 @@ tree gfor_fndecl_co_sum;
 tree gfor_fndecl_caf_is_present;
 tree gfor_fndecl_caf_random_init;
 
+tree gfor_fndecl_prif_init;
+tree gfor_fndecl_prif_stop;
+tree gfor_fndecl_prif_this_image_no_coarray;
+tree gfor_fndecl_prif_num_images;
 
 /* Math functions.  Many other math functions are handled in
trans-intrinsic.cc.  */
@@ -4147,6 +4151,31 @@ gfc_build_builtin_function_decls (void)
 	get_identifier (PREFIX("caf_random_init")),
 	void_type_node, 2, logical_type_node, logical_type_node);
 }
+  else if (flag_coarray == GFC_FCOARRAY_PRIF)
+{
+  tree pint_type = build_pointer_type (integer_type_node);
+  tree pbool_type = build_pointer_type (boolean_type_node);
+  tree pintmax_type_node = get_typenode_from_name (INTMAX_TYPE);
+  pintmax_type_node = build_pointer_type (pintmax_type_node);
+
+  gfor_fndecl_prif_init = gfc_build_library_function_decl_with_spec (
+	get_identifier ("__prif_MOD_prif_init"), ". W ",
+	void_type_node, 1, 

Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-08 Thread Tobias Burnus

Hi Gerald,

Gerald Pfeifer wrote:

Looks like a janitorial task to fix the absolute links, possibly
excluding those with /git, /onlinedocs, /wiki – or assuming that the
main page is GCC.gnu.org, relying on the redirects.

It's on my list. A first quick check indicates there isn't much to do,
though. :-)


You could consider

htdocs/search.html:

to avoid a redirect (but it is not a broken link);
otherwise, I but I concur that it seems to be (mostly) fine :-)

* * *


+  loop-transformation constructs are now supported.
I'm thinking "loop transformation" in English? Or is this a specific term
from the standard?

Loop transformation happens at the end. But e.g "(#pragma omp) unroll
full" is a directive and, e.g.
...
is a construct (= directive + structured block (if any) + end directive
(if any)).

I believe there was a misunderstanding and I wasn't clear enough: I was
wondering whether instead of "loop-transformation" the patch should have
"loop transformation".

In your response you use the version without dash, so I guess we agree?
:-)


(Pedantically it's a hyphen (-) and not a(n en/em) dash (–/—), i.e. '-' 
not '--' or '---' in TeX.)


No, we don't. – There is a difference whether the two words are used 
alone or as modifier to a noun, like the "this is well defined" vs. "a 
well-defined project".


Thus, while "loop transformation happens" is without hyphen (as we both 
agree),* for "loop(-| )tranformation constructs" the (non-)usage of 
hyphens is not well defined; grouping wise, those are clearly '((loop 
transformation) constructs)' and not '(loop (transformation constructs))'.


I believe both variants are perfectly fine.

BTW: In the OpenMP pre-6.0 draft (TR12), the verb 'transform' is now 
used as noun not with suffix '-ation' but with the suffix '-ing' (also 
referred to as gerund) such that a section title now uses 
"Loop-Transforming Constructs"; I think for '(word) plus (-ing word)' – 
used as modifier –, a hyphen is a tad more common than for '(word) plus 
'(word with -ation suffix)'.


Tobias

* The Oxford Guide to Style points out some words that do get 
hyphenated: clear-cut, drip-proof, take-off, part-time, … – or to refer 
to the abstract meaning rather than literal: bull's-eye, crow's-feet, … 
— Formerly, present particle plus noun got hyphenated when the compound 
was acted on: walking-stick, walking-frame. Likewise, it was formerly 
normal in British English to hyphenate a single adjectival noun and the 
noun it modified: note-cue, title-page, volume-number (less common now, 
but can linger in some combination). And until recently: small 
scale-factory (vs. small-scale factory), white water-lily (vs. 
white-water lily).


Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-06 Thread Tobias Burnus

Hi Gerald,

Gerald Pfeifer wrote:

+++ b/htdocs/gcc-15/changes.html
+
+  https://gcc.gnu.org/projects/gomp/;>OpenMP

Can you please make this a relative link, i.e. "../projects/gomp/"?


Good point. I thought such links should be absolute because of 
(www.)GNU.org, i.e.


https://www.gnu.org/software/gcc/releases.html

... but also that page has https://www.gnu.org/software/gcc/projects/gomp/

GNU.org does not have the documentation, but going to 
https://www.gnu.org/software/gcc/onlinedocs/ or a subpage redirects (302 
temporary redirect) to the GCC website. Likewise for '../git' but for 
'../wiki' it has a HTTP 404 not found; fortunately, ../wiki/ works.


I think there are plenty of links which could be relative ones but are 
absolute ones.


Looks like a janitorial task to fix the absolute links, possibly 
excluding those with /git, /onlinedocs, /wiki – or assuming that the 
main page is GCC.gnu.org, relying on the redirects.


In any case, those links are probably broken on GNU.org:

htdocs/gcc-14/porting_to.html:href="/onlinedocs/gcc-14.1.0/gcc/Diagnostic-Pragmas.html">#pragma 
GCC diagnostic warning


htdocs/gcc-5/changes.html:    A href="/onlinedocs/libstdc++/manual/using_dual_abi.html">Dual


* * *


+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+

I'm thinking "loop transformation" in English? Or is this a specific term
from the standard?


Loop transformation happens at the end. But e.g "(#pragma omp) unroll 
full" is a directive and, e.g.


#pragma omp unroll partial(2)

for (int i=0; i < n; i++)

a[i] = 5;

is a construct (= directive + structured block (if any) + end directive 
(if any)).


Tobias



Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Tobias Burnus

Sandra Loosemore wrote:

On 6/6/24 06:06, Tobias Burnus wrote:
+@item I/O within OpenMP target regions and OpenACC compute regions 
is supported

+  using the C library @code{printf} functions.
+  Additionally, the Fortran @code{print}/@code{write} 
statements are
+  supported within OpenMP target regions, but not yet OpenACC 
compute
+  regions.  @c The latter needs 
'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.




I think an "in" (or 'within') is missing before OpenACC.


Yes, "...not yet within OpenACC compute regions", please.


Thanks! Committed as https://gcc.gnu.org/r15-1072-g423522aacd9f30

Tobias



Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Tobias Burnus

Hi Thomas,

regarding the commit r15-1070-g3a4775d4403f2e / https://gcc.gnu.org/r15-1070

First, thanks for adding I/O support to nvptx offloading.

I have a wording nit, to be confirmed by a native speaker:


--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi

...

+@item I/O within OpenMP target regions and OpenACC compute regions is 
supported

+  using the C library @code{printf} functions.
+  Additionally, the Fortran @code{print}/@code{write} statements are
+  supported within OpenMP target regions, but not yet OpenACC compute
+  regions.  @c The latter needs 
'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.




I think an "in" (or 'within') is missing before OpenACC.

Otherwise, it seemed to fine at a glance – and I am happy that that 
feature now finally works :-)


Hooray, no longer using reverse offload ("!$omp target 
device(ancestor:1)") for Fortran I/O when debugging.


Thanks,

Tobias


Re: [PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-06-06 Thread Tobias Burnus

Hi Andrew, hi Jakub, hello world,

Andrew Stubbs wrote:


Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators have been moved to start at "100"


100 is a bad value - as can be seen below.

As Jakub suggested at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640432.html
"given that LLVM uses 100-102 range, perhaps pick a different one, 200 or 150"

(I know that the first review email suggested 100.)


This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.


Namely: ompx_pinned_mem_alloc

RFC: Should we use this name or - similar to LLVM - prefix this by
a vendor prefix instead (gnu_omp_ or gcc_omp_ instead of ompx_)?

IMHO it is fine to use ompx_ for pinned as the semantic is clear
and should be compatible with IBM and AMD.

For other additional memspaces / allocators, I am less sure, i.e.
on OG13 there are:
- ompx_unified_shared_mem_space, ompx_host_mem_space
- ompx_unified_shared_mem_alloc, ompx_host_mem_alloc

(BTW: In light of TR13 naming, the USM one could be
..._devices_all_mem_{alloc,space}, just to start some bikeshading
or following LLVM + Intel '…target_{host,shared}…'.)

* * *

Looking at other compilers:

IBM's compiler, https://www.ibm.com/docs/en/SSXVZZ_16.1.1/pdf/compiler.pdf , 
has:
- ompx_pinned_mem_alloc, tagged as IBM extension and otherwise without 
documenting it further

Checking omp.h, they define it as:
  ompx_pinned_mem_alloc = 9, /* Preview of host pinned memory support */
and additionally have:
  LOMP_MAX_MEM_ALLOC = 1024,

AMD's compiler based on clang has:
  /* Preview of pinned memory support */
  ompx_pinned_mem_alloc = 120,
in addition to the LLVM defines shown below.

Regarding LLVM:
- they don't offer 'pinned'
- they use the prefix 'llvm_omp' not 'ompx'

Namely:
typedef enum omp_allocator_handle_t
...
  llvm_omp_target_host_mem_alloc = 100,
  llvm_omp_target_shared_mem_alloc = 101,
  llvm_omp_target_device_mem_alloc = 102,
...
typedef enum omp_memspace_handle_t
...
  llvm_omp_target_host_mem_space = 100,
  llvm_omp_target_shared_mem_space = 101,
  llvm_omp_target_device_mem_space = 102,

Remark: I did not find a documentation - and while I
understand in principle host and shared, I wonder how
LLVM handles 'device_mem_space' when there is more than
one device.

BTW: OpenMP TR13 avoids this issue by adding two sets of
API routines. Namely:

First, for memspaces,
- omp_get_{device,devices}_memspace
- omp_get_{device,devices}_and_host_memspace
- omp_get_devices_all_memspace

and, secondly, for allocators:
- omp_get_{device,devices}_allocator
- omp_get_{device,devices}_and_host_allocator
- omp_get_devices_all_allocator

where omp_get_device_* takes a single device number and
omp_get_devices_* a list of device numbers while _and_host
automatically adds the initial device to the list.

* * *

Looking at Intel, they even use extensions without prefix:

omp_target_{host,shared,device}_mem_{space,alloc}

and contrary to LLVM they document it with the semantic, cf.
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/openmp-memory-spaces-and-allocators.html

* * *


The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.


...


diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)


...


   #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
== omp_max_predefined_alloc + 1,
-   "predefined_alloc_mapping must match omp_memspace_handle_t");
+   "predefined_omp_alloc_mapping must match 
omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))


I am surprised that this compiles: Why do you re-#define this macro?

* * *


--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
 omp_cgroup_mem_alloc = 6,
 omp_pteam_mem_alloc = 7,
 omp_thread_mem_alloc = 8,
+  ompx_pinned_mem_alloc = 100,


See remark regarding "100" at the top of this email.


--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
+integer (kind=omp_allocator_handle_kind), &
+ parameter :: ompx_pinned_mem_alloc = 100


Likewise.

* * *

Why didn't you also update omp_lib.h.in?

* * *

I think you really want to update the checking code inside GCC itself,

i.e. for Fortran:

3 |   !$omp allocate(a) allocator(100)

  | 21

Error: Predefined allocator required in ALLOCATOR clause at (1) as the list 
item 'a' at (2) has the 

[wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-06 Thread Tobias Burnus

GCC 15 now supports unified-shared memory and the tile/unroll constructs
in OpenMP.

Updates https://gcc.gnu.org/gcc-15/changes.html
and https://gcc.gnu.org/projects/gomp/

Comments?

Tobias
gcc-15/changes.html + projects/gomp: update for new OpenMP features

GCC 15 now supports unified-shared memory and the tile/unroll constructs
in OpenMP.

 htdocs/gcc-15/changes.html  | 27 ++-
 htdocs/projects/gomp/index.html | 11 +++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index b59fd3be..94528ebd 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -40,6 +40,24 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
+
+  https://gcc.gnu.org/projects/gomp/;>OpenMP
+  
+
+  Support for unified-shared memory has been added for some AMD and Nvidia
+  GPUs devices, enabled only when using the
+  unified_shared_memory clause to the requires
+  directive. For details, see the offload-target specifics section in the
+  https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html;
+  >GNU Offloading and Multi Processing Runtime Library Manual.
+
+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+
+  
+
+
 
 
 
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 94bda5ff..d1765fc3 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -313,18 +313,21 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 requires directive
-
+
   GCC9
   GCC12
   GCC13
-  GCC14
+  GCC14
+  GCC15
 
 
   (atomic_default_mem_order)
   (dynamic_allocators)
   complete but no non-host devices provides unified_address or
   unified_shared_memory
-  complete but no non-host devices provides unified_shared_memory
+  complete but no non-host devices provides unified_shared_memory
+  complete; see also https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html;>
+  Offload-Target Specifics
 
   
   
@@ -706,7 +709,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Loop transformation constructs
-No
+GCC15
 
   
   


*ping* – Re: [wwwdocs] gcc-15/changes.html (nvptx): Constructors are now supported

2024-06-05 Thread Tobias Burnus
Regarding 
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653417.html , are 
there any …


Tobias Burnus wrote:

Comments or fine as is?


Tobias



Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-05 Thread Tobias Burnus

Hi Andrew, hello world,

Now with AMD Instinct MI200 data - see below.

And a better look at the numbers. In terms of USM,
there does not seem to be any clear winner of both
approaches. If we want to draw conclusions, definitely
more runs are needed (statistics):

The runs below show that the differences between runs
can be larger than the effect of mapping vs. USM.
And that OG13's USM was be 40% slower on MI210
(compared with mainline or OG13 'map') while
mainline's USM is about as fast as 'map' (OG13 or mainline)
is not consistent with the MI250X result, were both USM are
slower with mainline's USM being much slower with ~30%
than OG13 with 12%.



Tobias Burnus wrote:


I have now tried it on my laptop with 
BabelStream,https://github.com/UoB-HPC/BabelStream

Compiling with:
echo "#pragma omp requires unified_shared_memory" > omp-usm.h
cmake -DMODEL=omp -DCMAKE_CXX_COMPILER=$HOME/projects/gcc-trunk-offload/bin/g++ 
\
   -DCXX_EXTRA_FLAGS="-g -include ../omp-usm.h -foffload=nvptx-none 
-fopenmp" -DOFFLOAD=ON ..

(and the variants: no -include (→ map) + -DOFFLOAD=OFF (= host), and with 
hostfallback,
via env var (or usm-14 by due to lacking support.)

For mainline, I get (either with libgomp.so of mainline or GCC 14, i.e. w/o USM 
support):
host-14.log 195.84user 0.94system 0 11.20elapsed 1755%CPU 
(0avgtext+0avgdata 1583268maxresident)k
host-mainline.log   200.16user 1.00system 0 11.89elapsed 1691%CPU 
(0avgtext+0avgdata 1583272maxresident)k
hostfallback-mainline.log   288.99user 4.57system 0 19.39elapsed 1513%CPU 
(0avgtext+0avgdata 1583972maxresident)k
usm-14.log  279.91user 5.38system 0 19.57elapsed 1457%CPU 
(0avgtext+0avgdata 1590168maxresident)k
map-14.log  4.17user 0.45system 0   03.58elapsed 129%CPU 
(0avgtext+0avgdata 1691152maxresident)k
map-mainline.log    4.15user 0.44system 0   03.58elapsed 128%CPU 
(0avgtext+0avgdata 1691260maxresident)k
usm-mainline.log    3.63user 1.96system 0   03.88elapsed 144%CPU 
(0avgtext+0avgdata 1692068maxresident)k

Thus: GPU is faster than host, host fallback takes 40% longer than doing host 
compilation.
USM is 15% faster than mapping.


Correction: I shouldn't look at user time but at elapsed time. For the 
latter, USM is 8% slower on mainline; hostfallback is ~70% slower than 
host execution.



With OG13, the pattern is similar, except that USM is only 3% faster.
Here, USM (elapsed) is 2.5% faster. It is a bit difficult to compare the 
results as OG13 is faster for mapping and USM, which makes 
distinguishing OG13 vs mainline performance and the two different USM 
approaches difficult.

host-og13.log   191.51user 0.70system 0 09.80elapsed 1960%CPU 
(0avgtext+0avgdata 1583280maxresident)k
map-hostfallback-og13.log   205.12user 1.09system 0 10.82elapsed 1905%CPU 
(0avgtext+0avgdata 1585092maxresident)k
usm-hostfallback-og13.log   338.82user 4.60system 0 19.34elapsed 1775%CPU 
(0avgtext+0avgdata 1584580maxresident)k
map-og13.log4.43user 0.42system 0   03.59elapsed 135%CPU 
(0avgtext+0avgdata 1692692maxresident)k
usm-og13.log4.31user 1.18system 0   03.68elapsed 149%CPU 
(0avgtext+0avgdata 1686256maxresident)k

* * *


As IT issues are now solved:

(A) On  AMD Instinct MI210 (gfx90a)

The host fallback is here very slow with elapsed time 24s vs. 1.6s for host 
execution.
map and USM seem to be in the same ballpark.
For two 'map' runs, I see a difference of 8%, the USM times are between those 
map results.

I see similar results for OG13 than mainline, except for USM which is ~40% 
slower (elapse time)
than map (OG13 or mainline - or mainline's USM).

host-mainline-2.log 194.00user 7.21system 0 01.44elapsed 13954%CPU 
(0avgtext+0avgdata 1320960maxresident)k
host-mainline.log   221.53user 5.58system 0 01.78elapsed 12716%CPU 
(0avgtext+0avgdata 1318912maxresident)k
hostfallback-mainline-1.log 3073.35user 146.22system 0  24.25elapsed 
13272%CPU (0avgtext+0avgdata 1644544maxresident)k
hostfallback-mainline-2.log 2268.62user 146.13system 0  23.39elapsed 
10320%CPU (0avgtext+0avgdata 1650544maxresident)k
map-mainline-1.log  5.38user 16.16system 0  03.00elapsed 716%CPU 
(0avgtext+0avgdata 1714936maxresident)k
map-mainline-2.log  5.12user 15.93system 0  02.74elapsed 768%CPU 
(0avgtext+0avgdata 1714932maxresident)k
usm-mainline-1.log  7.61user 2.30system 0   02.89elapsed 342%CPU 
(0avgtext+0avgdata 1716984maxresident)k
usm-mainline-2.log  7.75user 2.92system 0   02.89elapsed 369%CPU 
(0avgtext+0avgdata 1716980maxresident)k

host-og13-1.log 213.69user 6.37system 0 01.56elapsed 14026%CPU 
(0avgtext+0avgdata 1316864maxresident)k
hostfallback-map-og13-1.log 3026.68user 123.77system 0  23.69elapsed 
13295%CPU (0avgtext+0avgdata 1642496maxresident)k
hostfallback-map-og1

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-04 Thread Tobias Burnus

Andrew Stubbs wrote:


PS: I would love to do some comparisons [...]

Actually, I think testing only data transfer is fine for this, but we
might like to try some different access patterns, besides straight
linear copies.


I have now tried it on my laptop with 
BabelStream,https://github.com/UoB-HPC/BabelStream

Compiling with:
echo "#pragma omp requires unified_shared_memory" > omp-usm.h
cmake -DMODEL=omp -DCMAKE_CXX_COMPILER=$HOME/projects/gcc-trunk-offload/bin/g++ 
\
  -DCXX_EXTRA_FLAGS="-g -include ../omp-usm.h -foffload=nvptx-none 
-fopenmp" -DOFFLOAD=ON ..

(and the variants: no -include (→ map) + -DOFFLOAD=OFF (= host), and with 
hostfallback,
via env var (or usm-14 by due to lacking support.)

For mainline, I get (either with libgomp.so of mainline or GCC 14, i.e. w/o USM 
support):

host-14.log 195.84user 0.94system 0 11.20elapsed 1755%CPU 
(0avgtext+0avgdata 1583268maxresident)k
host-mainline.log   200.16user 1.00system 0 11.89elapsed 1691%CPU 
(0avgtext+0avgdata 1583272maxresident)k
hostfallback-mainline.log   288.99user 4.57system 0 19.39elapsed 1513%CPU 
(0avgtext+0avgdata 1583972maxresident)k
usm-14.log  279.91user 5.38system 0 19.57elapsed 1457%CPU 
(0avgtext+0avgdata 1590168maxresident)k
map-14.log  4.17user 0.45system 0   03.58elapsed 129%CPU 
(0avgtext+0avgdata 1691152maxresident)k
map-mainline.log    4.15user 0.44system 0   03.58elapsed 128%CPU 
(0avgtext+0avgdata 1691260maxresident)k
usm-mainline.log    3.63user 1.96system 0   03.88elapsed 144%CPU 
(0avgtext+0avgdata 1692068maxresident)k

Thus: GPU is faster than host, host fallback takes 40% longer than doing host 
compilation.
USM is 15% faster than mapping.


With OG13, the pattern is similar, except that USM is only 3% faster. Thus, HMM 
seems to win my my laptop.

host-og13.log   191.51user 0.70system 0 09.80elapsed 1960%CPU 
(0avgtext+0avgdata 1583280maxresident)k
map-hostfallback-og13.log   205.12user 1.09system 0 10.82elapsed 1905%CPU 
(0avgtext+0avgdata 1585092maxresident)k
usm-hostfallback-og13.log   338.82user 4.60system 0 19.34elapsed 1775%CPU 
(0avgtext+0avgdata 1584580maxresident)k
map-og13.log4.43user 0.42system 0   03.59elapsed 135%CPU 
(0avgtext+0avgdata 1692692maxresident)k
usm-og13.log4.31user 1.18system 0   03.68elapsed 149%CPU 
(0avgtext+0avgdata 1686256maxresident)k

* * *

I planned to try an AMD Instinct MI200 device, but due to two IT issues, I 
cannot.
(Shutdown for maintenance of the MI250X system and an NFS issues for the MI210 
run,
but being unable to reboot due to the absence of a colleague having tons of 
editors
still open).

Tobias


Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Tobias Burnus

Andrew Stubbs wrote:

On 03/06/2024 17:46, Tobias Burnus wrote:

Andrew Stubbs wrote:

+    /* If USM has been requested and is supported by all devices
+   of this type, set the capability accordingly. */
+    if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+  current_device.capabilities |= GOMP_OFFLOAD_CAP_SHARED_MEM;
+


This breaks my USM patches that add the omp_alloc support (because 
it now short-circuits all of those code-paths),


which I believe is fine. Your USM patches are for pseudo-USM, i.e. a 
(useful) bandaid for systems where the memory is not truely 
unified-shared memory but only specially tagged host memory is device 
accessible. (e.g. only memory allocated via cuMemAllocManaged) — And, 
quite similar, for -foffload-memory=pinned.


Er, no.

The default do-nothing USM uses slow uncachable PCI memory accesses 
(on devices that don't have truly shared memory, like APUs).


I have no idea what a "default do nothing USM" is – and using the PCI-E 
to transfer the data is the only option unless there is either a common 
memory controller or some other interconnect Infinity Fabric interconnect).


However, your description sounds as if you talk about pinned memory – 
which by construction cannot migrate – and not about managed memory, 
which is one of the main approaches for USM – especially as that's how 
HMM works and as it avoids to transfer any memory access.


If you use a Linux kernel with HMM and have support for it, the default 
is that upon device access, the page migrates to the GPU (using, e.g. 
PCI-E) and then stays there until the host accesses that memory page 
again, triggering a page fault and transfer back. That's the whole idea 
of HMM and works similar to the migrate to disk feature (aka swapping), 
cf. https://docs.kernel.org/mm/hmm.html


That's the very same behavior as with hipMallocManaged with XNACK 
enabled according to 
https://rocm.docs.amd.com/en/develop/conceptual/gpu-memory.html


As PowerPC + Volta (+ normal kernel) does not support USM but a system 
with + Nvlink does, I bet that on such a system, the memory stays on the 
host and Nvlink does the remote access, but I don't know how Nvlink 
handles caching. (The feature flags state that direct host-memory access 
from the device is possible.)


By contrast, for my laptop GPU (Nvidia RTX A1000) with open kernel 
drivers + CUDA drivers, I bet the memory migration will happen – 
especially as the feature flags direct host-memory access is not possible.


* * *

If host and device access data on the same memory page, page migration 
forth and back will happen continuously, which is very slow.


Also slow is if data is spread over many pages as one gets keeps getting 
page faults until the data is finally completely migrated. The solution 
in that case is a large page such that the data is transferred in 
one/few large chunks.


In general using manual allocation (x = omp_alloc(...)) with a suitable 
allocator can manually avoid the problem by using pinning or large pages 
or … Without knowing the algorithm it is hard to have a generic solution.


If there such a concurrent access issue occurs for compiler generated 
code or with the run-time library, we should definitely try to fix it; 
for user code, it is probably hopeless in the generic case.


* * *

I actually tried to find an OpenMP target-offload benchmark, possibly 
for USM, but I failed. Most seem to be either not available or seriously 
broken – when testing starts by fixing OpenMP syntax bugs, it does not 
increase the trust in the testcase. — Can you suggest a testcase?


* * *

The CUDA Managed Memory and AMD Coarse Grained memory implementation 
uses proper page migration and permits full-speed memory access on the 
device (just don't thrash the pages too fast).


As written, in my understanding that is what happens with HMM kernel 
support for any memory that is not explicitly pinned. The only extra 
trick an implementation can play is pinning the page – such that it 
knows that the memory host does not change (e.g. won't migrates to the 
other NUMA memory of the CPU or to swap space) such that the memory can 
be directly accessed.


I am pretty sure that's the reason, e.g., CUDA pinned memory is faster – 
and it might also help with HMM migration if the destination is known 
not to change; no idea whether the managed memory routines play such 
tricks or not.


Another optimization opportunity exists if it is known that the memory 
won't be accessed by host until the kernel ends, but I don't see this 
guaranteed in general in user code.


* * *

On AMD MI200, your check broken my USM testcases (because the code 
they were testing isn't active).  This is a serious performance problem.


"I need more data." — First, a valid USM testcase should not be broken 
in the mainline. Secondly, I don't see how a generic testcase can have a 
performance issue when USM works. And, I didn't see a tes

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Tobias Burnus

Andrew Stubbs wrote:

+    /* If USM has been requested and is supported by all devices
+   of this type, set the capability accordingly.  */
+    if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+  current_device.capabilities |= GOMP_OFFLOAD_CAP_SHARED_MEM;
+


This breaks my USM patches that add the omp_alloc support (because it 
now short-circuits all of those code-paths),


which I believe is fine. Your USM patches are for pseudo-USM, i.e. a 
(useful) bandaid for systems where the memory is not truely 
unified-shared memory but only specially tagged host memory is device 
accessible. (e.g. only memory allocated via cuMemAllocManaged) — And, 
quite similar, for -foffload-memory=pinned.


I think if a user wants to have pseudo USM – and does so by passing 
-foffload-memory=unified – we can add another flag to the internal 
omp_requires_mask. - By passing this option, a user should then also be 
aware of all the unavoidable special-case issues of pseudo-USM and 
cannot complain if they run into those.


If not, well, then the user either gets true USM (if supported) - or 
host fallback. Either of it is perfectly fine.


With -foffload-memory=unified, the compiler can then add all the 
omp_alloc calls – and, e.g., set a new GOMP_REQUIRES_OFFLOAD_MANAGED 
flag. If that's set, we wouldn't do the line above quoted capability 
setting in libgomp/target.c.


For nvidia, GOMP_REQUIRES_OFFLOAD_MANAGED probably requires 
CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS, i.e. when 0 then we 
probably want to return -1 also for -foffload-memory=unified. - A quick 
check shows that Tesla K20 (Kepler, sm_35) has 0 while Volta, Ada, 
Ampere (sm_70, sm_82, sm_89) have 1. (I recall using managed memory on 
an old system; page migration to the device worked fine, but a on-host 
accesses while the kernel was still running, crashed the program.|)

|

For amdgcn, my impression is that we don't need to handle 
-foffload-memory=unified as only the MI200 series (+ APUs) supports this 
well, but MI200 also supports true USM (with page migration; for APU it 
makes even less sense). - But, of course, we still may. — Auto-setting 
HSA_XNACK could be still be done MI200, but I wonder how to distinguish 
MI300X vs. MI300A, but it probably doesn't harm (nor help) to set 
HSA_XNACK for APUs …



and it's just not true for devices where all host memory isn't 
magically addressable on the device.

Is there another way to detect truly shared memory?


Do you have any indication that the current checks become true when the 
memory is not accessible?


Tobias


[committed] install.texi (gcn): Fix date of recommended newlib version

2024-06-03 Thread Tobias Burnus

Somehow, I was one year ahead. The commit wasn't 2025-03-25 but in 2024.

Committed as obvious, also to avoid future confusions.

Tobias
commit 16fb3abf0fb4b88ee0e27732db217909fa429a81
Author: Tobias Burnus 
Date:   Mon Jun 3 12:56:39 2024 +0200

install.texi (gcn): Fix date of recommended newlib version

gcc/ChangeLog:

* doc/install.texi (gcn): Fix date of recommended newlib version.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 42b462a2ce2..c781646ac1f 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3950,7 +3950,7 @@ by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
 Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
-7dd4eb1db (2025-03-25, post-4.4.0) fixes device console output for GFX10 and
+7dd4eb1db (2024-03-25, post-4.4.0) fixes device console output for GFX10 and
 GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the


Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30

2024-06-03 Thread Tobias Burnus

Richard Biener wrote:

install.texi also has the issue that it's not pre-packaged in a
easy to discover and readable file in the release tarballs and that
the online version is only for trunk.


I always wondered why it is not included at 
https://gcc.gnu.org/onlinedocs/ — it would then also be linked from, 
e.g., https://gcc.gnu.org/gcc-14/index.html


Tobias



Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30

2024-06-03 Thread Tobias Burnus

Richard Biener wrote:

On Mon, 3 Jun 2024, Tobias Burnus wrote:

Thomas Schwinge wrote:

In the following, I have then reconsidered that stance; we may actually
"Implement global constructor, destructor support in a conceptually
simpler way than using 'collect2' (the program): implement the respective
functionality in the nvptx-tools 'ld'".  The latter is
<https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
"ld: Global constructor/destructor support".

The attached patch makes clearer which version should be
installed by recommending this patch (= latest nvptx-tools)
in install.texi.

Can we simply say "newerst" where I guess refering to a github repo
already implies this?


Good question. The problem I see with just referring to a repository 
(even with newest) often means: yes, that software I have (whatever 
version). While if some reference goes to a 2024 version, I might not 
know what version I have but likely an older version → I will update.


Admittedly, as people tend to *not* read the documentation, this 
approach might fail as well. But, maybe, it is sufficient to update GCC 
15's release notes?*


It won't help those not reading with the release notes before building 
and the wording* had to be changed a bit as install.texi no longer 
states what version should be used, but it would be an alternative


(*) https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653417.html

Tobias



[wwwdocs] gcc-15/changes.html (nvptx): Constructors are now supported

2024-06-03 Thread Tobias Burnus

Comments or fine as is?

Tobias
gcc-15/changes.html (nvptx): Constructors are now supported

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index b59fd3be..b3305079 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -85,7 +103,14 @@ a work-in-progress.
 
 
 
-
+NVPTX
+
+
+  GCC's nvptx target now supports constructors and destructors;
+  for this, a recent version of nvptx-tools is https://gcc.gnu.org/install/specific.html#nvptx-x-none;
+  >required.
+
 
 
 



[nvptx] *ping* - [patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-06-03 Thread Tobias Burnus

Hi Thomas, hi Tom,

any comment regarding this patch?
 https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650007.html

Tobias

Am 25.04.24 um 12:51 schrieb Tobias Burnus:

Motivated by a surprise of a colleague that with -m32,
no offload dumps were created; that's because mkoffload
does not process host binaries when the are 32bit (i.e. ilp32).

Internally, that done as follows: The host compiler passes to
'mkoffload' the used host ABI, i.e. -foffload-abi=ilp32 or -foffload-abi=lp64

That's done via TARGET_OFFLOAD_OPTIONS, which is supported by aarch64, i386, 
and rs6000.

While it is sensible (albeit not strictly required) that GCC requires that
the host and device side agree and that only 64bit is implemented for the
device side, it can be confusing that silently no offloading code is generated.


Hence, I propose to print a warning in that case - as implemented in the 
attached patch:

$ gcc -fopenmp -m32 test.c
nvptx mkoffload: warning: offload code generation skipped: offloading with 
32-bit host code is currently not supported
gcn mkoffload: warning: offload code generation skipped: offloading with 32-bit 
host code is currently not supported

* * *

This shouldn't have any effect on offload builds using -m64
and non-offload builds – while several testcases already have
issues with '-m32' when offloading is enabled or an offloading
device is available.

To make it not worse, this patch adds some pruning and for
a subset of the failing testcases, I added code to avoids FAILS.
There are some more fails, but those aren't new.

Comments, remarks, suggestions?
Is the mkoffload.cc part is okay?

Tobias



[patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 (was: Re: nvptx target: Global constructor, destructor support, via nvptx-tools 'ld')

2024-06-03 Thread Tobias Burnus

Thomas Schwinge wrote:

In the following, I have then reconsidered that stance; we may actually
"Implement global constructor, destructor support in a conceptually
simpler way than using 'collect2' (the program): implement the respective
functionality in the nvptx-tools 'ld'".  The latter is

"ld: Global constructor/destructor support".


The attached patch makes clearer which version should be
installed by recommending this patch (= latest nvptx-tools)
in install.texi.

OK? Comments, remarks?

Tobias

PS: If the https://github.com/SourceryTools/nvptx-tools/pull/47
(nvptx-ld.cc: Improve C++11 compatibility with older compilers)
proofs worthwhile and gets merged, we should point to that commit
instead.install.texi (nvptx): Recommend nvptx-tools 2024-05-30

gcc/
	* doc/install.texi (nvptx): Recommend nvptx-tools 2024-05-30 or newer.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 42b462a2ce2..4859f6743ab 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4698,7 +4698,8 @@ Andes NDS32 target in big endian mode.
 Nvidia PTX target.
 
 Instead of GNU binutils, you will need to install
-@uref{https://github.com/SourceryTools/nvptx-tools,,nvptx-tools}.
+@uref{https://github.com/SourceryTools/nvptx-tools,,nvptx-tools}
+(recommended: 96f8fc5 of 2024-05-30 -- or newer).
 Tell GCC where to find it:
 @option{--with-build-time-tools=[install-nvptx-tools]/nvptx-none/bin}.
 


Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-05-31 Thread Tobias Burnus

Hi Sandra,

some observations/comments, but in general it looks good.

Sandra Loosemore wrote:

This patch adds the OMP_METADIRECTIVE tree node and shared tree-level
support for manipulating metadirectives.  It defines/exposes
interfaces that will be used in subsequent patches that add front-end
and middle-end support, but nothing generates these nodes yet.

This patch also adds compile-time support for dynamic context
selectors (the target_device selector set and the condition selector
of the user selector set) for metadirectives only.  The "declare
variant" directive still supports only static selectors.

...

  /* Return 1 if context selector matches the current OpenMP context, 0
 if it does not and -1 if it is unknown and need to be determined later.
 Some properties can be checked right away during parsing (this routine),
 others need to wait until the whole TU is parsed, others need to wait until
-   IPA, others until vectorization.  */
+   IPA, others until vectorization.
+
+   METADIRECTIVE_P is true if this is a metadirective context, and DELAY_P
+   is true if it's too early in compilation to determine whether some
+   properties match.
+
+   Dynamic properties (which are evaluated at run-time) should always
+   return 1.  */

I have to admit that I don't really see the use of metadirective_p as …

  int
-omp_context_selector_matches (tree ctx)
+omp_context_selector_matches (tree ctx, bool metadirective_p, bool delay_p)

...

+   if (metadirective_p && delay_p)
+ return -1;


I do see why the resolution of KIND/ARCH/ISA should be delayed – for 
both variant/metadirective as long as the code is run by the host and 
the device. Except that we could exclude, e.g., 'kind(FPGA)' early on as 
we don't support it at all.


But once the device code is split off, I don't see why we can't expand 
the DEVICE clause right away for both variant and metadirective – while 
for 'target_device', we cannot do much until runtime – except of 
excluding things like 'kind(fpga)' – or excluding all 'arch' known not 
to be supported neither by the host nor by any enabled offload devices.


Thus, I see why there is a 'delay_p', but not why there is a 
'metadirective_p'.


But I might have missed something important ...


 case OMP_TRAIT_USER_CONDITION:
   if (set == OMP_TRAIT_SET_USER)
 for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
   if (OMP_TP_NAME (p) == NULL_TREE)
 {
+ /* OpenMP 5.1 allows non-constant conditions for
+metadirectives.  */
+ if (metadirective_p
+ && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+   break;
   if (integer_zerop (OMP_TP_VALUE (p)))
 return 0;
   if (integer_nonzerop (OMP_TP_VALUE (p)))
 break;
   ret = -1;
 }


(BTW: I am happy to be enlightened as I likely have miss some fine print.)

Regarding the comment: True, but shouldn't this be handled before by 
issuing an error when such a clause is used in 'declare variant', i.e. 
only occur when metadirective_p is/can be true?


Besides, I have to admit that I do not understand the new code. The 
current code has: constant zero → whole selector known to be false 
("return 0"); nonzero constant → keep current state, i.e. either 'true' 
(1) or don't known ('-1') and continue; otherwise (not const) → set to 
"don't know" (-1) and continue with the next item.


That seems to make also sense for metadirectives. But your patch changes 
this to keep current state if a variable. In that case, '1' is used if 
this is the only item or the previous condition is true. Or "-1" when 
the previous item is "don't know" (-1). - I think that doesn't make 
sense and it should always return -1 for a run time value.


Additionally, I wonder why you use tree_fits_shwi_p instead of a simple 
'TREE_CODE (OMP_TP_VALUE (p)) != INTEGER_CST'. It does not seem to 
matter here, but '(uint128_t)-1' looks like a valid condition and valid 
constant, which integer_nonzerop should handled but if the hwi is 128bit 
wide, it won't fit into a signed variable.


(As integer_nonzerop and the current code both do "break;" it won't 
change the result of the current code.)


* * *

+static tree
+omp_dynamic_cond (tree ctx)
+{

...

+  /* The user condition is not dynamic if it is constant.  */
+  if (!tree_fits_shwi_p (TREE_VALUE (expr_list)))


Any reason for using tree_fits_shwi_p instead of INTEGER_CST? Here, 
(uint128_t)-1 could make a difference …



+   /* omp_initial_device is -1, omp_invalid_device is -4; choose
+  a value that isn't otherwise defined to indicate the default
+  device.  */
+   device_num = build_int_cst (integer_type_node, -2);


Don't do this - we do it differently 

[patch] libgomp.texi: Impl. update for USM and missing 5.2 item

2024-05-29 Thread Tobias Burnus
Now that unified-shared memory works (with some devices), mark it as 'Y' 
and link to the device-specific chapter. While there is always room for 
improvement (like having opt-in partial support for managed-memory 
semi-USM devices), it works sufficienty for a 'Y'.


Additionally, I saw that 5.2 now extended what is permitted inside 
'declare mapper'. Instead of listening the permitted clauses as in 5.1, 
it now refers to the 'map' clause such that 'delete'/'release', 
'present' and in particular 'iterator' and 'mapper' itself are permitted 
inside a declare-mapper 'map' clause. - Thus, I added it as to-do item 
to the 5.2 status.


Comments?

Tobias

PS: As this is also about USM, the declare-target USM issue I mentioned 
in several patch emails is now filed as https://gcc.gnu.org/PR115279libgomp.texi: Impl. update for USM and missing 5.2 item

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.0 status): Mark 'requires' as done and
	link to 'Offload-Target Specifics'.
	(OpenMP 5.2 status): Add item about additional map-type modifiers
	in 'declare mapper'.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e79bd7a3392..03e6455219d 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -198,8 +198,8 @@ The OpenMP 4.5 specification is fully supported.
 @item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
   env variable @tab Y @tab
 @item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
-@item @code{requires} directive @tab P
-  @tab complete but no non-host device provides @code{unified_shared_memory}
+@item @code{requires} directive @tab Y
+  @tab See @ref{Offload-Target Specifics}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab P
   @tab Full support for C/C++, partial for Fortran
@@ -443,6 +443,8 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
   of the @code{interop} construct @tab N @tab
 @item Invoke virtual member functions of C++ objects created on the host device
   on other devices @tab N @tab
+@item @code{iterator} and @code{mapper} as map-type modifier in @code{declare mappter}
+  @tab N @tab
 @end multitable
 
 


[patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-29 Thread Tobias Burnus

This patch depends (on the libgomp/target.c parts) of the patch
"[patch] libgomp: Enable USM for some nvptx devices",
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html

AMD GPUs that are either APU devices or MI200 [or MI300X]
(with HSA_XNACK=1 set) can access host memory; the run-time library
returns in that case HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = true.

Thus, it makes sense to enable USM support for those devices, which
this patch does. — A simple test with all unified_shared_memory tests
shipping with sollve_vv now works:*

  Test passed on the device.

as tested on an MI200 series device. In line with (some) other compilers,
it requires that HSA_XNACK=1 is set, otherwise the code will be executed
on the host.

(* Well, for C++, -O2 -fno-exception was used but stillonly 5 test case PASS, 1 delete[] etc. link error 1 ICE (segfault during 
IPA pass: cpin gcn gcc) 1 runtime fail for 
tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp [**] but 
all 15 Fortran and 16 C tests PASS.)


Comments, remarks, suggestions?
Any reason not to commit it to mainline?

Tobias

PS: Richard confirmed that his gfx1036 APU also has
HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT == true; at least when
he disables the discrete gfx1030, which neither supports xnack not
is an APU.

** rocgdb shows:

Thread 4 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to thread 4, lane 0 (AMDGPU Lane 1:1:1:1/0 (0,0,0)[0,0,0])]
0x77309c30 in main._omp_fn () at 
tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp:88
88if (Emp.name[i] != RefStr[i]) {

but I have not tried to debug this.
libgomp: Enable USM for AMD APUs and MI200 devices

If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true,
all GPUs on the system support unified shared memory. That's
the case for APUs and MI200 devices when XNACK is enabled.

XNACK can be enabled by setting HSA_XNACK=1 as env var for
supported devices; otherwise, if disable, USM code will
use host fallback.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (gcn_local_sym_hash): Fix typo.

include/ChangeLog:

	* hsa.h (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): Add
	enum value.

libgomp/ChangeLog:

	* libgomp.texi (gcn): Update USM handling
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Handle
	USM if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true.

 gcc/config/gcn/gcn-hsa.h|  2 +-
 include/hsa.h   |  4 +++-
 libgomp/libgomp.texi|  9 +++--
 libgomp/plugin/plugin-gcn.c | 18 ++
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 4611bc55392..03220555075 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -80,7 +80,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
writes a new AMD GPU object file and the ABI version needs to be the
same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5.
GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.
-   Note that Fiji is only suppored with LLVM <= 17 as version 3 is no longer
+   Note that Fiji is only supported with LLVM <= 17 as version 3 is no longer
supported in LLVM >= 18.  */
 #define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
 			 "!march=*|march=*:--amdhsa-code-object-version=4"
diff --git a/include/hsa.h b/include/hsa.h
index f9b5d9daf85..3c7be95d7fd 100644
--- a/include/hsa.h
+++ b/include/hsa.h
@@ -466,7 +466,9 @@ typedef enum {
   /**
   * String containing the ROCr build identifier.
   */
-  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200
+  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200,
+
+  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202
 } hsa_system_info_t;
 
 /**
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 22868635230..e79bd7a3392 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6360,8 +6360,13 @@ The implementation remark:
   such that the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any GCN device from the list of
-  available devices (``host fallback'').
+  @code{unified_shared_memory} is only supported if all AMD GPUs have the
+  @code{HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT} property; for
+  discrete GPUs, this may require setting the @code{HSA_XNACK} environment
+  variable to @samp{1}; for systems with both an APU and a discrete GPU that
+  does not support XNACK, consider using @code{ROCR_VISIBLE_DEVICES} to
+  enable only the APU.  If not supported, all AMD GPU devices are removed
+  from the list of available devices (``host fallback'').
 @item The available stack size can be changed using the @code{GCN_STACK_SIZE}
   environment variable; the default is 32 kiB per thread.
 @item Low-latency memory 

Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-29 Thread Tobias Burnus

Jakub Jelinek wrote:

I mean, if we want to add something, maybe better would an -include like
option that instead of including a file includes it directly.
gcc --include-inline '#pragma omp requires unified_shared_memory' ...


Likewise for Fortran, but there the question is whether it should be in 
the use-stmt, import-stmt, implicit-part or declaration-part; I guess 
having one --include-inline-use-stmt and --include-inline-declaration 
would make sense …


And, I guess, multiple flags should be permitted, which can then be 
processed as separate lines.


Tobias


Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-29 Thread Tobias Burnus

Jakub Jelinek wrote:

How is that option different from
echo '#pragma omp requires unified_shared_memory' > omp-usm.h
gcc -include omp-usm.h
?
I mean with -include you can add anything you want, not just one particular
directive, and adding a separate option for each is just weird.


For C/C++, -include seems to be indeed sufficient (albeit not widely 
known). For Fortran, there at two issues: One placement/semantic issue: 
it has to be added per "compilation unit", i.e. to the specification 
part of a module, subprogram or main program. And a practical issue, 
gfortran shows:


error: command-line option '-include !$omp requires' is valid for 
C/C++/ObjC/ObjC++ but not for Fortran


Thus, for Fortran it is still intrinsically useful – even if one can 
argue whether that feature is needed at all / whether it should be added 
as command-line argument.


Tobias


Re: [patch] libgomp: Enable USM for some nvptx devices

2024-05-29 Thread Tobias Burnus

Tobias Burnus wrote:
While most of the nvptx systems I have access to don't have the 
support for 
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, one 
has:


Actually, CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS is sufficient. And 
I finally also found the proper webpage for this feature; I couldn't 
find it as Nvidia's documentation uses pageableMemoryAccess and not 
CU_... for that feature. The updated patch is attached.


For details: 
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements


In principle, this proper USM is supported by Grace Hopper, PowerPC9 + 
Volta (sm_70) – but for some reasons, our PPC/Volta system does not 
support it. It is also said to work with Turing (sm_75) and newer when 
using Linux Kernel's HMM and the Open Kernel Modules (newer CUDA have 
this but don't use them by default). See link above.


I am not quite sure whether there are unintended side effects, hence, 
I have not enabled support for it in general. In particular, 'declare 
target enter(global_var)' seems to be mishandled (I think it should be 
link + pointer updated to point to the host; cf. description for 
'self_maps'). Thus, it is not enabled by default but only when USM has 
been requested.

OK for mainline?
Comments? Remarks? Suggestions?

Tobias

PS: I guess some more USM tests should be added…
libgomp: Enable USM for some nvptx devices

A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared
memory is supported in hardware. This patch enables support for those -
if all installed nvptx devices have this feature (as the capabilities
are per device type).

This exposes a bug in gomp_copy_back_icvs as it did before use
omp_get_mapped_ptr to find mapped variables, but that returns
the unchanged pointer in cased of shared memory. But in this case,
we have a few actually mapped pointers - like the ICV variables.
Additionally, there was a mismatch with regards to '-1' for the
device number as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.

include/ChangeLog:

	* cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add.

libgomp/ChangeLog:

	* libgomp.texi (nvptx): Update USM description.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
	Claim support when requesting USM and all devices support 
	CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS.
	* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
	(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM is the
	devices supports USM.

 include/cuda/cuda.h   |  3 ++-
 libgomp/libgomp.texi  |  7 +--
 libgomp/plugin/plugin-nvptx.c | 16 
 libgomp/target.c  | 24 +++-
 4 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..804d08ca57e 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,7 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
-  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS = 88
 } CUdevice_attribute;
 
 enum {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 71d62105a20..ba534b6b3c4 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6435,8 +6435,11 @@ The implementation remark:
   the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any nvptx device from the
-  list of available devices (``host fallback'').
+  @code{unified_shared_memory} will run on nvptx devices if and only if
+  all of those support the @code{pageableMemoryAccess} property;@footnote{
+  @uref{https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements}}
+  otherwise, all nvptx device are removed from the list of available
+  devices (``host fallback'').
 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
   in the GCC manual.
 @item The OpenMP routines @code{omp_target_memcpy_rect} and
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5aad3448a8d..d3764185d4b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1201,8 +1201,24 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (num_devices > 0
   && ((omp_requires_mask
 	   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+	   | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
 	   | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
 return -1;
+  /* Check whether host page access (direct or via migration) is supported;
+ if so, enable USM.  Currently, capa

[patch] libgomp: Enable USM for some nvptx devices

2024-05-28 Thread Tobias Burnus
While most of the nvptx systems I have access to don't have the support 
for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, 
one has:


Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support 
this feature. And with that feature, unified-shared memory support does 
work, presumably by handling automatic page migration when a page fault 
occurs.


Hence: Enable USM support for those. When doing so, all 'requires 
unified_shared_memory' tests of sollve_vv pass :-)


I am not quite sure whether there are unintended side effects, hence, I 
have not enabled support for it in general. In particular, 'declare 
target enter(global_var)' seems to be mishandled (I think it should be 
link + pointer updated to point to the host; cf. description for 
'self_maps'). Thus, it is not enabled by default but only when USM has 
been requested.


OK for mainline?
Comments? Remarks? Suggestions?

Tobias

PS: I guess some more USM tests should be added…

libgomp: Enable USM for some nvptx devices

A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES;
for those, unified shared memory is supported in hardware. This
patch enables support for those - if all installed nvptx devices
have this feature (as the capabilities are per device type).

This exposes a bug in gomp_copy_back_icvs as it did before use
omp_get_mapped_ptr to find mapped variables, but that returns
the unchanged pointer in cased of shared memory. But in this case,
we have a few actually mapped pointers - like the ICV variables.
Additionally, there was a mismatch with regards to '-1' for the
device number as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.

include/ChangeLog:

	* cuda/cuda.h
	(CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES):
	Add.

libgomp/ChangeLog:

	* libgomp.texi (nvptx): Update USM description.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
	Claim support when requesting USM and all devices support 
	CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES.
	* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
	(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM is the
	devices supports USM.

 include/cuda/cuda.h   |  3 ++-
 libgomp/libgomp.texi  |  5 -
 libgomp/plugin/plugin-nvptx.c | 15 +++
 libgomp/target.c  | 24 +++-
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..db640d20366 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,7 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
-  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES = 100
 } CUdevice_attribute;
 
 enum {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 71d62105a20..e0d37f67983 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6435,7 +6435,10 @@ The implementation remark:
   the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any nvptx device from the
+  @code{unified_shared_memory} will run on nvptx devices if and only if
+  all of those support the
+  @code{CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES}
+  attribute; otherwise, all nvptx device are removed from the
   list of available devices (``host fallback'').
 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
   in the GCC manual.
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5aad3448a8d..c4b0f5dd4bf 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1201,8 +1201,23 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (num_devices > 0
   && ((omp_requires_mask
 	   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+	   | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
 	   | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
 return -1;
+  /* Check whether automatic page migration is supported; if so, enable USM.
+ Currently, capabilities is per device type, hence, check all devices.  */
+  if (num_devices > 0
+  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+for (int dev = 0; dev < num_devices; dev++)
+  {
+	int pi;
+	CUresult r;
+	r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, ,
+	  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES,
+	  dev);
+	if (r != CUDA_SUCCESS || pi == 0)
+	  return -1;
+  }
   return num_devices;
 }
 
diff --git a/libgomp/target.c 

[patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Tobias Burnus
-fopenmp-force-usm can be useful for some badly written code. Explicity 
using 'omp requires' makes more sense but still. It might also make 
sense for testing purpose.


Unfortunately, I did not see a simple way of testing it. When trying it 
manually, I looked at the 'a.xamdgcn-amdhsa.c' -save-temps file, where 
gcn_data has the omp_requires_mask as second argument and testing showed 
that an explicit pragma and the -f... argument have the same result.


Alternative would be to move this code later, e.g. to lto-cgraph.cc's 
omp_requires_mask, which might be safer (as it avoids changing as many 
locations). On the other hand, it might require more special cases 
elsewhere.*


Comment, suggestions?

Tobias

*I am especially thinking about a global variable and "#pragma omp 
declare target". At least with 'omp requires self_maps' of OpenMP 6, it 
seems as if 'declare target enter(global_var)' should become 
'link(global_var)' where the global_var pointer is updated to point to 
the host version.


At least I don't see how otherwise the "all corresponding list items 
created by the 'enter' clauses specified by declare target directives in 
the compilation unit share storage with the original list items." could 
be fulfilled.


This will require generating different code for 'self_maps' (and, 
potentially / [RFC] 'unified_shared_memory') than normal code, which 
would be the first compiler code-gen change due to USM (→ 
GOMP_OFFLOAD_CAP_SHARED_MEM) for non-host devices.
OpenMP: Add -fopenmp-force-usm mode

Add an implicit 'omp requires unified_shared_memory' to all files that
use target constructs ("OMP_REQUIRES_TARGET_USED").  As constructed, the
diagnostic "'unified_shared_memory' clause used lexically after first target
construct or offloading API" is not inhibited.

The option has no effect without -fopenmp and does not affect OpenACC code,
matching what the directive would do.  The name of the command-line option
matches Clang's, added in LLVM 18.

gcc/c-family/ChangeLog:

	* c.opt (fopenmp-force-usm): New.
	* c.opt.urls: Regenerated

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_target_data, c_parser_omp_target_update,
	c_parser_omp_target_enter_data, c_parser_omp_target_exit_data,
	c_parser_omp_target): When setting OMP_REQUIRES_TARGET_USED, also
	set OMP_REQUIRES_UNIFIED_SHARED_MEMORY if -fopenmp-force-usm is
	in force.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_target_data,
	cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data,
	cp_parser_omp_target_update, cp_parser_omp_target): When setting
	OMP_REQUIRES_TARGET_USED, also set OMP_REQUIRES_UNIFIED_SHARED_MEMORY
	if -fopenmp-force-usm is in force.


gcc/ChangeLog:

	* doc/invoke.texi (-fopenmp-force-usm): Document new option.

gcc/fortran/ChangeLog:

	* invoke.texi (-fopenmp-force-usm): Document new option.
	* lang.opt (fopenmp-force-usm): New.
	* lang.opt.urls: Regenerate.
	* parse.cc (gfc_parse_file): When setting
	OMP_REQUIRES_TARGET_USED, also set OMP_REQUIRES_UNIFIED_SHARED_MEMORY
	if -fopenmp-force-usm is in force.

 gcc/c-family/c.opt|  4 
 gcc/c-family/c.opt.urls   |  3 +++
 gcc/c/c-parser.cc | 50 +--
 gcc/cp/parser.cc  | 50 +--
 gcc/doc/invoke.texi   | 11 +--
 gcc/fortran/invoke.texi   |  7 +++
 gcc/fortran/lang.opt  |  4 
 gcc/fortran/lang.opt.urls |  3 +++
 gcc/fortran/parse.cc  | 10 --
 9 files changed, 118 insertions(+), 24 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index fb34c3b7031..4985cd61c48 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2136,6 +2136,10 @@ fopenmp
 C ObjC C++ ObjC++ LTO Var(flag_openmp)
 Enable OpenMP (implies -frecursive in Fortran).
 
+fopenmp-force-usm
+C ObjC C++ ObjC++ Var(flag_openmp_force_usm)
+Behave as if the source file contained OpenMP's 'requires unified_shared_memory'.
+
 fopenmp-simd
 C ObjC C++ ObjC++ Var(flag_openmp_simd)
 Enable OpenMP's SIMD directives.
diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index dd455d7c0dc..34b3a395e84 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -1222,6 +1222,9 @@ UrlSuffix(gcc/C-Dialect-Options.html#index-fopenacc-dim)
 fopenmp
 UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp)
 
+fopenmp-force-usm
+UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp-force-usm) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp-force-usm)
+
 fopenmp-simd
 UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp-simd) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp-simd)
 
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 00f8bf4376e..93c9cd1c9d0 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -23849,8 +23849,14 @@ static tree
 c_parser_omp_target_data (location_t loc, c_parser *parser, bool *if_p)
 {
   if 

[Patch] testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

2024-05-28 Thread Tobias Burnus
Improve test coverage by removing 'prune-output' given that the features 
are implemented in the meanwhile.


Comments, suggestions? Otherwise I will commit the patch as obvious.

Tobias
testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/lastprivate-conditional-1.c: Remove
	'{ dg-prune-output "not supported yet" }'.
	* c-c++-common/gomp/requires-1.c: Likewise.
	* c-c++-common/gomp/requires-2.c: Likewise.
	* c-c++-common/gomp/reverse-offload-1.c: Likewise.
	* g++.dg/gomp/requires-1.C: Likewise.
	* gfortran.dg/gomp/requires-1.f90: Likewise.
	* gfortran.dg/gomp/requires-2.f90: Likewise.
	* gfortran.dg/gomp/requires-4.f90: Likewise.
	* gfortran.dg/gomp/requires-5.f90: Likewise.
	* gfortran.dg/gomp/requires-6.f90: Likewise.
	* gfortran.dg/gomp/requires-7.f90: Likewise.

 gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c | 2 --
 gcc/testsuite/c-c++-common/gomp/requires-1.c| 2 --
 gcc/testsuite/c-c++-common/gomp/requires-2.c| 2 --
 gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c | 2 --
 gcc/testsuite/g++.dg/gomp/requires-1.C  | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-1.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-2.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-4.f90   | 1 -
 gcc/testsuite/gfortran.dg/gomp/requires-5.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-6.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-7.f90   | 1 -
 11 files changed, 20 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
index 722aba79a52..d4ef49690e8 100644
--- a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
@@ -63,2 +62,0 @@ bar (int *p)
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-1.c b/gcc/testsuite/c-c++-common/gomp/requires-1.c
index e1f2e3a503f..a47ec659566 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-1.c
@@ -13,2 +12,0 @@ foo ()
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-2.c b/gcc/testsuite/c-c++-common/gomp/requires-2.c
index 717b65caeea..d7430b1b1a4 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-2.c
@@ -9,2 +8,0 @@
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
index 9a3fa5230f8..ddc3c2c6be1 100644
--- a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
@@ -9,2 +8,0 @@
-/* { dg-prune-output "'reverse_offload' clause on 'requires' directive not supported yet" } */
-
diff --git a/gcc/testsuite/g++.dg/gomp/requires-1.C b/gcc/testsuite/g++.dg/gomp/requires-1.C
index aefeb288dad..5ca5e006da1 100644
--- a/gcc/testsuite/g++.dg/gomp/requires-1.C
+++ b/gcc/testsuite/g++.dg/gomp/requires-1.C
@@ -11,2 +10,0 @@ namespace M {
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
index b115a654e71..19007834c45 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
@@ -12,2 +11,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-2.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
index 5f11a7bfb2a..f144d391034 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
@@ -13,2 +12,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-4.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
index c870a2840d3..9d936197f8f 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
@@ -36 +35,0 @@ end
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-5.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
index e719e929294..87be933ba49 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
@@ -15,2 +14,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-6.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
index cabd3d94a90..b20c218dd6b 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
@@ -15,2 +14,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-7.f90 

[wwwdocs][patch] gcc-15/changes.html: Fortran - mention F2023 logical-kind additions

2024-05-28 Thread Tobias Burnus
Let's make https://gcc.gnu.org/gcc-15/changes.html a bit more useful … 
While there were several useful Fortran commits already, only one seems 
to be about a new feature.


Thus, document selected_logical_kind and the ISO_FORTRAN_ENV additions.

Comments or suggestions before I commit it?

Tobias
Title: GCC 15 Release Series — Changes, New Features, and Fixes








GCC 15 Release SeriesChanges, New Features, and Fixes


This page is a "brief" summary of some of the huge number of improvements
in GCC 15.



Note: GCC 15 has not been released yet, so this document is
a work-in-progress.


Caveats

  ...




General Improvements


New Languages and Language specific improvements










Fortran


  Fortran 2023: The selected_logical_kind intrinsic function
  and, in the ISO_FORTRAN_ENV module, the named constants
  logical{8,16,32,64} and real16 were added.








New Targets and Target Specific Improvements








































Operating Systems



























Other significant improvements










Re: [PATCH 6/7] OpenMP: Fortran front-end support for dispatch + adjust_args

2024-05-28 Thread Tobias Burnus

Hi PA, hi all,

two remarks while quickly browsing the code:

Paul-Antoine Arras:

+ if (n->sym->ts.type != BT_DERIVED
+ || !n->sym->ts.u.derived->ts.is_iso_c)
+   {
+ gfc_error ("argument list item %qs in "
+"% at %L must be of "
+"TYPE(C_PTR)",
+n->sym->name, >where);


I think you need to rule out 'c_funptr' as well, e.g. via:

|| (n->sym->ts.u.derived->intmod_sym_id
!= ISOCBINDING_PTR)))

I do note that in openmp.cc, we have one check which checks explicitly 
for c_ptr and one existing one which only checks for (c_ptr or 
c_funptr); can you fix that one as well?


* * *

But I mainly miss an update to 'module.cc' for the 'declare variant' 
change; the 'adjust_args' (for 'need_device_ptr', only) list items have

to be saved in the .mod file - otherwise the following will not work:

-aux.f90
! { dg-do compile { target skip-all-targets } }
module my_mod
  ...
  !$omp declare variant ... adjust_args(need_device_ptr: ...)
  ...
end module

.f90
{ dg-do ...
! { dg-additional-sources -aux.f90 }
  ...
  call 
  ...
  !$omp displatch
   call 
end


For C++ modules, it should be fine as those for those, the tree is dumped.

Tobias


Re: [Patch] Fortran: invoke.texi - link to OpenCoarrays.org + mention libcaf_single

2024-05-21 Thread Tobias Burnus

Hi Bernhard,

rep.dot@gmail.com wrote:

library such as @url{http://opencoarrays.org} needs to be linked.

Maybe use https?


Works, but as the certificate is not valid, it requires to ignore the 
errors in a browser, which is a worse user experience.


The error is, e.g.,

"curl: (60) SSL certificate problem: self-signed certificate"

Or at 
https://www.ssllabs.com/ssltest/analyze.html?d=www.opencoarrays.org=on


"Common names: invalid-sni.invalid / Issuer: invalid-sni.invalid  
(Self-signed)"


@Damian: Can you fix the server to actually have a valid certificate?

Tobias


Re: [Patch] contrib/gcc-changelog/git_update_version.py: Improve diagnostic

2024-05-21 Thread Tobias Burnus

Hi Jakub,

Jakub Jelinek wrote:

On Mon, May 20, 2024 at 08:31:02AM +0200, Tobias Burnus wrote:

Hmm, there were now two daily bumps: [...] I really wonder why.

Because I've done it by hand.


Okay, that explains it.

I still do not understand why it slipped through at the first place; I 
tried old versions down to r12-709-g772e5e82e3114f and it still FAIL for 
the invalid commit ("ERR: cannot find a ChangeLog location in message").


Thus, I wonder whether the commit hook is active at all?!?


I have in ~gccadmin a gcc-changelog copy and adjusted update_version_git
script which doesn't use contrib/gcc-changelog subdirectory from the
checkout it makes but from the ~gccadmin directory,

[...]

I'm already using something similar in
my hack (just was doing it for even successful commits, but I think your
patch is better).
And, I think best would be if update_version_git script simply
accepted a list of ignored commits from the command line too,
passed it to the git_update_version.py script and that one
added those to IGNORED_COMMITS.


Updated version:

* Uses my diagnostic

* Adds an -i/--ignore argument for commits. Permits to use '-i hash1  -i 
hash2' but also '-i hash1,hash2' or '-i "hash1 hash2'


* I changed the global variable to lower case as Python's style guide 
states that all uppercase variables is for constants.


* The '=None' matches one of the current usages (no argument passed); 
hence, it is now explicit and 'pylint' is happy.


OK for mainline?

Tobias

PS: I have not updated the hashes. If needed/wanted, I leave that to 
you, Jakub.
contrib/gcc-changelog/git_update_version.py: Improve diagnostic

contrib/ChangeLog:

	* gcc-changelog/git_update_version.py: Add '-i'/'--ignore' argument
	to add to-be-ignored commits via the command line.
	(ignored_commits): Rename from IGNORED_COMMITS and change
	type from tuple to set.
	(prepend_to_changelog_files): Show git hash if errors occurred.
	(update_current_branch): Mark argument as optional by defaulting
	to None.

 contrib/gcc-changelog/git_update_version.py | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
index 24f6c43d0b2..c69a3a6897a 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -22,6 +22,7 @@ import argparse
 import datetime
 import logging
 import os
+import re
 
 from git import Repo
 
@@ -30,7 +31,7 @@ from git_repository import parse_git_revisions
 current_timestamp = datetime.datetime.now().strftime('%Y%m%d\n')
 
 # Skip the following commits, they cannot be correctly processed
-IGNORED_COMMITS = (
+ignored_commits = {
 'c2be82058fb40f3ae891c68d185ff53e07f14f45',
 '04a040d907a83af54e0a98bdba5bfabc0ef4f700',
 '2e96b5f14e4025691b57d2301d71aa6092ed44bc',
@@ -41,7 +42,7 @@ IGNORED_COMMITS = (
 '040e5b0edbca861196d9e2ea2af5e805769c8d5d',
 '8057f9aa1f7e70490064de796d7a8d42d446caf8',
 '109f1b28fc94c93096506e3df0c25e331cef19d0',
-'39f81924d88e3cc197fc3df74204c9b5e01e12f7')
+'39f81924d88e3cc197fc3df74204c9b5e01e12f7'}
 
 FORMAT = '%(asctime)s:%(levelname)s:%(name)s:%(message)s'
 logging.basicConfig(level=logging.INFO, format=FORMAT,
@@ -58,6 +59,7 @@ def read_timestamp(path):
 
 def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
 if not git_commit.success:
+logging.info(f"While processing {git_commit.info.hexsha}:")
 for error in git_commit.errors:
 logging.info(error)
 raise AssertionError()
@@ -93,13 +95,15 @@ parser.add_argument('-d', '--dry-mode',
  ' is expected')
 parser.add_argument('-c', '--current', action='store_true',
 help='Modify current branch (--push argument is ignored)')
+parser.add_argument('-i', '--ignore', action='append',
+help='list of commits to ignore')
 args = parser.parse_args()
 
 repo = Repo(args.git_path)
 origin = repo.remotes['origin']
 
 
-def update_current_branch(ref_name):
+def update_current_branch(ref_name=None):
 commit = repo.head.commit
 commit_count = 1
 while commit:
@@ -123,7 +127,7 @@ def update_current_branch(ref_name):
 head = head.parents[1]
 commits = parse_git_revisions(args.git_path, '%s..%s'
   % (commit.hexsha, head.hexsha), ref_name)
-commits = [c for c in commits if c.info.hexsha not in IGNORED_COMMITS]
+commits = [c for c in commits if c.info.hexsha not in ignored_commits]
 for git_commit in reversed(commits):
 prepend_to_changelog_files(repo, args.git_path, git_commit,
not args.dry_mode)
@@ -153,6 +157,9 @@ def update_current_branch(ref_name):
 else:
 logging.info('DATESTAMP unchanged')
 
+if args.ignore is not None:
+  

[Patch] contrib/gcc-changelog/git_update_version.py: Improve diagnostic (was: [Patch] contrib/gcc-changelog/git_update_version.py: Add ignore commit, improve diagnostic)

2024-05-20 Thread Tobias Burnus

Hmm, there were now two daily bumps:

Date:   Mon May 20 00:16:30 2024 +

Date:   Sun May 19 18:15:28 2024 +

I really wonder why.

I guess, the 'ignore commit' is hence not needed – but I think the 
improved diagnostic part still makes sense.


See updated patch.

On May 19, 24 Tobias Burnus wrote:

I noticed that the last bump happened on Thursday.

* * *

The error is according to
https://gcc.gnu.org/pipermail/gccadmin/2024q2/021298.html

2024-05-19 00:17:28,643:INFO:root:cannot find a ChangeLog location in 
message


That's the commit
---
    Revert "Revert: "Enable prange support.""

    This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and 
enables prange

    support again.
---

* * * The attached patch adds this commit to the ignore list and helps 
with the diagnosis by showing the failing hash in the error message.


OK for mainline?

Post commit: Can someone install the new version + fix the ChangeLog 
for the ignored commit?


* * *

What I do not understand: Why does this commit get applied? I do see 
for both


contrib/gcc-changelog/git_check_commit.py -v -p 
da73261ce7731be7f2b164f1db796878cdc23365


and

contrib/gcc-changelog/git_email.py 
0001-Revert-Revert-Enable-prange-support.patch the error above. - And 
I do not understand why it made it past the commit check but now fails?


Likewise for8057f9aa1f7e70490064de796d7a8d42d446caf8

Does the commit hook use an older version of the check scripts? Does 
it ignore the errors? Or what goes wrong here? Any idea?


TobiasFrom f56b1764f2b5c2c83c6852607405e5be0a763a2c Mon Sep 17 00:00:00 2001
From: Tobias Burnus 
Date: Sun, 19 May 2024 08:17:42 +0200
Subject: [PATCH] contrib/gcc-changelog/git_update_version.py: Improve diagnostic

contrib/ChangeLog:

* gcc-changelog/git_update_version.py (prepend_to_changelog_files): Output
	git hash in case errors occurred.

diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
index 24f6c43d0b2..ec0151b83fe 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -58,6 +58,7 @@ def read_timestamp(path):
 
 def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
 if not git_commit.success:
+logging.info(f"While processing {git_commit.info.hexsha}:")
 for error in git_commit.errors:
 logging.info(error)
 raise AssertionError()
-- 
2.45.0



[Patch] Fortran: Fix SHAPE for zero-size arrays

2024-05-19 Thread Tobias Burnus
That is for https://gcc.gnu.org/PR115150 – a GCC 12/13/14/15 regression, 
caused when switching from a libgomp call to inline code and missing the 
corner case of zero-size arrays ...


OK for mainline + all affected branches?

Tobias
Fortran: Fix SHAPE for zero-size arrays

	PR fortran/115150

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (gfc_conv_intrinsic_bound): Fix SHAPE
	for zero-size arrays

gcc/testsuite/ChangeLog:

	* gfortran.dg/shape_12.f90: New test.

 gcc/fortran/trans-intrinsic.cc |  4 ++-
 gcc/testsuite/gfortran.dg/shape_12.f90 | 51 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 80dc3426ab0..912c1000e18 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -3090,7 +3090,9 @@ gfc_conv_intrinsic_bound (gfc_se * se, gfc_expr * expr, enum gfc_isym_id op)
   lbound, gfc_index_one_node);
 	}
   else if (op == GFC_ISYM_SHAPE)
-	se->expr = size;
+	se->expr = fold_build2_loc (input_location, MAX_EXPR,
+gfc_array_index_type, size,
+gfc_index_zero_node);
   else
 	gcc_unreachable ();
 
diff --git a/gcc/testsuite/gfortran.dg/shape_12.f90 b/gcc/testsuite/gfortran.dg/shape_12.f90
new file mode 100644
index 000..e672e1ff9f9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/shape_12.f90
@@ -0,0 +1,51 @@
+! { dg-do run }
+!
+! PR fortran/115150
+!
+! Check that SHAPE handles zero-sized arrays correctly
+!
+implicit none
+call one
+call two
+
+contains
+
+subroutine one
+  real,allocatable :: A(:),B(:,:)
+  allocate(a(3:0), b(5:1, 2:5))
+
+  if (any (shape(a) /= [0])) stop 1
+  if (any (shape(b) /= [0, 4])) stop 2
+  if (size(a) /= 0) stop 3
+  if (size(b) /= 0) stop 4
+  if (any (lbound(a) /= [1])) stop 5
+  if (any (lbound(b) /= [1, 2])) stop 6
+  if (any (ubound(a) /= [0])) stop 5
+  if (any (ubound(b) /= [0,5])) stop 6
+end
+
+subroutine two
+integer :: x1(10), x2(10,10)
+call f(x1, x2, -3)
+end
+
+subroutine f(y1, y2, n)
+  integer, value :: n
+  integer :: y1(1:n)
+  integer :: y2(1:n,4,2:*)
+  call g(y1, y2)
+end
+
+subroutine g(z1, z2)
+  integer :: z1(..), z2(..)
+
+  if (any (shape(z1) /= [0])) stop 1
+  if (any (shape(z2) /= [0, 4, -1])) stop 2
+  if (size(z1) /= 0) stop 3
+  if (size(z2) /= 0) stop 4
+  if (any (lbound(z1) /= [1])) stop 5
+  if (any (lbound(z2) /= [1, 1, 1])) stop 6
+  if (any (ubound(z1) /= [0])) stop 5
+  if (any (ubound(z2) /= [0, 4, -1])) stop 6
+end
+end


[Patch] Fortran: invoke.texi - link to OpenCoarrays.org + mention libcaf_single

2024-05-19 Thread Tobias Burnus
I noticed that gfortran's coarray support did not link to the 
http://www.opencoarrays.org/


As that library is needed to support parallelization, it makes sense to 
have the link.


Motivated by someone claiming at ISC-HPC that GCC only supports a single 
image.


And also motivated by Damian's presentation, which showed that 
gfortran's coarrays could successfully run the ICAR atmospheric model 
with 25,600 processes (OpenCoarrays with OpenSHMEM backend), which 
definitely is more than one image :-)


I think mentioning the existing libcaf_single is still useful, even 
though it is only of limited use (except that it does ship with GCC and 
permits to do some testings. Especially, it is used by GCC's testsuite).


OK for mainline?

Tobias
Fortran: invoke.texi - link to OpenCoarrays.org + mention libcaf_single

gcc/fortran/ChangeLog:

	* invoke.texi (fcoarray): Link to OpenCoarrays.org;
	mention libcaf_single.

 gcc/fortran/invoke.texi | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 40e8e4a7cdd..78a2910b8d8 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1753,7 +1753,10 @@ Single-image mode, i.e. @code{num_images()} is always one.
 
 @item @samp{lib}
 Library-based coarray parallelization; a suitable GNU Fortran coarray
-library needs to be linked.
+library needs to be linked such as @url{http://opencoarrays.org}.
+Alternatively, GCC's @code{libcaf_single} library can be linked,
+albeit it only supports a single image.
+
 @end table
 
 


[Patch] contrib/gcc-changelog/git_update_version.py: Add ignore commit, improve diagnostic

2024-05-19 Thread Tobias Burnus

I noticed that the last bump happened on Thursday.

* * *

The error is according to
https://gcc.gnu.org/pipermail/gccadmin/2024q2/021298.html

2024-05-19 00:17:28,643:INFO:root:cannot find a ChangeLog location in message

That's the commit
---
Revert "Revert: "Enable prange support.""

This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and enables 
prange
support again.
---

* * * The attached patch adds this commit to the ignore list and helps 
with the diagnosis by showing the failing hash in the error message. OK 
for mainline? Post commit: Can someone install the new version + fix the 
ChangeLog for the ignored commit? * * * What I do not understand: Why does this commit get applied? I do see for both
contrib/gcc-changelog/git_check_commit.py -v -p 
da73261ce7731be7f2b164f1db796878cdc23365 and 
contrib/gcc-changelog/git_email.py 
0001-Revert-Revert-Enable-prange-support.patch the error above. - And I 
do not understand why it made it past the commit check but now fails?

Likewise for8057f9aa1f7e70490064de796d7a8d42d446caf8
Does the commit hook use an older version of the check scripts? Does it 
ignore the errors? Or what goes wrong here? Any idea? Tobias
From f56b1764f2b5c2c83c6852607405e5be0a763a2c Mon Sep 17 00:00:00 2001
From: Tobias Burnus 
Date: Sun, 19 May 2024 08:17:42 +0200
Subject: [PATCH] contrib/gcc-changelog/git_update_version.py: Add ignore
 commit, improve diagnostic

contrib/ChangeLog:

* gcc-changelog/git_update_version.py (IGNORED_COMMITS): Add
	cfceb070e2aea3cef9bd1f50d8d030c51449f45b.
	(prepend_to_changelog_files): Output git hash in case of error.

diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
index 24f6c43d0b2..ec0151b83fe 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -41,7 +41,8 @@ IGNORED_COMMITS = (
 '040e5b0edbca861196d9e2ea2af5e805769c8d5d',
 '8057f9aa1f7e70490064de796d7a8d42d446caf8',
 '109f1b28fc94c93096506e3df0c25e331cef19d0',
-'39f81924d88e3cc197fc3df74204c9b5e01e12f7')
+'39f81924d88e3cc197fc3df74204c9b5e01e12f7',
+'da73261ce7731be7f2b164f1db796878cdc23365')
 
 FORMAT = '%(asctime)s:%(levelname)s:%(name)s:%(message)s'
 logging.basicConfig(level=logging.INFO, format=FORMAT,
@@ -58,6 +59,7 @@ def read_timestamp(path):
 
 def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
 if not git_commit.success:
+logging.info(f"While processing {git_commit.info.hexsha}:")
 for error in git_commit.errors:
 logging.info(error)
 raise AssertionError()
-- 
2.45.0



[wwwdocs,committed] projects/gomp: Update doc links for GCC 14

2024-05-14 Thread Tobias Burnus

Minor update – to include GCC 14 and update mainline to 15.

I also replaced the doc links to the latest release; shouldn't matter 
for the status but it is nicer nonetheless.


Tobias
commit 6d76756d2070040c35e7991a626805a736edea1d
Author: Tobias Burnus 
Date:   Tue May 14 09:34:47 2024 +0200

projects/gomp: Update doc links for GCC 14

And link to latest GCC 12 + 13 release version

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 05b81f1e..94bda5ff 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -144,10 +144,12 @@ filing a bug report.
 
 Implementation status in libgomp manual:
 https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Implementation-Status.html;
->Mainline (GCC 14),
-https://gcc.gnu.org/onlinedocs/gcc-13.1.0/libgomp/OpenMP-Implementation-Status.html;
+>Mainline (GCC 15),
+https://gcc.gnu.org/onlinedocs/gcc-14.1.0/libgomp/OpenMP-Implementation-Status.html;
+>GCC 14,
+https://gcc.gnu.org/onlinedocs/gcc-13.2.0/libgomp/OpenMP-Implementation-Status.html;
 >GCC 13,
-https://gcc.gnu.org/onlinedocs/gcc-12.1.0/libgomp/OpenMP-Implementation-Status.html;
+https://gcc.gnu.org/onlinedocs/gcc-12.3.0/libgomp/OpenMP-Implementation-Status.html;
 >GCC 12.
 
 Disclaimer: A feature might be only fully supported in a later GCC version


[patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-04-25 Thread Tobias Burnus

Motivated by a surprise of a colleague that with -m32,
no offload dumps were created; that's because mkoffload
does not process host binaries when the are 32bit (i.e. ilp32).

Internally, that done as follows: The host compiler passes to
'mkoffload' the used host ABI, i.e. -foffload-abi=ilp32 or -foffload-abi=lp64

That's done via TARGET_OFFLOAD_OPTIONS, which is supported by aarch64, i386, 
and rs6000.

While it is sensible (albeit not strictly required) that GCC requires that
the host and device side agree and that only 64bit is implemented for the
device side, it can be confusing that silently no offloading code is generated.


Hence, I propose to print a warning in that case - as implemented in the 
attached patch:

$ gcc -fopenmp -m32 test.c
nvptx mkoffload: warning: offload code generation skipped: offloading with 
32-bit host code is currently not supported
gcn mkoffload: warning: offload code generation skipped: offloading with 32-bit 
host code is currently not supported

* * *

This shouldn't have any effect on offload builds using -m64
and non-offload builds – while several testcases already have
issues with '-m32' when offloading is enabled or an offloading
device is available.

To make it not worse, this patch adds some pruning and for
a subset of the failing testcases, I added code to avoids FAILS.
There are some more fails, but those aren't new.

Comments, remarks, suggestions?
Is the mkoffload.cc part is okay?

Tobias
[gcn][nvptx] Add warning to mkoffload for 32bit host code

mkoffload in principle handles 32bit and 64bit offload targets,
but 32bit support has no been implemented.  Before this patch,
offloading is then silently disabled for the respective target.

With the patch, the user gets a warning by mkoffload (and the
programm continues to be build with out offloading code).

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (main): Warn for -foffload-abi=ilp32
	that no offload code will be generated.
	* config/nvptx/mkoffload.cc (main): Likewise.

libgomp/ChangeLog:

	* testsuite/lib/libgomp-dg.exp (libgomp-dg-prune): Prune warning
	by mkoffload that 32-bit offloading is not supported.
	* testsuite/libgomp.c-c++-common/requires-1.c: Silence a FAIL for
	'ia32' targets as for them no offload code is generated.
	* testsuite/libgomp.c-c++-common/requires-3.c: Likewise.
	* testsuite/libgomp.c-c++-common/requires-7.c: Likewise.
	* testsuite/libgomp.c-c++-common/variable-not-offloaded.c: Likewise.
	* testsuite/libgomp.fortran/requires-1.f90: Likewise.

 gcc/config/gcn/mkoffload.cc|  5 -
 gcc/config/nvptx/mkoffload.cc  |  5 -
 libgomp/testsuite/lib/libgomp-dg.exp   |  3 +++
 libgomp/testsuite/libgomp.c-c++-common/requires-1.c|  8 +---
 libgomp/testsuite/libgomp.c-c++-common/requires-3.c|  8 +---
 libgomp/testsuite/libgomp.c-c++-common/requires-7.c| 10 ++
 .../testsuite/libgomp.c-c++-common/variable-not-offloaded.c|  4 ++--
 libgomp/testsuite/libgomp.fortran/requires-1.f90   |  8 +---
 8 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 9a438de331a..c37c269d4d2 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -1143,7 +1143,10 @@ main (int argc, char **argv)
 fatal_error (input_location, "cannot open %qs", gcn_cfile_name);
 
   /* Currently, we only support offloading in 64-bit configurations.  */
-  if (offload_abi == OFFLOAD_ABI_LP64)
+  if (offload_abi == OFFLOAD_ABI_ILP32)
+warning (0, "offload code generation skipped: offloading with 32-bit host "
+		"code is currently not supported");
+  else if (offload_abi == OFFLOAD_ABI_LP64)
 {
   const char *mko_dumpbase = concat (dumppfx, ".mkoffload", NULL);
   const char *hsaco_dumpbase = concat (dumppfx, ".mkoffload.hsaco", NULL);
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 503b1abcefd..a7ff32cf8bd 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -798,7 +798,10 @@ main (int argc, char **argv)
 
   /* PR libgomp/65099: Currently, we only support offloading in 64-bit
  configurations.  */
-  if (offload_abi == OFFLOAD_ABI_LP64)
+  if (offload_abi == OFFLOAD_ABI_ILP32)
+warning (0, "offload code generation skipped: offloading with 32-bit host "
+		"code is currently not supported");
+  else if (offload_abi == OFFLOAD_ABI_LP64)
 {
   char *mko_dumpbase = concat (dumppfx, ".mkoffload", NULL);
   if (save_temps)
diff --git a/libgomp/testsuite/lib/libgomp-dg.exp b/libgomp/testsuite/lib/libgomp-dg.exp
index ebf78e17e6d..9c9a5f2ed4b 100644
--- a/libgomp/testsuite/lib/libgomp-dg.exp
+++ b/libgomp/testsuite/lib/libgomp-dg.exp
@@ -3,5 +3,8 @@ proc libgomp-dg-test { prog do_what extra_tool_flags } {
 }
 
 proc libgomp-dg-prune { system text } {
+global additional_prunes
+

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Tobias Burnus

Richard Biener wrote:

I do wonder whether hot-patching the ELF header from the libgomp plugin
with the actual micro-subarch would be possible to make the driver happy.


For completeness, there is also the possibility to play with an 
environment variable as in HSA_OVERRIDE_GFX_VERSION=9.0.0 or 
HSA_OVERRIDE_GFX_VERSION=11.0.0


Tobias


[wwwdocs] gcc-14/changes.html + projects/gomp/: Fix OpenMP/OpenACC changes section/anchor

2024-04-15 Thread Tobias Burnus
When clicking on the GCC..1x links at 
https://gcc.gnu.org/projects/gomp/#omp5.0 , I noticed that the GCC 13 
and 14 links did not link to the OpenMP changes.


It turned out that in GCC 12 and before (see commit message for 
details), the OpenMP and OpenACC changes are under "New Languages and 
Language-Specific Improvements" – while for GCC 13 and 14 they are under 
"General Improvements"


Example: GCC 12 – https://gcc.gnu.org/gcc-12/changes.html#languages 
(directly under  and before the first  entry ["Ada"]).


GCC 13: https://gcc.gnu.org/gcc-13/changes.html#general

The attached patch keeps GCC 13 for backward compatibility but moves 
them for GCC 14 "back" to languages.


To fix the links at projects/gomp/, it therefore it updates the page 
anchors to 'general'.


* * *

Comments or remarks?

Tobias
gcc-14/changes.html + projects/gomp/: Fix OpenMP/OpenACC changes section/anchor

In earlier release notes, OpenMP and OpenACC changes were under "New
Languages and Language specific improvements", either directly under that
section as in 4.2, 4.4, 4.7, 4.9, 5, 6 (+ c-family + Fortran), 10, 11, and 12
or under a subsection in 4.5 (Fortran), 4.8 (C++), 7 (Fortran), 9 (c-family).

In gcc-13, the OpenMP and OpenACC ended up by chance under "General
Improvements", which gcc-14 replicated.

This commit does not touch gcc-13 to avoid breaking links, but it corrects the
anchor used in the links to GCC 13 in projects/gomp/.

However, for GCC 14, it moves the OpenMP/OpenACC changes to the language
section.

 htdocs/gcc-14/changes.html  | 135 
 htdocs/projects/gomp/index.html |  44 ++---
 2 files changed, 91 insertions(+), 88 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index b4c602a5..6035ae37 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -59,6 +59,75 @@ a work-in-progress.
 
 General Improvements
 
+
+  For offload-device code generated via OpenMP and OpenACC, the math
+  and the Fortran runtime libraries will now automatically be linked,
+  when the user or compiler links them on the host side. Thus, it is no
+  longer required to explicitly pass -lm and/or
+  -lgfortran to the offload-device linker using the https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload-options;
+  >-foffload-options= flag.
+  
+  
+New configure options: --enable-host-pie, to build the
+compiler executables as PIE; and --enable-host-bind-now,
+to link the compiler executables with -Wl,-z,now in order
+to enable additional hardening.
+  
+  
+New option
+https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fhardened;>-fhardened,
+an umbrella option that enables a set of hardening flags.
+The options it enables can be displayed using the
+--help=hardened option.
+  
+  
+New option
+https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fharden-control-flow-redundancy;>-fharden-control-flow-redundancy,
+to verify, at the end of functions, that the visited basic blocks
+correspond to a legitimate execution path, so as to detect and
+prevent attacks that transfer control into the middle of
+functions.
+  
+  
+New type attribute
+https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-hardbool-type-attribute;>hardbool,
+for C and Ada.  Hardened
+booleans take user-specified representations for true
+and false, presumably with higher hamming distance
+than standard booleans, and get verified at every use, detecting
+memory corruption and some malicious attacks.
+  
+  
+New type attribute
+https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-strub-type-attribute;>strub
+to control stack scrubbing
+properties of functions and variables.  The stack frame used by
+functions marked with the attribute gets zeroed-out upon returning
+or exception escaping.  Scalar variables marked with the attribute
+cause functions contaning or accessing them to get stack scrubbing
+enabled implicitly.
+  
+  
+New option
+https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-finline-stringops;>-finline-stringops,
+to force inline
+expansion of memcmp, memcpy,
+memmove and memset, even when that is
+not an optimization, to avoid relying on library
+implementations.
+  
+  
+
+New function attribute
+https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-null_005fterminated_005fstring_005farg-function-attribute;> null_terminated_string_arg(PARAM_IDX)
+for indicating parameters that are expected to be null-terminated
+strings.
+  
+
+
+New Languages and Language specific improvements
+
 
   https://gcc.gnu.org/projects/gomp/;>OpenMP
   
@@ -136,73 +205,7 @@ a work-in-progress.
   acc_memcpy_from_device_async.
   
   
-  For offload-device code 

[wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Tobias Burnus
I experimented with some variants to make clearer that each of RDNA2 and 
RNDA3 applies to two card types, but at the end I settled on the 
fewest-word version.


Comments, remarks, suggestions? (To this change or in general?)

Current version: https://gcc.gnu.org/gcc-14/changes.html#amdgcn

Compiler flags, listing the the gfx* cards: 
https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html


Tobias

PS: On the compiler side, I am looking forward to a .def file which 
reduces the number of files to change when adding a new gfx* card, given 
that we have doubled the number of entries. [Well, 1 missing but I know 
of one WIP addition.]
gcc-14/changes.html (AMD GCN): Mention gfx1036 support

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 8ac08e9a..b4c602a5 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -623,8 +623,9 @@ a work-in-progress.
 AMD Radeon (GCN)
 
 
-  Initial support for the AMD Radeon gfx1030 (RDNA2),
-gfx1100 and gfx1103 (RDNA3) devices has been
+  Initial support for the AMD Radeon gfx1030,
+gfx1036 (RDNA2), gfx1100 and
+gfx1103 (RDNA3) devices has been
 added. LLVM 15+ (assembler and linker) is https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa;>required
 to support GFX11.


[Patch] Fortran: List-directed read - accept again tab as alternative to space as separator [PR114304] (was: [patch, libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not a

2024-04-08 Thread Tobias Burnus

Jerry D wrote:

See attached updated patch.


It turned rather quickly out that this patch – committed as 
r14-9822-g93adf88cc6744a – caused regressions.


Namely, real-world code use tab(s) as separator instead of spaces.

[For instance, PR114304 which contains a named-list input file from SPEC 
CPU 2017; that example uses tabs before the '=' sign, but the issue is 
more generic.]


I think the ISO Fortran standard only permits spaces, but as it feels 
natural and is widely supported, tabs are used and should remain supported.


It is not quite clear how '\r' are or should be handled, but as 
eat_spaces did use it, I thought I would add one testcase using them as 
well.


That test is not affected by my change; it did work before with GCC and 
still does – but it does fail with ifort/ifx/flang. I have not thought 
deeply whether it should be supported or not – and looking at the 
libgfortran source file, it often but (→ testcase) not consistently 
requires that an \n follows the \r.


OK for mainline? [And: When the previous patch gets backported, this 
surely needs to be included as well.]


Tobias
Fortran: Accept again tab as alternative to space as separator [PR114304]

This fixes a side-effect of/regression caused by r14-9822-g93adf88cc6744a,
which was for the same PR.

	PR libfortran/114304

libgfortran/ChangeLog:

	* io/list_read.c (eat_separator): Accept tab as alternative to space.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pr114304-2.f90: New test.

 gcc/testsuite/gfortran.dg/pr114304-2.f90 | 82 
 libgfortran/io/list_read.c   |  2 +-
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/pr114304-2.f90 b/gcc/testsuite/gfortran.dg/pr114304-2.f90
new file mode 100644
index 000..5ef5874f528
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr114304-2.f90
@@ -0,0 +1,82 @@
+! { dg-do run }
+!
+! PR fortran/114304
+!
+! Ensure that '\t' (tab) is supported as separator in list-directed input
+! While not really standard conform, this is widely used in user input and
+! widely supported.
+!
+
+use iso_c_binding
+implicit none
+character(len=*,kind=c_char), parameter :: tab = C_HORIZONTAL_TAB
+
+! Accept '' as variant to ' ' as separator
+! Check that  and  are handled
+
+character(len=*,kind=c_char), parameter :: nml_str &
+   = ''//C_CARRIAGE_RETURN // C_NEW_LINE // &
+ 'first'//tab//'='//tab//' .true.'// C_NEW_LINE // &
+ ' , other'//tab//' ='//tab//'3'//tab//', 2'//tab//'/'
+
+! Check that  is handled,
+
+! Note: For new line, Unix uses \n, Windows \r\n but old Apple systems used '\r'
+!
+! Gfortran does not seem to support all \r, but the following is supported
+! since ages, ! which seems to be a gfortran extension as ifort and flang don't like it.
+
+character(len=*,kind=c_char), parameter :: nml_str2 &
+   = ''//C_CARRIAGE_RETURN // C_NEW_LINE // &
+ 'first'//C_NEW_LINE//'='//tab//' .true.'// C_CARRIAGE_RETURN // &
+ ' , other'//tab//' ='//tab//'3'//tab//', 2'//tab//'/'
+
+character(len=*,kind=c_char), parameter :: str &
+   = tab//'1'//tab//'2,'//tab//'3'//tab//',4'//tab//','//tab//'5'//tab//'/'
+character(len=*,kind=c_char), parameter :: str2 &
+   = tab//'1'//tab//'2;'//tab//'3'//tab//';4'//tab//';'//tab//'5'//tab//'/'
+logical :: first
+integer :: other(4)
+integer :: ints(6)
+namelist /inparm/ first , other
+
+other = 1
+
+open(99, file="test.inp")
+write(99, '(a)') nml_str
+rewind(99)
+read(99,nml=inparm)
+close(99, status="delete")
+
+if (.not.first .or. any (other /= [3,2,1,1])) stop 1
+
+other = 9
+
+open(99, file="test.inp")
+write(99, '(a)') nml_str2
+rewind(99)
+read(99,nml=inparm)
+close(99, status="delete")
+
+if (.not.first .or. any (other /= [3,2,9,9])) stop 2
+
+ints = 66
+
+open(99, file="test.inp", decimal='point')
+write(99, '(a)') str
+rewind(99)
+read(99,*) ints
+close(99, status="delete")
+
+if (any (ints /= [1,2,3,4,5,66])) stop 3
+
+ints = 77 
+
+open(99, file="test.inp", decimal='comma')
+write(99, '(a)') str2
+rewind(99)
+read(99,*) ints
+close(99, status="delete")
+
+if (any (ints /= [1,2,3,4,5,77])) stop 4
+end
diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index b56f2a4e6d6..5bbbef26c26 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -463,7 +463,7 @@ eat_separator (st_parameter_dt *dtp)
 
   dtp->u.p.comma_flag = 0;
   c = next_char (dtp);
-  if (c == ' ')
+  if (c == ' ' || c == '\t')
 {
   eat_spaces (dtp);
   c = next_char (dtp);


Re: [patch, libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"

2024-04-05 Thread Tobias Burnus

Hi Jerry, hello world,

Jerry D wrote:

On 4/5/24 10:47 AM, Jerry D wrote:

On 4/4/24 2:41 PM, Tobias Burnus wrote:
I think for the current testcases, I like the patch – the question 
is only what's about:

   ',3' as input for 'comma'   (or '.3' as input for 'point')
[...]
But for 'comma': [...]
* GCC with your patch: Same result: ios != 0 and nothing read.

Expected: [...] read-in value is 0.3. [...]



See attached updated patch.
Regressions tested on x86-64. OK for trunk and 13 after a bit.


OK. Thanks for the patch!

Tobias



Re: [patch, libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"

2024-04-04 Thread Tobias Burnus

Hi Jerry,

I think for the current testcases, I like the patch – the question is 
only what's about:


  ',3' as input for 'comma'   (or '.3' as input for 'point')

For 'point' – 0.3 is read and ios = 0 (as expected)
But for 'comma':
* GCC 12 reads nothing and has ios = 0.
* GCC 13/mainline has an error (ios != 0 – and reads nothing)
* GCC with your patch: Same result: ios != 0 and nothing read.

Expected: Same as with ','/'comma' – namely: read-in value is 0.3.
→ https://godbolt.org/z/4rc8fz4sT for the full example, which works with 
ifort, ifx and flang


* * *

Can you check and fix this? It looks perfectly valid to me to have 
remove the '0' in the floating point numbers '0.3' or '0,3' seems to be 
permitted – and it works for '.' (with 'point') but not for ',' (with 
'comma').


F2023's "13.10.3.1 List-directed input forms" refers to "13.7.2.3.2 F 
editing", which states:


"The standard form of the input field [...] The form of the mantissa is 
an optional sign, followed by a string of one or more digits optionally 
containing a decimal symbol."


The latter does not require that the digit has to be before the decimal 
sign and as for output, it is optional, it is surely intended that ",3" 
is a valid floating-point number for decimal='comma'.


* * *

I extended the testcase to check for this – see attached diff. All 
'point' work, all 'comma' fail.


Thanks for working on this!

Tobiasdiff --git a/gcc/testsuite/gfortran.dg/pr114304.f90 b/gcc/testsuite/gfortran.dg/pr114304.f90
index 8344a9ea857..2bcf9bc7f57 100644
--- a/gcc/testsuite/gfortran.dg/pr114304.f90
+++ b/gcc/testsuite/gfortran.dg/pr114304.f90
@@ -70,7 +70,25 @@
   call t(.true.,  'point', '4,4 ,', .true.)
   call t(.true.,  'comma', '4;4 ;', .true.)
   call t(.true.,  'point', '4,4 ;', .true.)
+
+  call t2('comma', ',2')
+  call t2('point', '.2')
+  call t2('comma', ',2;')
+  call t2('point', '.2,')
+  call t2('comma', ',2 ,')
+  call t2('point', '.2 .')
 contains
+subroutine t2(dec, testinput)
+  character(*) :: dec, testinput
+  integer ios
+  real :: r
+  r = 42
+  read(testinput,*,decimal=dec,iostat=ios) r
+  if (ios /= 0 .or.  abs(r - 0.2) > epsilon(r)) then
+print '(*(g0))', dec, ', testinput = "',testinput,'"',', r=',r,' ios=',ios
+stop 3 
+  end if
+end
 subroutine t(valid, dec, testinput, isreal)
   logical, value :: valid
   character(len=*) :: dec, testinput


[wwwdocs] gcc-14/changes.html: Comment out of empty sections

2024-04-04 Thread Tobias Burnus
I find it confusing to see multiple  in a row without content. 
Actually, both have  as content, but those are commented out as 
actual news is missing ...


See https://gcc.gnu.org/gcc-14/changes.html and see the last entry at 
the bottom of the page and "Operating Systems" somewhere in between.


And comment, remark or suggestion before I commit this?

Tobias
gcc-14/changes.html: Comment out  of empty sections

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 1cc68430..6ddd2788 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -748,7 +748,7 @@ __asm (".global __flmap_lock"  "\n\t"
 
 
 
-Operating Systems
+
 
 
 
@@ -994,7 +994,7 @@ it emits:
 
 
 
-Other significant improvements
+
 
 
 


[wwwdocs] gcc-14/changes.html: Mention OpenACC 2.7's 'readonly' modifier

2024-04-04 Thread Tobias Burnus

Minor OpenACC 2.7 update to https://gcc.gnu.org/gcc-14/changes.html#openacc

The 'readonly' modifier is now in (well, since March), albeit more 2.7 
features are in the pipeline...


Comments, remarks, suggestions before I commit it?

Tobias
gcc-14/changes.html: Mention OpenACC 2.7's 'readonly' modifier

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 045893cf..58f153ec 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -121,7 +121,9 @@ a work-in-progress.
   
 OpenACC 2.7: The self clause was added to be used on
   compute constructs and the default clause for data
-  constructs.
+  constructs. Additionally, the readonly modifier is now
+  handled in the copyin clause and cache
+  directive.
 OpenACC 3.2: The following API routines are now available in
   Fortran using the openacc module or the
   openacc_lib.h header file:


[wwwdocs,committed] gcc-14/changes.html: Fix HTML syntax

2024-04-04 Thread Tobias Burnus

Found when testing my own change via https://validator.w3.org/nu/#file

Committed as obvious.

Tobias
commit c9e275660a19c804dd8c591c73cb9b169a9d7573
Author: Tobias Burnus 
Date:   Thu Apr 4 22:07:28 2024 +0200

gcc-14/changes.html: Fix HTML syntax

W3.org's HTML checker complained about missing  and
about ... within a ... (or rather: it complained about
the unexpected '').
---
 htdocs/gcc-14/changes.html | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 045893cf..1cc68430 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -861,7 +861,7 @@ __asm (".global __flmap_lock"  "\n\t"
   
 
 The analyzer now makes use of the function attribute
-https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute;>alloc_size
+https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute;>alloc_size
 allowing
 https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-fanalyzer;>-fanalyzer
 to emit
@@ -887,7 +887,7 @@ __asm (".global __flmap_lock"  "\n\t"
   
   
 
-The warning
+  The warning
   https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-Wanalyzer-out-of-bounds;>-Wanalyzer-out-of-bounds
   has been extended so that, where possible, it will emit a text-based
   diagram visualizing the spatial relationship between
@@ -899,9 +899,9 @@ __asm (".global __flmap_lock"  "\n\t"
   whether they overlap, are touching, are close or far apart;
   which one is before or after in memory, the relative sizes involved,
   the direction of the access (read vs write), and, in some cases,
-  the values of data involved.
+  the values of data involved.
 Such "text art" diagrams can be controlled (or suppressed) via a new
-  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-text-art-charset;>-fdiagnostics-text-art-charset= option.
+  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-text-art-charset;>-fdiagnostics-text-art-charset= option.
 For example, given the out-of-bounds write in strcat in:
   
 
@@ -953,17 +953,17 @@ it emits:
   
 
 The SARIF output from
-https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=
 now adds indentation and newlines to reflect the logical JSON structure of the data.  The previous compact behavior can be restored via the new option
-https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fno-diagnostics-json-formatting;>-fno-diagnostics-json-formatting.
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fno-diagnostics-json-formatting;>-fno-diagnostics-json-formatting.
 This also applies to the older output format named "json".
   
   
 
 If profiling information about the compiler itself is requested via
-https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-ftime-report;>-ftime-report,
+https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-ftime-report;>-ftime-report,
 and a SARIF output format is requested via
-https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=,
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=,
 then the timing and memory usage data is now written in JSON form into
 the SARIF output, rather than as plain text to stderr.
   


Re: [patch, libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"

2024-04-04 Thread Tobias Burnus

Hi Jerry,

Jerry D wrote:
The attached log entry and patch (git show) fixes this issue by adding 
logic to handle spaces in eat_separators. One or more spaces by 
themselves are a valid separator. So in this case we look at the 
character following the spaces to see if it is a comma or semicolon.


If so, I change it to the valid separator for the given decimal mode, 
point or comma. This allows the comma or semicolon to be interpreted as 
a null read on the next effective item in the formatted read.


I chose a permissive approach here that allows reads to proceed when the
input line is mal-formed with an incorrect separator as long as there is 
at least one space in front of it.


First: Consider also adding 'PR fortran/105473' to the commit log
as the PRs are closely related, albeit this PR is different-

The patch looks mostly like I would expect, except for decimal='point' 
and a ';' which is *not* preceded by a space.


Thanks for working on it.

Regarding the 'except' case:

* * *

If I try your patch with the testcase of at comment 19,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114304#c19
→ https://gcc.gnu.org/bugzilla/attachment.cgi?id=57695 ,

I do note that with 'decimal=point', a tailing semicolon is silently
accepted – even if not proceeded by a space.

I think such code is invalid – and you could consider to reject it.
Otherwise, the handling all seems to be in line with the Fortran spec.

i.e. for the following string, I had *expected an error*:

 point, isreal =  F , testinput = ";"n=  42  ios=   0
 point, isreal =  F , testinput = "5;"n=   5  ios=   0
 point, isreal =  T , testinput = "8;"r=   8.  ios= 0
 point, isreal =  T , testinput = "3.3;"r=   3.2995  ios= 0
 point, isreal =  T , testinput = "3,3;"r=   3.  ios= 0

while I think the following is OK (i.e. no error is what I expect) due 
to the the space before the ';'.


 point, isreal =  F , testinput = "7 ;"n=   7  ios= 0
 point, isreal =  T , testinput = "9 ;"r=   9.  ios= 0
 point, isreal =  T , testinput = "4.4 ;"r=   4.4010  ios=0
 point, isreal =  T , testinput = "9 ;"r=   9.  ios= 0
 point, isreal =  T , testinput = "4,4 ;"r=   4.  ios= 0

* * *

Looking at the other compilers, ifort, ifx and Flang do issue an error 
here. Likewise, g95 seems to yield an error in this case (see below).


I do note that the Lapack testcase that triggered this PR did have such 
a code - but it was then changed because g95 did not like it:


https://github.com/Reference-LAPACK/lapack/commit/64e8a7500d817869e5fcde35afd39af8bc7a8086

In terms of gfortran: until recently did accept it (all versions, 
including 13+14); it then rejected it due to the change in PR105473 (GCC 
14/mainline, backported to 13)– but I now think it rightly did so. With 
the current patch, it is accepted again.


* * *

I have attached the modified testcase linked above; consider adding it 
as well. - Changes to the one of the attachment:

- I added a few additional (albeit boring) tests
- I added an expected output + error diagnostic.

The testcase assumes an error for ';' as separator (with 'point'), 
unless there is a space before it.


[If we want to not diagnose this as vendor extension, we really need to 
add a comment to that testcase besides changing valid = .false. to .true.]


Tobias! { dg-do run }
!
! PR fortran/114304
!
! See also PR fortran/105473
!
! Testing: Does list-directed reading an integer/real allows some non-integer input?
!
! Note: GCC result comments before fix of this PR.

  implicit none
  call t(.true.,  'comma', ';') ! No error shown
  call t(.false., 'point', ';') ! /!\ gfortran: no error, others: error
  call t(.false., 'comma', ',') ! Error shown
  call t(.true.,  'point', ',') ! No error shown
  call t(.false., 'comma', '.') ! Error shown
  call t(.false., 'point', '.') ! Error shown
  call t(.false., 'comma', '5.') ! Error shown
  call t(.false., 'point', '5.') ! gfortran/flang: Error shown, ifort: no error
  call t(.false., 'comma', '5,') ! gfortran: error; others: no error
  call t(.true.,  'point', '5,') ! No error shown
  call t(.true.,  'comma', '5;') ! No error shown
  call t(.false., 'point', '5;') ! /!\ gfortran: no error shown, others: error
  call t(.true.,  'comma', '7 .') ! No error shown
  call t(.true.,  'point', '7 .') ! No error shown
  call t(.true.,  'comma', '7 ,') ! /!\ gfortran: error; others: no error
  call t(.true.,  'point', '7 ,') ! No error shown
  call t(.true.,  'comma', '7 ;') ! No error shown
  call t(.true.,  'point', '7 ;') ! No error shown

!  print *, '---'

  call t(.false., 'comma', '8.', .true.) ! Error shown
  call t(.true.,  'point', '8.', .true.) ! gfortran/flang: Error shown, ifort: no error
  call t(.true.,  'comma', '8,', .true.) ! gfortran: error; others: no error
  call t(.true.,  'point', '8,', .true.) ! No error shown
  call t(.true.,  'comma', '8;', .true.) ! No 

[wwwdocs] projects/gomp/: Update TR12 status - fix misplaced GCC-14; add new items

2024-04-04 Thread Tobias Burnus

TR12 update:
* I misplaced one implemented in GCC 14 in one of the last commits
* Same update as just proposed for libgomp.texi:
  - Renaming of 'coexecute' to 'workdistribute'
(Post TR12 change to avoid confusion with Fortran's co_min,
 co_broadcast, ... intrinsic procedures for coarrays)
  - Add item about { } / BLOCK in canonical loop nests

Comments, suggestions, other remarks before I commit it?

Current 
version:file:///home/tob/repos/gcc-wwwdocs/htdocs/projects/gomp/index.html

Tobias
projects/gomp/: Update TR12 status - fix misplaced GCC-14; add new items

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index b8f11508..798efb21 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -846,7 +846,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 declare mapper with iterator and present modifiers
-GCC14
+No
 
   
   
@@ -871,7 +871,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 New allocators directive for Fortran
-No
+GCC14
 
   
   
@@ -1225,9 +1225,9 @@ error.
 
   
   
-coexecute directive for Fortran
+workdistribute directive for Fortran
 No
-
+Renamed just after TR12; added in TR12 as coexecute
   
   
 Fortran DO CONCURRENT as associated loop in a loop
@@ -1295,6 +1295,11 @@ error.
 No
 
   
+  
+Canonical loop nest enclosed in (multiple) curly braces (C/C++) or BLOCK constructs (Fortran)
+No
+
+  
   
 Relaxed Fortran restrictions to the aligned clause
 No


[Patch] libgomp.texi: Update entries in OpenMP TR12 implementation status

2024-04-04 Thread Tobias Burnus

Hi all,

this patch updates the OpenMP TR12 status (to-do) items:

(a) 'coexecute', added in TR12, was renamed after TR12 to
'workdistribute'. Reason: Feedback that 'co...' reminds
of Fortran coarrays and the its intrinsic procedures:
co_broadcast, co_max, co_min, co_reduce, co_sum and
→ Honor this in the status but mention old name, hopefully,
  reducing some confusion and ensuring that we miss to update
  that entry once OpenMP 6.0 is released next November

(b) Since TR12, canonical loop nest forms can now be enclosed
in { ... } in C/C++ or in BLOCK in Fortran. Add it to ensure
we won't forget implementing this feature.

Comments, remarks, additions before I commit it?

Tobias

PS: There are surely more items I missed when updating the list
for TR12; I will either have a go later in this year or do it
when updating for the final OpenMP 6.0 in/after November this year.
libgomp.texi: Update entries in OpenMP TR12 implementation status

libgomp/ChangeLog:

	* libgomp.texi (TR12): Honor post-TR12 directive name change; add
	item about curly braces/BLOCK permitted in canonical loop nests.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 1ae0f01ccdc..71d62105a20 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -515,7 +515,8 @@ Technical Report (TR) 12 is the second preview for OpenMP 6.0.
 @item @code{strict} modifier keyword to @code{num_threads} @tab N @tab
 @item @code{atomic} permitted in a construct with @code{order(concurrent)}
   @tab N @tab
-@item @code{coexecute} directive for Fortran @tab N @tab
+@item @code{workdistribute} directive for Fortran @tab N
+  @tab Renamed just after TR12; added in TR12 as @code{coexecute}
 @item Fortran DO CONCURRENT as associated loop in a @code{loop} construct
   @tab N @tab
 @item @code{threadset} clause in task-generating constructs @tab N @tab
@@ -539,6 +540,8 @@ Technical Report (TR) 12 is the second preview for OpenMP 6.0.
 
 @unnumberedsubsec Other new TR 12 features
 @multitable @columnfractions .60 .10 .25
+@item Canonical loop nest enclosed in (multiple) curly braces (C/C++) or BLOCK constructs (Fortran)
+  @tab N @tab
 @item Relaxed Fortran restrictions to the @code{aligned} clause @tab N @tab
 @item Mapping lambda captures @tab N @tab
 @item New @code{omp_pause_stop_tool} constant for omp_pause_resource @tab N @tab


[Patch] nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl

2024-04-03 Thread Tobias Burnus

Nvptx's mkoffload.cc contains 14 'fatal_error' calls and one 'warning_at' call,
which stands out more clearly (color, bold) when enabling
  diagnostic_color_init
which this patch does. — Additionally, the call gcc_init_libintl permits that
the already translated error messages also show up as translation.

OK for mainline?

Tobias

PS: Example: 'nvptx mkoffload:' is bold and 'fatal error:' is in red
in English and some language variants.

nvptx mkoffload: fatal error: COLLECT_GCC must be set.
nvptx mkoffload: 致命的エラー: COLLECT_GCC must be set.
nvptx mkoffload: erreur fatale: COLLECT_GCC doit être défini.
nvptx mkoffload: schwerwiegender Fehler: COLLECT_GCC muss gesetzt sein.

(BTW: It looks as if many languages did not translate the error string
itself, e.g. jp or zh or pl or zh_TW/zh_CN or fi or ...)
nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (main): Call
	gcc_init_libintl and diagnostic_color_init.

 gcc/config/nvptx/mkoffload.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index a7fc28cbd3f..503b1abcefd 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -638,7 +638,9 @@ main (int argc, char **argv)
   const char *outname = 0;
 
   progname = tool_name;
+  gcc_init_libintl ();
   diagnostic_initialize (global_dc, 0);
+  diagnostic_color_init (global_dc);
 
   if (atexit (mkoffload_cleanup) != 0)
 fatal_error (input_location, "atexit failed");


[Patch] lto-wrapper.cc: Add offload target name to 'offload_args' suffix

2024-04-03 Thread Tobias Burnus

Found when working with -save-temps and looking at 'mkoffload'
with a GCC configured for both nvptx and gcn offloading.

Before (for 'a.out') for mkoffload:a.offload_args now: a.amdgcn-amdhsa.offload_args 
and a.nvptx-none.offload_args

OK for mainline?

Tobias

PS: The code does not free the 'xmalloc'ed memory, but that's also
the case of all/most 'concat' in this file; the concat could also
be skipped when no save_temps is used, in case this optimization
makes sense.
lto-wrapper.cc: Add offload target name to 'offload_args' suffix

lto-wrapper.cc's compile_offload_image calls mkoffload with
an @./a.offload_args argument ('a.' in case of, e.g., 'a.out'). However,
when generating code for both nvptx and gcn, they use the same name
with -save-temps. Hence, this commit adds a  + '.' before
'offload_args' in line with other offload-target-specific files.

gcc/ChangeLog:

	* lto-wrapper.cc (compile_offload_image): Prefix 'offload_args'
	suffix by the target name.

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index ca53e4b462e..610594cdc2b 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -993,7 +993,8 @@ compile_offload_image (const char *target, const char *compiler_path,
 
   obstack_ptr_grow (_obstack, NULL);
   argv = XOBFINISH (_obstack, char **);
-  fork_execute (argv[0], argv, true, "offload_args");
+  suffix = concat (target, ".offload_args", NULL);
+  fork_execute (argv[0], argv, true, suffix);
   obstack_free (_obstack, NULL);
 
   free_array_of_ptrs ((void **) paths, n_paths);


Re: [Patch] GCN: install.texi update for Newlib change and LLVM 18 release

2024-04-03 Thread Tobias Burnus

Hi Jakub, hello world

Jakub Jelinek wrote:

On Wed, Apr 03, 2024 at 11:09:19AM +0200, Tobias Burnus wrote:

@@ -3954,8 +3956,8 @@ on the GPU.
  To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured 
with
  @option{--with-arch=@code{fiji}} or
  @option{--with-multilib-list=@code{fiji},...}.  Note that support for Fiji 
[...]
+devices has been removed in ROCm 4.0 and support in LLVM is deprecated and has
+been removed in LLVM 18.

Shouldn't we at configure time then detect the case where fiji can't be
supported and either error if it is included explicitly in multilib list, or
implicitly take it out from that list and arrange error to be emitted when
using -march=fiji/gfx803 ?


I am not sure that it is really needed for the reasons given below.
And while it would help some specific use (having LLVM 17 and wanting to use 
Fiji),
it will also cause some confusion as GCC 14 will magically behave differently
depending how build.

Additionally:

* I bet most use gcc/config.gcc which works in most cases just fine
  (LLVM >= 17; enabling all but Fiji)

* Fiji itself is old – removed from recent ROCm and LLVM >= 18,
  which also implies that it is seen as not seeing a lot of use

While there is no configure-time check, using Fiji with LLVM 18 will
fail with a semi-clear compile-time error when doing the in-tree newlib
build or the libgomp build.
(This shows up by default as issue with LLVM 18 + GCC 12/13;
 see https://gcc.gnu.org/PR114419)

Likewise, it will fail with LLVM < 15 when building gfx1100/gfx1103.

* * *

Note: The compiler itself is perfectly happy to handle fiji and gfx1100 itself,
just the LLVM MC assembler doesn't support one [< 15] or the other [>=LLVM 18].

* * *

For those tracking GCC or caring, the documentation at
  https://gcc.gnu.org/gcc-14/changes.html#amdgcn
and
  https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa
provides some glory details.

And it is also mentioned at https://gcc.gnu.org/wiki/Offloading


Tobias



[Patch] GCN: install.texi update for Newlib change and LLVM 18 release

2024-04-03 Thread Tobias Burnus

Update for the GCN Newlib commit 7dd4eb1db "amdgcn: Implement proper locks",
https://sourceware.org/git/?p=newlib-cygwin.git;a=commit;h=7dd4eb1db

And change future to past tense regarding the LLVM 18 release.

OK for mainline?

Thanks,

Tobias
GCN: install.texi update for Newlib change and LLVM 18 release

gcc/ChangeLog:

	* doc/install.texi (amdgcn-*-amdhsa): Update Newlib recommendation
	and update wording for LLVM 18 release.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 269fe7ec870..022bc32901c 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3944,7 +3944,9 @@ Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
-Use Newlib (4.3.0 or newer; 4.4.0 or later is recommended).
+Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
+7dd4eb1db (2025-03-25, post-4.4.0) fixes device console output for GFX10 and
+GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use
@@ -3954,8 +3956,8 @@ on the GPU.
 To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured with
 @option{--with-arch=@code{fiji}} or
 @option{--with-multilib-list=@code{fiji},...}.  Note that support for Fiji
-devices has been removed in ROCm 4.0 and support in LLVM is deprecated and will
-be removed in LLVM 18.
+devices has been removed in ROCm 4.0 and support in LLVM is deprecated and has
+been removed in LLVM 18.
 
 @html
 


[Patch] GCN: Fix --with-arch= handling in mkoffload [PR111966]

2024-04-03 Thread Tobias Burnus

This patch handles --with-arch= in GCN's mkoffload.cc

While mkoffload mostly does not know this and passes it through to the GCN lto1 
compiler,
it writes an .o file with debug information - and here the -march= in the ELF 
flags must
agree with the one in the other files. Hence, it uses now the --with-arch= 
config argument.

Doing so, there is now a diagnostic if the -march= or --with-arch= is unknown. 
While the
latter should be rejected at GCC compile time, the latter was not diagnosed in 
mkoffload
but only later in GCN's compiler.

But as there is now a fatal_error in mkoffload, which comes before the 
GCN-compiler call,
the 'note:' which devices are available were lost. This has been reinstated by 
using
the multilib settings. (That's not identical to the compiler supported flags 
the output
is reasonable, arguable better or worse than lto1.)

Advantage: The output is less cluttered than a later fail.

To make mkoffload errors - and especially this one - more useful, it now also 
initializes
the colorization / bold.

OK for mainline?

* * *

Example error:

gcn mkoffload: error: unrecognized argument in option '-march=gfx'
gcn mkoffload: note: valid arguments to '-march=' are: gfx906, gfx908, gfx90a, 
gfx1030, gfx1036, gfx1100, gfx1103

where on my TERM=xterm-256color,  'gcn mkoffload:' and the quoted texts are in 
bold,
'error:' is red and 'note:' is cyan.

Compared to cc1, the 'note:' lacks 'fiji', the list is separated by ', '
instead of ' ', and cc1 has a "; did you mean 'gfx1100'?".
And the program name is 'gcn mkoffload' instead of 'cc1'.

Tobias

PS: The generated multilib list could be later changed to be based on the 
gcn-.def file;
or we just keep the multiconfig variant of this patch.
GCN: Fix --with-arch= handling in mkoffload [PR111966]

The default -march= setting used in mkoffload did not reflect the modified
default set by GCC's configure-time --with-arch=, causing issues when
generating debug code.

gcc/ChangeLog:

	PR other/111966
	* config/gcn/mkoffload.cc (get_arch): New; moved -march= flag
	handling from ...
	(main): ... here; call it to handle --with-arch config option
	and -march= commandline.

 gcc/config/gcn/mkoffload.cc | 90 -
 1 file changed, 72 insertions(+), 18 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 04356b86195..31266d2099b 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -35,6 +35,8 @@
 #include "gomp-constants.h"
 #include "simple-object.h"
 #include "elf.h"
+#include "configargs.h"  /* For configure_default_options.  */
+#include "multilib.h"  /* For multilib_options.  */
 
 /* These probably won't (all) be in elf.h for a while.  */
 #undef  EM_AMDGPU
@@ -846,6 +848,62 @@ compile_native (const char *infile, const char *outfile, const char *compiler,
   obstack_free (_obstack, NULL);
 }
 
+int
+get_arch (const char *str, const char *with_arch_str)
+{
+  if (strcmp (str, "fiji") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX803;
+  else if (strcmp (str, "gfx900") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX900;
+  else if (strcmp (str, "gfx906") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX906;
+  else if (strcmp (str, "gfx908") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX908;
+  else if (strcmp (str, "gfx90a") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX90a;
+  else if (strcmp (str, "gfx1030") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX1030;
+  else if (strcmp (str, "gfx1036") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX1036;
+  else if (strcmp (str, "gfx1100") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX1100;
+  else if (strcmp (str, "gfx1103") == 0)
+return EF_AMDGPU_MACH_AMDGCN_GFX1103;
+
+  error ("unrecognized argument in option %<-march=%s%>", str);
+
+  /* The suggestions are based on the configured multilib support; the compiler
+ itself might support more.  */
+  if (multilib_options[0] != '\0')
+{
+  /* Example: "march=gfx900/march=gfx906" */
+  char *args = (char *) alloca (strlen (multilib_options));
+  const char *p = multilib_options, *q = NULL;
+  args[0] = '\0';
+  while (true)
+	{
+	  p = strchr (p, '=');
+	  if (!p)
+	break;
+	  if (q)
+	strcat (args, ", ");
+	  ++p;
+	  q = strchr (p, '/');
+	  if (q)
+	strncat (args, p, q-p);
+	  else
+	strcat (args, p);
+	}
+  inform (UNKNOWN_LOCATION, "valid arguments to %<-march=%> are: %s", args);
+}
+  else if (with_arch_str)
+inform (UNKNOWN_LOCATION, "valid argument to %<-march=%> is %qs", with_arch_str);
+
+  exit (FATAL_EXIT_CODE);
+
+  return 0;
+}
+
 int
 main (int argc, char **argv)
 {
@@ -853,9 +911,21 @@ main (int argc, char **argv)
   FILE *out = stdout;
   FILE *cfile = stdout;
   const char *outname = 0;
+  const char *with_arch_str = NULL;
 
   progname = tool_name;
+  gcc_init_libintl ();
   diagnostic_initialize (global_dc, 0);
+  diagnostic_color_init (global_dc);
+
+  for (size_t i = 0; i < ARRAY_SIZE 

Re: [PATCH] amdgcn: Add gfx1036 target

2024-03-25 Thread Tobias Burnus

Richard Biener wrote:

I'll follow up with the libgomp testing test summary for archival
purposes.  I still see linker errors for testcases using -g
(the ld: ^[[0;31merror: ^[[0mincompatible mach:
/tmp/ccr0oDpD.mkoffload.dbg.o^M kind)


Hmm, odd – can you try compile with -save-temp and look at the relevant 
files with, e.g., readelf -h on the GCN files (e.g. 'readelf -h 
*.xamdgcn-amdhsa.mkoffload.*o') – that should show under "Flags" what 
the program was compiled for.


We did encounter this issue with LLVM 18 and the solution was explicitly 
set the version both in the compiler via gcc/config/gcn/gcn-hsa.h's


#define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
 "!march=*|march=*:--amdhsa-code-object-version=4"

and for the debugging data in mkoffload.cc's

  ehdr.e_ident[8] = (elf_arch == EF_AMDGPU_MACH_AMDGCN_GFX803
 ? ELFABIVERSION_AMDGPU_HSA_V3
 : ELFABIVERSION_AMDGPU_HSA_V4);

But I fail to see why this doesn't work for you - you should get V4 for 
your gfx1036 target.


Here, ELFABIVERSION_AMDGPU_HSA_V4 2 (V1 did not have a number and V2 
started with 0, hence V3 = 1 etc.)


What LLVM version did you use for the assembler (llvm-mc)?

Tobias


Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Tobias Burnus

Hi Andrew,

Andrew Stubbs wrote:
This is more-or-less what I was planning to do myself, but as I want 
to include all the other features that get parametrized in gcn.cc, 
gcn.h, gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet. 
Unfortunately, I think the gcn.opt and config.gcc will always need 
manually updating, but if that's all it'll be an improvement.


Well, for .opt see how nvptx does it – it actually generates an .opt file.


I don't like the idea of including AMDGPU_ISA_UNSUPPORTED;


I concur – I was initially thinking of reporting the device name 
("Unsupported %s") but I then realized that the agent returns a string 
while only for GCC generated files (→ eflag) the hexcode is used. Thus, 
I ended up not using it.


Ultimately, I want to replace many of the conditionals like 
"TARGET_CDNA2_PLUS" from the code and replace them with feature flags 
derived from a def file, or at least a header file. We've acquired too 
many places where there are unsearchable conditionals that need 
finding and fixing every time a new device comes along.
I was thinking of having more flags, but those where the only ones 
required for the two files.
I had imagined that this .def file would exist in gcc/config/gcn, but 
you've placed it in libgomp maybe it makes sense to have multiple 
such files if they contain very different data, but I had imagined one 
file and I'm not sure that the compiler definitions live in libgomp.


There is already:

gcc/config/darwin-c.cc:#include "../../libcpp/internal.h"

gcc/config/gcn/gcn-run.cc:#include 
"../../../libgomp/config/gcn/libgomp-gcn.h"


gcc/fortran/cpp.cc:#include "../../libcpp/internal.h"

gcc/fortran/trigd_fe.inc:#include "../../libgfortran/intrinsics/trigd.inc"

But there is also the reverse:

libcpp/lex.cc:#include "../gcc/config/i386/cpuid.h"

libgfortran/libgfortran.h:#include "../gcc/fortran/libgfortran.h"

lto-plugin/lto-plugin.c:#include "../gcc/lto/common.h"

If you add more items, it is probably better to have it under 
gcc/config/gcn/ - and I really prefer a single file for all.


* * *

Talking about feature sets: This would be a bit like LLVM (see below) 
but I think they have a bit too much indirections. But I do concur that 
we need to consolidate the current support – and hopefully make it 
easier to keep adding more GPU support; we seem to have already covered 
a larger chunk :-)


I also did wonder whether we should support, e.g., running a gfx1100 
code (or a gfx11-generic one) on, e.g., a gfx1103 device. Alternatively, 
we could keep the current check which requires an exact match.


BTW: I do note that looking at the feature sets in LLVM that all GFX110x 
GPUs seem to have common silicon bugs: FeatureMSAALoadDstSelBug and 
FeatureMADIntraFwdBug, while 1100 and 1102 additionally have the 
FeatureUserSGPRInit16Bug but 1101 and 1103 don't. — For some reasons, 
FeatureISAVersion11_Generic only consists of two of those bugs (it 
doesn't have FeatureMADIntraFwdBug), which doesn't seem to be that 
consistent. Maybe the workaround has issues elsewhere? If so, a generic 
-march=gfx11 might be not as useful as one might hope for.


* * *

If I look at LLVM's 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPU.td 
,


they first define several features – like 'FeatureUnalignedScratchAccess'.

Then they combine them like in:

def FeatureISAVersion11_Common ... [FeatureGFX11, ... 
FeatureAtomicFaddRtnInsts ...


And then they use those to map them to feature sets like:

def FeatureISAVersion11_0_Common ... 
listconcat(FeatureISAVersion11_Common.Features,

    [FeatureMSAALoadDstSelBug ...

And for gfx1103:

def FeatureISAVersion11_0_3 : FeatureSet<
  !listconcat(FeatureISAVersion11_0_Common.Features,
    [])>;

The mapping to gfx... names then happens in 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/GCNProcessors.td 
such as:


def : ProcessorModel<"gfx1103", GFX11SpeedModel,
  FeatureISAVersion11_0_3.Features
>;

Or for the generic one, i.e.:

// [gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151]
def : ProcessorModel<"gfx11-generic", GFX11SpeedModel,
  FeatureISAVersion11_Generic.Features

LLVM also has some generic flags like the following in 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/TargetParser/TargetParser.cpp


    {{"gfx1013"},   {"gfx1013"}, GK_GFX1013, 
FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK|FEATURE_WGP},


I hope that this will give some inspiration – but I assume that at least 
the initial implementation will be much shorter.


Tobias



[Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Tobias Burnus
Given the large number of AMD GPU ISAs and the number of files which 
have to be adapted, I wonder whether it makes sense to consolidate this 
a bit, especially in the light that we may want to support more in the 
future.


Besides using some macros, I also improved the diagnostic if the object 
code couldn't be recognized (shouldn't happen) or if the GPU is 
unsupported (likely; it now prints the GPU string). I was initially 
thinking of resolving the arch encoded in the eflag to a string, but as 
this is about GCC-generated code, it seemed to be unlikely of much use. 
[It should that rare that we might also go back to the static string 
instead of outputting the hex value of the eflag.]


Note: I only modified mkoffload.cc and plugin-gcn.c, but with some 
tweaks it could also be used for other files in gcc/config/gcn/.


If you add a new ISA, you still need to update plugin-gcn.c's 
max_isa_vgprs and the xnack/sram-ecc handling in mkoffload.c's main, but 
that should be all for those two files.


Thoughts?

Tobias

PS: I think the patch is fine and builds, but I have not tested it on an 
AMD GPU machine, yet.


PPS: For using for other files, see also in config/nvptx which uses 
nvptx-sm.def to generate several files.
GCN: Define ISA archs in gcn-devices.def and use it

Adding new a GCN ISAs requires to update many files, making it more
likely to miss a file; by adding the gcn-devices.def file and using
it in config/gcn/mkoffload.cc and libgomp/plugin/plugin-gcn.c, it
reduces the duplications.

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_...): Replace
	explicit #define by an enum created from gcn-devices.def.
	(main): Use gcn-devices.def definitions for -march=gfx.* string
	parsing.

libgomp/ChangeLog:

	* plugin/gcn-devices.def: New file.
	* plugin/plugin-gcn.c (gcn_..._s): Remove.
	(enum EF_AMDGPU_MACH): Generate EF_AMDGPU_MACH_AMDGCN_...
	using gcn-devices.def.
	(isa_hsa_name, isa_gcc_name, isa_code): Use gcn-devices.def
	to handle the ISAs.
	(max_isa_vgprs): Update used enum name (GFX90a -> GFX90A).
	(isa_matches_agent, GOMP_OFFLOAD_init_device): Be more verbose
	in case of an unsupported ISA.

 gcc/config/gcn/mkoffload.cc|  42 ++-
 libgomp/plugin/gcn-devices.def |  62 ++
 libgomp/plugin/plugin-gcn.c| 118 +++--
 3 files changed, 119 insertions(+), 103 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index fe443abba21..081110d7030 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -47,20 +47,14 @@
 #undef  ELFABIVERSION_AMDGPU_HSA_V4
 #define ELFABIVERSION_AMDGPU_HSA_V4 2
 
-#undef  EF_AMDGPU_MACH_AMDGCN_GFX803
-#define EF_AMDGPU_MACH_AMDGCN_GFX803 0x2a
-#undef  EF_AMDGPU_MACH_AMDGCN_GFX900
-#define EF_AMDGPU_MACH_AMDGCN_GFX900 0x2c
-#undef  EF_AMDGPU_MACH_AMDGCN_GFX906
-#define EF_AMDGPU_MACH_AMDGCN_GFX906 0x2f
-#undef  EF_AMDGPU_MACH_AMDGCN_GFX908
-#define EF_AMDGPU_MACH_AMDGCN_GFX908 0x30
-#undef  EF_AMDGPU_MACH_AMDGCN_GFX90a
-#define EF_AMDGPU_MACH_AMDGCN_GFX90a 0x3f
-#undef  EF_AMDGPU_MACH_AMDGCN_GFX1030
-#define EF_AMDGPU_MACH_AMDGCN_GFX1030 0x36
-#undef  EF_AMDGPU_MACH_AMDGCN_GFX1100
-#define EF_AMDGPU_MACH_AMDGCN_GFX1100 0x41
+/* Use an enum as macros cannot define macros and
+   assume that EF_AMDGPU_MACH_AMDGCN_... is not #defined.  */
+enum {
+#define AMDGPU_ISA(suffix, str, val) \
+ EF_AMDGPU_MACH_AMDGCN_ ## suffix = val,
+#include "../libgomp/plugin/gcn-devices.def"
+#undef AMDGPU_ISA
+};
 
 #define EF_AMDGPU_FEATURE_XNACK_V4	0x300  /* Mask.  */
 #define EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4	0x000
@@ -959,18 +953,12 @@ main (int argc, char **argv)
 	dumppfx = argv[++i];
   else if (strcmp (argv[i], "-march=fiji") == 0)
 	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX803;
-  else if (strcmp (argv[i], "-march=gfx900") == 0)
-	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;
-  else if (strcmp (argv[i], "-march=gfx906") == 0)
-	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX906;
-  else if (strcmp (argv[i], "-march=gfx908") == 0)
-	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX908;
-  else if (strcmp (argv[i], "-march=gfx90a") == 0)
-	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX90a;
-  else if (strcmp (argv[i], "-march=gfx1030") == 0)
-	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX1030;
-  else if (strcmp (argv[i], "-march=gfx1100") == 0)
-	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX1100;
+#define AMDGPU_ISA(suffix, str, val) \
+  else if (strcmp (argv[i], "-march=" str) == 0) \
+	elf_arch = EF_AMDGPU_MACH_AMDGCN_ ## suffix;
+#include "../libgomp/plugin/gcn-devices.def"
+#undef AMDGPU_ISA
+
 #define STR "-mstack-size="
   else if (startswith (argv[i], STR))
 	gcn_stack_size = atoi (argv[i] + strlen (STR));
@@ -1029,7 +1017,7 @@ main (int argc, char **argv)
   if (TEST_SRAM_ECC_UNSET (elf_flags))
 	SET_SRAM_ECC_ANY (elf_flags);
   break;
-case EF_AMDGPU_MACH_AMDGCN_GFX90a:
+case EF_AMDGPU_MACH_AMDGCN_GFX90A:
   if (TEST_XNACK_UNSET 

Re: OpenACC 2.7: front-end support for readonly modifier: Add basic OpenACC 'declare' testing

2024-03-14 Thread Tobias Burnus

Hi all, hi Thomas & Chung-Lin,

Thomas Schwinge wrote:

But I realized another thing: don't we have to handle the 'readonly'
modifier also in Fortran module files, that is, next to the OpenACC
'declare' 'copyin' handling in 'gcc/fortran/module.cc':
'AB_OACC_DECLARE_COPYIN' etc.?


I bet so; it is not as bad as with the others as it is "only" an 
optimization hint, but it makes sense to make it available.


Note that when you place the 'module' in the same file as the module 
users ('use'), the compiler might know things because they are in the 
same translation unit / file not because it is in the module ...



  Chung-Lin, please check, via test cases.
'gfortran.dg/goacc/routine-module*', for example, should provide some
guidance of how to achieve actual module file use, and then do the same
'scan-tree-dump' as in the current 'readonly' modifier test cases.

...

By means of only emitting a tag
in the module file if the 'readonly' modifier is specified, we should
maintain compatibility with the current 'MOD_VERSION'.


That was the idea: If only new information gets added (if used), older 
compilers still work. This has huge limitations and does not work as 
well as imagined but here it should work: Older .mod will work with new 
compilers, even though the reverse might not be true.


Tobias


Re: OpenACC 2.7: front-end support for readonly modifier: Add basic OpenACC 'declare' testing

2024-03-14 Thread Tobias Burnus

Hi all, hi Thomas & Chung-Lin,

Thomas Schwinge wrote:

But I realized another thing: don't we have to handle the 'readonly'
modifier also in Fortran module files, that is, next to the OpenACC
'declare' 'copyin' handling in 'gcc/fortran/module.cc':
'AB_OACC_DECLARE_COPYIN' etc.?


I bet so; it is not as bad as with the others as it is "only" an
optimization hint, but it makes sense to make it available.

Note that when you place the 'module' in the same file as the module
users ('use'), the compiler might know things because they are in the
same translation unit / file not because it is in the module ...


  Chung-Lin, please check, via test cases.
'gfortran.dg/goacc/routine-module*', for example, should provide some
guidance of how to achieve actual module file use, and then do the same
'scan-tree-dump' as in the current 'readonly' modifier test cases.

...

By means of only emitting a tag
in the module file if the 'readonly' modifier is specified, we should
maintain compatibility with the current 'MOD_VERSION'.


That was the idea: If only new information gets added (if used), older
compilers still work. This has huge limitations and does not work as
well as imagined but here it should work: Older .mod will work with new
compilers, even though the reverse might not be true.

Tobias


Re: [PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls

2024-03-14 Thread Tobias Burnus

Hi Kwok,

On January 22, 2024, Kwok Cheung Yeung wrote:
There was a bug in the declare-target-indirect-2.c libgomp testcase 
(testing indirect calls in offloaded target regions, spread over 
multiple teams/threads) that due to an errant fallthrough in a switch 
statement resulted in only one indirect function ever getting called:


(When applying, also the 'dg-xfail-run-if' needs to be removed from
libgomp.fortran/declare-target-indirect-2.f90) ...

However, when the missing break statements are added, the testcase 
fails with an invalid memory access. Upon investigation, this is due 
to the use of a splay-tree as the lookup structure for indirect 
addresses, as the splay-tree moves frequently accessed elements closer 
to the root node and so needs locking when used from multiple threads. 
However, this would end up partially serialising all the threads and 
kill performance. I have switched the lookup structure from a splay 
tree to a hashtab instead to avoid locking during lookup.


I have also tidied up the initialisation of the lookup table by 
calling it only from the first thread of the first team, instead of 
redundantly calling it from every thread and only having the first one 
reached do the initialisation. This removes the need for locking 
during initialisation.


LGTM - except of the following, which we need to solve
(as suggested or differently (locking, or ...) or
by declaring it a nonissue (e.g. because of thinko of mine).

Thoughts about the following?

* * *

Namely, I wonder whether there will be an issue for

#pragma target nowait
   ...
#pragma target
   ...

Once the kernel is started, thegcn_expand_prologue creates some setup code and then a call to 
gomp_gcn_enter_kernel. Likewise for gcc/config/nvptx/nvptx.cc, where 
nvptx_declare_function_name adds via write_omp_entry a call to 
gomp_nvptx_main. And one of the first tasks there is 'build_indirect_map'. Assume a very simple kernel for the second item (i.e. it is quickly started)

and a very large number of reverse kernels.

Now, I wonder whether it is possible to have a race between the two kernels;
it seems as if that might happen but is extremely unlikely accounting for all
the overhead of launching and the rather small list of reverse offload items.

As it is unlikely, I wonder whether doing the following lock free, opportunistic
approach will be the best solution. Namely, assuming that no other kernel 
updates
the hash, but if that happens by chance, use the one that was created first.
(If we are lucky, the atomic overhead is fully cancelled by using a local
variable in the function but neither should matter much.)

if (!indirect_htab) // or: __atomic_load_n (_htab, __ATOMIC_RELAXED) ?
{
  htab_t local_indirect_htab = htab_create (num_ind_funcs);
  ...
  htab_t expected = NULL;
  __atomic_compare_exchange_n (_htab, ,
   local_indirect_htab, false, ...);
  if (expected) // Other kernel was faster, drop our version
htab_free (local_indirect_htab);
}

On January 29, 2024, Kwok Cheung Yeung wrote:
Can you please akso update the comments to talk about hashtab instead 
of splay?
This version has the comments updated and removes a stray 'volatile' 
in the #ifdefed out code.

Thanks,

Tobias



Re: [PATCH, OpenACC 2.7] struct/array reductions for Fortran

2024-03-13 Thread Tobias Burnus

Hi Chung-Lin, hi Thomas, hello world,

some thoughts glancing at the patch.

Chung-Lin Tang wrote:

There is still some shortcomings in the current state, mainly that only explicit-shaped 
arrays can be used (like its C counterpart). Anything else is currently a bit more 
complicated in the middle-end, since the existing reduction code creates an 
"init-op" (literal of initial values) which can't be done when say 
TYPE_MAX_VALUE (TYPE_DOMAIN (array_type)) is not a tree constant. I think we'll be on the 
hook to solve this later, but I think the current state is okay to submit.


I think having some initial support is fine, but it needs an 
understandable and somewhat complete error diagnostic and testcases. 
More to this below.



+  if (!TREE_CONSTANT (min_tree) || !TREE_CONSTANT (max_tree))
+   {
+ error_at (loc, "array in reduction must be of constant size");
+ return error_mark_node;
+   }

Shouldn't this use a sorry_at instead?


+ /* OpenACC current only supports array reductions on explicit-shape
+arrays.  */
+ if ((n->sym->as && n->sym->as->type != AS_EXPLICIT)
+ || n->sym->attr.codimension)
gfc_error ("Array %qs is not permitted in reduction at %L",
   n->sym->name, >where);
[Coarray excursion. I am in favor of allowing it for the reasons above, 
but it could be also rejected but I would prefer to have a proper error 
message in that case.]


While coarrays are unspecified, I do not see a reason why a corray 
shouldn't be permitted here – as long as it is not coindexed. At the 
end, it is just a normal array with some additional properties, which 
make it possible to remotely access it.


Note: For coarray scalars, we have 'sym->as', thus the check should be 
'(n->sym->as && n->sym->as->rank)' to permit scalar coarrays.


* * *

Coarray excursion: A coarray variables exists in multiple processes 
("images", e.g. MPI processes). If 'caf' and 'caf2' are coarrays, then 
'caf = 5' and 'i = caf2' refer to the local variable.


On the other hand, 'caf[n] = 5' or 'i = caf[3,m]' refers to the 'caf' 
variable on image 'n' or [3,m]', respectively, which implies in general 
some function call to read or set the remote data, unless the memory is 
directly accessible (→ e.g. some offset calculation) and the compiler 
already knows how to handle this.


While a coarrary might be allocated in some special memory, as long as 
one uses the local version (i.e. not coindexed / without the image index 
in brackets).


Assume for the example above, e.g., integer :: caf[*], caf2[3:6, 7:*].

* * *

Thus, in terms of OpenACC or OpenMP, there is no reason to fret a 
coarray as long as it is not coindexed and as long as OpenMP/OpenACC 
does not interfere with the memory allocation – either directly ('!$omp 
allocators') or indirectly by placing it into special memory (pinned, 
pseudo-unified-shared memory → OG13's -foffload-memory=pinned/unified).


In the meanwhile, OpenMP actually explicitly allows coarrays with few 
exceptions while OpenACC talks about unspecified behavior.


* * *

Back to generic comments:

If I look at the existing code, I see at gfc_match_omp_clause_reduction:


 if (gfc_match_omp_variable_list (" :", >lists[list_idx], false, NULL,
  , openacc, allow_derived) != 
MATCH_YES)


If 'openacc' is true, array sections are permitted - but the code added 
(see quote above) does not handle n->expr at all and only n->sym.


I think there needs to be at least a "gfc_error ("Sorry, subarrays/array 
sections not yet handled" [subarray is the OpenACC wording, 'array 
section' is the Fortran one, which might be clearer.


But you could consider to handle at least array elements, i.e. 
n->expr->rank == 0.


Additionally, I think the current error message is completely unhelpful 
given that some arrays are supported but most are not.


I think there should be also some testcases for the not-yet-supported 
case. I think the following will trigger the omp-low.cc 'sorry_at' (or 
currently 'error' - but I think it should be a sorry):


subroutine foo(n)

integer :: n, A(n)

... reduction(+:A)

And most others will trigger in openmp.cc; for those, you should have an 
allocatable/pointer and assumed-shape arrays for the diagnostic testcase 
as well.


* * *

I have not really experimented with the code, but does it handle 
multi-dimensional constant arrays like 'integer :: a(3:6,10,-1:1)' ? — I 
bet it does, at least after handling my example [2] for the C patch [1].


Thanks,

Tobias

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647704.html



Re: [PATCH, OpenACC 2.7] Implement reductions for arrays and structs

2024-03-13 Thread Tobias Burnus

Hi Chung-Lin,


https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

Chung-Lin Tang wrote:

this patch implements reductions for arrays and structs for OpenACC. Following 
the pattern for OpenACC reductions [...]


(Stumbled over while looking at the Fortran patch, but applying to 
C/C++, hence mentioned here; the Fortran patch is at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645205.html )



OpenACC permits array elements and subarrays. I have not checked whether 
array elements are currently rejected or fully supported, but I miss a 
testcase for both array elements (unless there is one already) and array 
sections.


If implemented, I think there should be a working run-time test.
If not supported, there should be a sorry_at error for those.

Note: the parser should handle array sections as OpenMP handles them.

The testcase should cover something like the following:

void f(int n)
{
  int x[5][5]; // Multimensional array;
  int y[n]; // VLA
  int *z = (int*)malloc(5*5*sizeof(int)); // Allocated array

... reduction(+:x)
... reduction(+:y)

... reduction(+:x[0:5][2:1])  // OK
... reduction(+:x[1:4][2:1])
  // invalid - while contiguous, first dim does not span the whole array
... reduction(+:y[2:2])  // OK
... reduction(+:y[3:])  // OK - same as [3:n-3]
... reduction(+:y[:2])  // OK - same as [0:2]
... reduction(+:z[1:2][1:6])  // OK

And the same where at least one of the const number is replaced by
a variable.

Note: The 'invalid' reduction is fine in terms of being contiguous (last 
dimension contains a single element, hence, the dimension before does 
not need to span the whole extend) - but OpenACC requires the all 
dimensions but the last to span the whole range.


See "2.7.1 Data Specification in Data Clauses" for the subarray description.

I think - if known at compile time - there should be also a diagnostic 
if the any dimension but the last does not span the whole range.


Thanks,

Tobias


[committed] libgomp/libgomp.texi: Fix @node order in @menu

2024-03-12 Thread Tobias Burnus

The ordering problem was reported on #gfortran's IRC.

The warning disappears between texinfo 6.7 and 6.8  – and my bet is that 
it has been caused by the texinfo commit


fa1ee0cf35 Do not warn if external node in menu is not consistent with 
sections


which implies that it was done on purpose in texinfo. It clearly wasn't 
done on purpose in GCC, though. Hence:


Committed as obvious.

Tobias
commit ef79c64cb5762c86ee04ddfcedb7fe31eaa3bac8
Author: Tobias Burnus 
Date:   Tue Mar 12 15:42:50 2024 +0100

libgomp/libgomp.texi: Fix @node order in @menu

While texinfo 7.0.3 does not warn, an older texinfo did complain about:
libgomp.texi:1964: warning: node next `omp_target_memcpy' in menu
`omp_target_memcpy_rect' and in sectioning `omp_target_memcpy_async' differ

libgomp/

* libgomp.texi (Device Memory Routines): Swap item order to match
the order of the '@node's of the '@subsection's.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index bf5c7a76fc9..57165e0e981 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1783,8 +1783,8 @@ pointers on devices. They have C linkage and do not throw exceptions.
 * omp_target_is_present:: Check whether storage is mapped
 * omp_target_is_accessible:: Check whether memory is device accessible
 * omp_target_memcpy:: Copy data between devices
-* omp_target_memcpy_rect:: Copy a subvolume of data between devices
 * omp_target_memcpy_async:: Copy data between devices asynchronously
+* omp_target_memcpy_rect:: Copy a subvolume of data between devices
 * omp_target_memcpy_rect_async:: Copy a subvolume of data between devices asynchronously
 @c * omp_target_memset:: /TR12
 @c * omp_target_memset_async:: /TR12


Re: [Patch] OpenMP/Fortran: Fix defaultmap(none) issue with dummy procedures [PR114283]

2024-03-12 Thread Tobias Burnus

Jakub Jelinek wrote:


So firstprivate clause handling remaps them then if declare target indirect
is used? If so, the patch looks reasonable to me.


[I have now updated the patch to turn the testcase to ensure
that is also keeps works at runtime.]

OpenMP leaves it a bit open when the remapping has to happen,
but one can construct cases – in particular with unified-shared memory –
where it is not possible to do this upon entry to a target region.

Thus, it has to be done when the function is invoked, e.g.

i = (*g) ();

is turned (in the target region but only on the device side) into

i = (*GOMP_target_map_indirect_ptr (g)) ();

Thus, as long as the host pointer value is transferred to the device,
it works – as the lookup is done on the device side. Directly using a
device address (remap when mapping to the target) will also not shorten
the lookup, i.e. there is no need for it.

Does it still look reasonable to you?

Tobias

PS: The current OpenMP specification, it is listed mainly described via
the glossary (newest change is the addition of dummy procedure):

"indirect device invocation – An indirect call to the _device_ version of 
a _procedure_ on a _device_ other than the _host-device_, through a 
function pointer (C/C++), a pointer to a member function (C++), a dummy 
procedure (Fortran), or a procedure pointer (Fortran) that refers to the 
host version of the _procedure_."
OpenMP/Fortran: Fix defaultmap(none) issue with dummy procedures [PR114283]

Dummy procedures look similar to variables but aren't - neither in Fortran
nor in OpenMP. As the middle end sees PARM_DECLs, mark them as predetermined
firstprivate for mapping (as already done in gfc_omp_predetermined_sharing).

This does not address the isses related to procedure pointers, which are
still discussed on spec level [see PR].

	PR fortran/114283

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_omp_predetermined_mapping): Map dummy
	procedures as firstprivate.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-indirect-4.f90: New test.

 gcc/fortran/trans-openmp.cc|  9 +
 .../libgomp.fortran/declare-target-indirect-4.f90  | 43 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index a2bf15665b3..1dba47126ed 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -343,6 +343,15 @@ gfc_omp_predetermined_mapping (tree decl)
 	&& GFC_DECL_SAVED_DESCRIPTOR (decl)))
 return OMP_CLAUSE_DEFAULTMAP_TO;
 
+  /* Dummy procedures aren't considered variables by OpenMP, thus are
+ disallowed in OpenMP clauses.  They are represented as PARM_DECLs
+ in the middle-end, so return OMP_CLAUSE_DEFAULTMAP_FIRSTPRIVATE here
+ to avoid complaining about their uses with defaultmap(none).  */
+  if (TREE_CODE (decl) == PARM_DECL
+  && TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE
+  && TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) == FUNCTION_TYPE)
+return OMP_CLAUSE_DEFAULTMAP_FIRSTPRIVATE;
+
   /* These are either array or derived parameters, or vtables.  */
   if (VAR_P (decl) && TREE_READONLY (decl)
   && (TREE_STATIC (decl) || DECL_EXTERNAL (decl)))
diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-4.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-4.f90
new file mode 100644
index 000..43f4295494c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-4.f90
@@ -0,0 +1,43 @@
+! { dg-additional-options "-fdump-tree-gimple" }
+
+! PR fortran/114283
+
+! { dg-final { scan-tree-dump "#pragma omp parallel shared\\(i\\) if\\(0\\) default\\(none\\) firstprivate\\(g\\)" "gimple" } }
+! { dg-final { scan-tree-dump "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) firstprivate\\(h\\) map\\(from:j \\\[len: 4\\\]\\) defaultmap\\(none\\)" "gimple" } }
+
+
+module m
+  implicit none (type, external)
+  !$omp declare target indirect enter(f1, f2)
+contains
+  integer function f1 ()
+f1 = 99
+  end
+  integer function f2 ()
+f2 = 89
+  end
+end module m
+
+use m
+implicit none (type, external)
+call sub1(f1)
+call sub2(f2)
+contains
+  subroutine sub1(g)
+procedure(integer) :: g
+integer :: i
+!$omp parallel default(none) if(.false.) shared(i)
+  i = g ()
+!$omp end parallel
+if (i /= 99) stop 1
+  end
+
+  subroutine sub2(h)
+procedure(integer) :: h
+integer :: j
+!$omp target defaultmap(none) map(from:j)
+  j = h ()
+!$omp end target
+if (j /= 89) stop 1
+  end
+end


[Patch] OpenMP/Fortran: Fix defaultmap(none) issue with dummy procedures [PR114283]

2024-03-11 Thread Tobias Burnus

Using dummy procedures in a target region with 'defaultmap(none)' leads to:

  Error: 'g' not specified in enclosing 'target'

and this cannot be fixed by using 'firstprivate' as non-pointer dummy routines
are rejected as "Error: Object 'g' is not a variable".

Fixed by doing the same for mapping as for data sharing: using predetermined
firstprivate.

BTW: Only since GCC 14, 'declare target indirect' makes it possible to
simply use dummy procedures and procedures pointers in a target region.

Comments? Suggestions?

Tobias

PS: Procedure pointers aren't variables either, but they act even more like
variables as they permit changing pointer association such that '(first)private'
vs. 'shared'/'map' can both make sense. — GCC accepts those in (nearly) all 
clauses,
ifort only in (first)private while flang not at all. The spec is somewhat silent
about it. This is tracked in the same PR (PR114283) and in the specification
issue #3823.
OpenMP/Fortran: Fix defaultmap(none) issue with dummy procedures [PR114283]

Dummy procedures look similar to variables but aren't - neither in Fortran
nor in OpenMP. As the middle end sees PARM_DECLs, mark them as predetermined
firstprivate for mapping (as already done in gfc_omp_predetermined_sharing).

This does not address the isses related to procedure pointers, which are
still discussed on spec level [see PR].

	PR fortran/114283

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_omp_predetermined_mapping): Map dummy
	procedures as firstprivate.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/target4.f90: New test.

 gcc/fortran/trans-openmp.cc|  9 +
 gcc/testsuite/gfortran.dg/gomp/target4.f90 | 18 ++
 2 files changed, 27 insertions(+)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index a2bf15665b3..1dba47126ed 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -343,6 +343,15 @@ gfc_omp_predetermined_mapping (tree decl)
 	&& GFC_DECL_SAVED_DESCRIPTOR (decl)))
 return OMP_CLAUSE_DEFAULTMAP_TO;
 
+  /* Dummy procedures aren't considered variables by OpenMP, thus are
+ disallowed in OpenMP clauses.  They are represented as PARM_DECLs
+ in the middle-end, so return OMP_CLAUSE_DEFAULTMAP_FIRSTPRIVATE here
+ to avoid complaining about their uses with defaultmap(none).  */
+  if (TREE_CODE (decl) == PARM_DECL
+  && TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE
+  && TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) == FUNCTION_TYPE)
+return OMP_CLAUSE_DEFAULTMAP_FIRSTPRIVATE;
+
   /* These are either array or derived parameters, or vtables.  */
   if (VAR_P (decl) && TREE_READONLY (decl)
   && (TREE_STATIC (decl) || DECL_EXTERNAL (decl)))
diff --git a/gcc/testsuite/gfortran.dg/gomp/target4.f90 b/gcc/testsuite/gfortran.dg/gomp/target4.f90
new file mode 100644
index 000..09364e707f1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/target4.f90
@@ -0,0 +1,18 @@
+! { dg-additional-options "-fdump-tree-gimple" }
+
+! PR fortran/114283
+
+! { dg-final { scan-tree-dump "#pragma omp parallel default\\(none\\) firstprivate\\(g\\)" "gimple" } }
+! { dg-final { scan-tree-dump "#pragma omp target num_teams\\(-2\\) thread_limit\\(0\\) defaultmap\\(none\\) firstprivate\\(g\\)" "gimple" } }
+
+subroutine f(g)
+procedure() :: g
+
+!$omp parallel default(none)
+  call g
+!$omp end parallel
+
+!$omp target defaultmap(none)
+  call g
+!$omp end target
+end


Re: Fix 'char' initialization, copy, check in 'libgomp.oacc-fortran/acc-memcpy.f90'

2024-03-08 Thread Tobias Burnus

Hi Thomas,

Am 08.03.24 um 12:15 schrieb Thomas Schwinge:

OK to push
"Fix 'char' initialization, copy, check in 
'libgomp.oacc-fortran/acc-memcpy.f90'",
see attached?


OK.

I think there was some remaining code around the problem that 
HUGE(1_int8) = 127 and '-128_int8' is invalid because in Fortran, that's 
'unary_minus(128_int8)', which is not valid as 128 exceeds HUGE(1_int8),


Which the remaining bits code tried to solve (i.e. -127:127 vs. 
-128:127) but seemingly failed to do so consistently.


Thanks!

Tobias


Re: nvptx: 'cuDeviceGetCount' failure is fatal

2024-03-07 Thread Tobias Burnus

Hi Thomas,

Thomas Schwinge wrote:
/* Return the number of GCN devices on the system. */  
  int

-GOMP_OFFLOAD_get_num_devices (void)
+GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
  {
if (!init_hsa_context ())
  return 0;
+  /* Return -1 if no omp_requires_mask cannot be fulfilled but
+ devices were present.  */
+  if (hsa_context.agent_count > 0 && omp_requires_mask != 0)
+return -1;
return hsa_context.agent_count;
  }

...

OK to push the attached "nvptx: 'cuDeviceGetCount' failure is fatal"?


I think the real question is: what does a 'cuDeviceGetCount' fail mean?

Does it mean a serious error – or could it just be a permissions issue 
such that the user has no device access but otherwise is fine?


Because if it is, e.g., a permission problem – just returning '0' (no 
devices) would seem to be the proper solution.


But if it is expected to be always something serious, well, then a fatal 
error makes more sense.


The possible exit codes are:

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, 
CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE


which does not really help.

My impression is that 0 is usually returned if something goes wrong 
(e.g. with permissions) such that an error is a real exception. But all 
three choices seem to make about equally sense: either host fallback 
(with 0 or -1) or a fatal error.


Tobias


Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Tobias Burnus

Hi Thomas,

first, I have the feeling we talk about (more or less) the same code 
region and use the same words – but we talk about rather different 
things. Thus, you confuse me (and possibly Andrew) – and my reply 
confuses you.


Thomas Schwinge wrote:

On 2024-03-07T12:43:07+0100, Tobias Burnus  wrote:

Thomas Schwinge wrote:

First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it
is also not really desirable.

External users probably don't, but certainly all our internal testing is
setting it,


First, I doubt it – secondly, if it were true, it was broken for the 
last 5 years or so as we definitely did not notice fails due to not 
working offload devices. – Neither for AMD GCN nor ...



and also implicitly all nvptx offloading testing: simply by
means of having such knob in the libgomp nvptx plugin.


I did see it at some places set for AMD but I do not see any 
nvptx-specific environment variable which permits to do the same.


However:

  That is, the
libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
(the original meaning of) that flag


I think that's one of the problems here – you talk about 
suppress_host_fallback (implicit, original meaning), while I talk about 
the GCN_SUPPRESS_HOST_FALLBACK environment variable.


Besides all the talk about suppress_host_fallback, 
'init_hsa_runtime_functions' is not fatal' of the subject line seems to 
be something to be considered (beyond the patches you already suggested).




If I run on my Linux system the system compiler with nvptx + gcn suppost
installed, I get (with a nvptx permission problem):

$ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out

libgomp: GCN host fallback has been suppressed

And exit code = 1. The same result with '-foffload=disable' or with
'-foffload=nvptx-none'.

I can't tell if that's what you expect to see there, or not?


Well, obviously not that I get this error by default – and as your 
wording indicated that the internal variable will be always true – and 
not only when the env var GCN_SUPPRESS_HOST_FALLBACK is explicit set, I 
worry that I would get the error any time.



(For avoidance of doubt: I'm expecting silent host-fallback execution in
case that libgomp GCN and/or nvptx plugins are available, but no
corresponding devices.  That's what my patch achieves.)


I concur that the silent host fallback should happen by default (unless 
env vars tell otherwise) - at least when either no code was generated 
for the device (e.g. -foffload=disable) or when the vendor runtime 
library is not available or no device (be it no hardware or no permission).


That's the current behavior and if that remains, my main concern evaporates.

* * *


If we want to remove it, we can make it always false - but I am strongly
against making it always true.

I'm confused.  So you want the GCN and nvptx plugins to behave
differently in that regard?

No – or at least: not unless GCN_SUPPRESS_HOST_FALLBACK is set.

Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to
prevent the host fallback, but don't break somewhat common systems.

That's an orthogonal concept?


No – It's the same concept of the main use of the 
GCN_SUPPRESS_HOST_FALLBACK environment variable: You get a run-time 
error instead of a silent host fallback.


But I have in the whole thread the feeling that – while talking about 
the same code region and throwing in the same words – we actually talk 
about completely different things.


Tobias


Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Tobias Burnus

Hi,

Thomas Schwinge wrote:

An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is
different from the libgomp-level host-fallback execution):

+failure:
+  if (suppress_host_fallback)
+GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
+  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
+  return false;
+}


This originates in the libgomp HSA plugin, where the idea was -- in my
understanding -- that you wouldn't have device code available for all
'fn_ptr's, and in that case transparently (shared-memory system!) do
host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin (see above).
However, is it really still applicable there; don't we assume that we're
generating device code for all relevant functions?  (I suppose everyone
really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?)


First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it 
is also not really desirable.


If I run on my Linux system the system compiler with nvptx + gcn suppost 
installed, I get (with a nvptx permission problem):


$ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out

libgomp: GCN host fallback has been suppressed

And exit code = 1. The same result with '-foffload=disable' or with 
'-foffload=nvptx-none'.



Should we thus
actually remove 'suppress_host_fallback' (that is, make it
always-'true'),


If we want to remove it, we can make it always false - but I am strongly 
against making it always true.


Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to 
prevent the host fallback, but don't break somewhat common systems.


Tobias


[wwwdocs, committed] projects/gomp/: Fix typo, mark an item as implemented in GCC 14

2024-03-07 Thread Tobias Burnus

Found when glancing at it: A typo and an omission.
Committed. Seehttps://gcc.gnu.org/projects/gomp/#omp5.2  for the result.

Tobias
commit f99d0f3a2c61ad6677170b9068d511c20ba1bfe1
Author: Tobias Burnus 
Date:   Thu Mar 7 11:40:57 2024 +0100

projects/gomp/: Fix typo, mark an item as implemented in GCC 14

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 8fdfb95a..b8f11508 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -708,7 +708,7 @@ than listed, depending on resolved corner cases and optimizations.
 
   
   
-.terators in target update motion clauses and map clauses
+Iterators in target update motion clauses and map clauses
 No
 
   
@@ -729,7 +729,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 present argument to defaultmap clause
-No
+GCC14
 
   
   


Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection

2024-03-04 Thread Tobias Burnus

Hi Sandra,

Sandra Loosemore wrote:

On 3/1/24 08:23, Tobias Burnus wrote:

Maybe the proposed wording will help others to avoid this pitfall.
(Or is this superfluous as -foffload= is not much used and, even if,
no one then remembers or finds this none?)


Well, I spent a long time looking at this, and my only conclusion is 
that I don't really understand what the problem you're trying to solve 
is.  If it's problematical to have the runtime know about offload 
devices the compiled code isn't using, don't users also need to know 
how to restrict the runtime to a particular set of devices the same 
way -foffload= lets you do, and not just how to disable offloading in 
the runtime entirely?
It's pretty clearly documented already how -foffload affects the 
compiler's behavior, and the library's behavior is already documented 
in its own manual.  Maybe what we don't have is a tutorial on how to 
build/link/run programs using a specific offload device, or on the host?


The problem is for code like the following, which is perfectly valid
and works

(A) If you don't have any offload device
(independent of the compiler options)

(B) If you have an offload device (supported by your libgomp)
and compiled with offloading support (for that device)

But (C) if you have an offload device and compile as:
  gcc -fopenmp -foffload=disabled

it will fail at runtime with:

dev = 0 / num devs = 1 Segmentation fault (core dumped) The problem is 
that there is a mismatch between the code (assumes no offload code + 
always host fallback) and the run-time library (which detects offload 
devices), such that the API routines uses a different device than the 
'target' code:


#include 
#include 

#define N 2064
int
main ()
{
  int *x = (int*) omp_target_alloc (sizeof(int)*N,
omp_get_default_device ());
  printf ("dev = %d / num devs = %d\n",
  omp_get_default_device (), omp_get_num_devices ());
  #pragma omp target is_device_ptr(x)
  for (int i = 0; i < N; ++i)
x[i] = i;
}
---

On the technical side, it is not really surprising but it
might be still be confusing for the user. Obviously, it can
also occur if you compile, e.g., for AMD GCN and only an
Nvidia device is available - but there the solution would be
the same (disable all devices).

(OpenMP 6.0 will provide a environment variable that allows
fine tuning of the available devices.)


Questions:

* Is such a usage common enough to matter?
I guess for some benchmark use it make – to test whether
real offloading or host fallback is faster + if the latter
is true, it might also get used in operational code.

* Are API routines used in such a code in a way that it breaks?
(Unfortunately not very unlikely in larger code.)

If there is enough real-world usage (= 2x yes to the questions above):
* How to word is to help users and not to confuse them?

Tobias


Re: [Patch] invoke.texi: Add note that -foffload= does not affect device detection

2024-03-04 Thread Tobias Burnus

Hi,

Sandra Loosemore wrote:

On 3/1/24 17:29, Sandra Loosemore wrote:

On 3/1/24 08:23, Tobias Burnus wrote:
Aside: Shouldn't all the HTML documents start with a  and 
 before

the table of content? Currently, it has:
   Top (GNU libgomp)
and the body starts with
   Short Table of Contents


I note that the 'Top(...)' in  already appears in the GCC 8.5 
docs (created with Texinfo 6.5; while GCC 7.5, created with texinfo 6.3, 
is okay). And the  disappears in the GCC 10.5 doc, created with 
Texinfo 7.0dev.


I have no idea why the 'Top(...)' appears with Texinfo 6.5, but the 
missing  is because of Texinfo 7.0, cf. 
https://git.savannah.gnu.org/cgit/texinfo.git/plain/NEWS


I think it would be useful to remove the 'Top()' in  and add the 
 in general.


For the GCC website, we might want to set TOP_NODE_UP_URL.

I think this is a bug in the version of texinfo used to produce the 
HTML content for the GCC web site.  Looking at a recent build of my 
own using Texinfo 6.7, I do see



GNU libgomp

The manual on the web site says it was produced by "GNU Texinfo 7.0dev".


I poked at this a little and apparently you need to fiddle with the 
SHOW_TITLE or NO_TOP_NODE_OUTPUT customization variables in recent 
versions of Texinfo in order to get the document title to show up in 
HTML output.


https://www.gnu.org/software/texinfo/manual/texinfo/texinfo.html#index-SHOW_005fTITLE 



Probably this has to be controlled by a configure check since older 
Texinfo versions may barf on unknown options.

...
I'd think that if we were going to do that, we'd also want to use an 
official release version of Texinfo instead of a "dev" snapshot.


(I concur that we should update 7.0dev to 7.0.3 or 7.1 on the server to 
have a defined version.)


Thanks,

Tobias



Re: [Patch] OpenMP/C++: Fix (first)private clause with member variables [PR110347]

2024-03-01 Thread Tobias Burnus

Jakub Jelinek wrote:

As discussed on IRC, I believe not disregarding the capture proxies in
target regions if they shouldn't be shared is always wrong, but also the
gimplify.cc suggestion was incorrect.

The thing is that at the place where the omp_disregard_value_expr call
is done currently for target region flags is always in_code ? GOVD_SEEN : 0
so by testing flags & anything we actually don't differentiate between
privatized vars and mapped vars.  So, it needs to be moved after we
actually compute the flags, similarly how we do it for non-target.

...

I have now added Jakub's updated the gimplify.cc patch, renamed the test 
files, added the proposed lambda test case as well, did add a missing 
line break, and updated the target-lambda-1.C to also work with shared 
memory.


I think the patch should be good, having testing it with offloading here 
and Jakub also testing it on his side.


Final comments, suggestions, remarks?

Tobias
OpenMP/C++: Fix (first)private clause with member variables [PR110347]

OpenMP permits '(first)private' for C++ member variables, which GCC handles
by tagging those by DECL_OMP_PRIVATIZED_MEMBER, adding a temporary VAR_DECL
and DECL_VALUE_EXPR pointing to the 'this->member_var' in the C++ front end.

The idea is that in omp-low.cc, the DECL_VALUE_EXPR is used before the
region (for 'firstprivate'; ignored for 'private') while in the region,
the DECL itself is used.

In gimplify, the value expansion is suppressed and deferred if the
  lang_hooks.decls.omp_disregard_value_expr (decl, shared)
returns true - which is never the case if 'shared' is true. In OpenMP 4.5,
only 'map' and 'use_device_ptr' was permitted for the 'target' directive.
And when OpenMP 5.0's 'private'/'firstprivate' clauses was added, the
the update that now 'shared' argument could be false was missed. The
respective check has now been added.

2024-03-01  Jakub Jelinek  
	    Tobias Burnus  

	PR c++/110347

gcc/ChangeLog:

	* gimplify.cc (omp_notice_variable): Fix 'shared' arg to
	lang_hooks.decls.omp_disregard_value_expr for
	(first)private in target regions.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/target-lambda-3.C: Moved from
	gcc/testsuite/g++.dg/gomp/ and fixed is-mapped handling.
	* testsuite/libgomp.c++/target-lambda-1.C: Modify to also
	also work without offloading.
	* testsuite/libgomp.c++/firstprivate-1.C: New test.
	* testsuite/libgomp.c++/firstprivate-2.C: New test.
	* testsuite/libgomp.c++/private-1.C: New test.
	* testsuite/libgomp.c++/private-2.C: New test.
	* testsuite/libgomp.c++/target-lambda-4.C: New test.
	* testsuite/libgomp.c++/use_device_ptr-1.C: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/target-lambda-1.C: Moved to become a
	run-time test under testsuite/libgomp.c++.

Co-authored-by: Tobias Burnus 

 gcc/gimplify.cc  |  20 +-
 gcc/testsuite/g++.dg/gomp/target-lambda-1.C  |  94 ---
 libgomp/testsuite/libgomp.c++/firstprivate-1.C   | 305 +++
 libgomp/testsuite/libgomp.c++/firstprivate-2.C   | 125 ++
 libgomp/testsuite/libgomp.c++/private-1.C| 247 ++
 libgomp/testsuite/libgomp.c++/private-2.C| 117 +
 libgomp/testsuite/libgomp.c++/target-lambda-1.C  |  15 +-
 libgomp/testsuite/libgomp.c++/target-lambda-3.C  | 104 
 libgomp/testsuite/libgomp.c++/target-lambda-4.C  |  41 +++
 libgomp/testsuite/libgomp.c++/use_device_ptr-1.C | 126 ++
 10 files changed, 1089 insertions(+), 105 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 7f79b3cc7e6..6ebca964cb2 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -8144,13 +8144,6 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
   n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
   if ((ctx->region_type & ORT_TARGET) != 0)
 {
-  if (ctx->region_type & ORT_ACC)
-	/* For OpenACC, as remarked above, defer expansion.  */
-	shared = false;
-  else
-	shared = true;
-
-  ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);
   if (n == NULL)
 	{
 	  unsigned nflags = flags;
@@ -8275,9 +8268,22 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 	}
 	found_outer:
 	  omp_add_variable (ctx, decl, nflags);
+	  if (ctx->region_type & ORT_ACC)
+	/* For OpenACC, as remarked above, defer expansion.  */
+	shared = false;
+	  else
+	shared = (nflags & (GOVD_PRIVATE | GOVD_FIRSTPRIVATE)) == 0;
+	  ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);
 	}
   else
 	{
+	  if (ctx->region_type & ORT_ACC)
+	/* For OpenACC, as remarked above, defer expansion.  */
+	shared = false;
+	  else
+	shared = ((n->value | flags)
+		  & (GOVD_PRIVATE | GOVD_FIRSTPRIVATE)) == 0;
+	  ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);
 	  /* If nothing changed, there's nothing left to do.  */
 	  if ((n->value & flags

[Patch] invoke.texi: Add note that -foffload= does not affect device detection

2024-03-01 Thread Tobias Burnus

Not very often, but do I keep running into issues (fails, segfaults)
related to testing programs compiled with a GCC without offload
configured and then using the system libraries. - That's equivalent
to having the system compiler (or any offload compiler) and
compiling with -foffload=disable.

The problem is that while the program only contains host code,
the run-time library still initializes devices when an API
routine - such as omp_get_num_devices - is invoked. This can
lead to odd bugs as target regions, obviously, will use host
fallback (for any device number) but the API routines will
happily operate on the actual devices, which can lead to odd
errors.

(Likewise issue when compiling for one offload target type
and running on a system which has devices of an other type.)

I assume that that's not a very common problem, but it can be
rather confusing when hitting this issue.

Maybe the proposed wording will help others to avoid this pitfall.
(Or is this superfluous as -foffload= is not much used and, even if,
no one then remembers or finds this none?)

Thoughts?

* * *

It was not clear to me how to refer to libgomp.texi
- Should it be 'libgomp' as in 'info libgomp' or the URL
  https://gcc.gnu.org/onlinedocs/libgomp/ (or filename of the PDF) implies?
- Or as  'GNU Offloading and Multi Processing Runtime Library Manual'
  as named linked to at https://gcc.gnu.org/onlinedocs or on the title page
  of the the PDF - but that name is not repeated in the info file or the HTML
  file.
- Or even 'GNU libgomp' to mirror a substring in the  of the HTML file.
I now ended up only implicitly referring that document.

Aside: Shouldn't all the HTML documents start with a  and  before
the table of content? Currently, it has:
  Top (GNU libgomp)
and the body starts with
  Short Table of Contents

Tobias

PS: In the testsuite, it mostly happens when iterating over
omp_get_num_devices() or when mixing calls to API routines with
device code ('omp target', compute constructs).
invoke.texi: Add note that -foffload= does not affect device detection

gcc/ChangeLog:

	* doc/invoke.texi (-foffload): Add note that the flag does not
	affect whether offload devices are detected.

 gcc/doc/invoke.texi | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dc5fd863ca4..4153863020b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2736,38 +2736,45 @@ targets using ms-abi.
 
 @opindex foffload
 @cindex Offloading targets
 @cindex OpenACC offloading targets
 @cindex OpenMP offloading targets
 @item -foffload=disable
 @itemx -foffload=default
 @itemx -foffload=@var{target-list}
 Specify for which OpenMP and OpenACC offload targets code should be generated.
 The default behavior, equivalent to @option{-foffload=default}, is to generate
 code for all supported offload targets.  The @option{-foffload=disable} form
 generates code only for the host fallback, while
 @option{-foffload=@var{target-list}} generates code only for the specified
 comma-separated list of offload targets.
 
 Offload targets are specified in GCC's internal target-triplet format. You can
 run the compiler with @option{-v} to show the list of configured offload targets
 under @code{OFFLOAD_TARGET_NAMES}.
 
+Note that this option does not affect the available offload devices detected by
+the run-time library and, hence, the values returned by the OpenMP/OpenACC API
+routines or access to devices using those routines.  The run-time library
+itself can be tuned using environment variables; in particular, to fully disable
+the device detection, set the @code{OMP_TARGET_OFFLOAD} environment variable to
+@code{disabled}.
+
 @opindex foffload-options
 @cindex Offloading options
 @cindex OpenACC offloading options
 @cindex OpenMP offloading options
 @item -foffload-options=@var{options}
 @itemx -foffload-options=@var{target-triplet-list}=@var{options}
 
 With @option{-foffload-options=@var{options}}, GCC passes the specified
 @var{options} to the compilers for all enabled offloading targets.  You can
 specify options that apply only to a specific target or targets by using
 the @option{-foffload-options=@var{target-list}=@var{options}} form.  The
 @var{target-list} is a comma-separated list in the same format as for the
 @option{-foffload=} option.
 
 Typical command lines are
 
 @smallexample
 -foffload-options='-fno-math-errno -ffinite-math-only' -foffload-options=nvptx-none=-latomic
 -foffload-options=amdgcn-amdhsa=-march=gfx906


Re: [Patch] OpenACC: Update libgomp.texi + openacc{.f90,_lib.h} for 3.1 arg-name changes

2024-03-01 Thread Tobias Burnus

Hi Thomas,


Thomas Schwinge wrote:

On 2024-02-27T20:11:30+0100, Tobias Burnus  wrote:

The attached patch updates the manual to match OpenACC 3.3
specification for the implemented routines.

But not update references to OpenACC 3.3, too?


As the change is not really visible (except when using Fortran 
keywords), it was not really clear to me whether the reference should be 
either changed to *or* augmented by the OpenACC 3.1 *or* 3.3 
specification reference.


What do you prefer? 3.1 or 3.3, in addition or instead of the existing 
2.x (?) references?



The questions is whether we want to do this now, or once we actually
support 3.1 or 3.3; what was your intention for preparing this now?


Fallout of some bug fixes I intended to to in the .texi file, which in 
turn was a fallout of the trivial addition of the 3.3 interfaces for 
Fortran. Well, then I realized that 3.1 changed the argument names as well.


I think we should at least do the .texi bug fixes. Additionally, those 
'type, dimension(:[,:]...)' look very odd – thus, I would be inclined to 
do those as well.


Otherwise, it is more the question when to break the keyword= API; 
fortunately, it is not an ABI issue as the compiler just uses it to 
reorder the arguments back to the original declaration.



NOTE: Those argument names *do* have an effect and can be a breaking
change as Fortran permits using the arg name in the call, e.g.,
call acc_copyin(a=myVar)  ! old
must now be called either as
call acc_copyin(data_arg=myVar)  ! new
or as
call acc_copyin(myVar)  ! works with old and new names
As the latter is way more common, the spec change hopefully does not
break too many programs.

I wonder: would it happen to be possible via "Fortran interface magic" to
actually support several variants of named arguments?  I agree we can
drop any bogus GCC-local variants, but is it possible to support all the
official variants?


Obviously not as the default (Fortran + real world) is to use no 
keywords – and then the two variants become ambiguous. Therefore, 
Fortran doesn't permit to combine two specific functions that only 
differ in this aspect.


If a real-world program uses the keywords by ill chance, it still had 
the very same problem depending on the compiler version and vendor as 
that's an upstream spec change.


The simple solution on the program side is just to drop the keyword – 
then it will work with either variant.


I think only very programs are affected – possibly even none. And I 
wonder how other compilers handle this, given that they also started 
implementing (selected) OpenACC 2.7 and 3.x features (including 3.3, as 
real-world programs proof).



And, finally, it synced over all named constants from openacc.f90 to
config/accel/openacc.f90.

I don't think that's necessary: as I understand, that one's for
'acc_on_device' only?


I think you are right — unless 'f951' is run on the device side, which 
won't happen for offloading, the accelerator version of the module file 
is not read – only the host version. The named constants will be 
expanded early to their numeric value and only the procedure calls 
remain. — Of those, only 'acc_on_device' has to be available on the 
device side and — hence, it is used at lto/link time by the device-side 
of the linker (by linking libgomp.a).


Thus, I withdraw this change as not being required, not harming, but 
wasting some GCC-build-time (only) file storage size and CPU cycles.


Tobias


Re: [committed] Set num_threads to 50 on 32-bit hppa in two libgomp loop tests

2024-03-01 Thread Tobias Burnus

Hi all, hi John & Thomas

John David Anglin wrote:

On 2024-02-29 6:02 p.m., Thomas Schwinge wrote:

I wonder: shouldn't that cap at 50 threads happen inside libgomp,
generally, instead of per test case and user code (!)?



Per my
understanding, OpenMP 'num_threads' specifies a *desired* number of
threads; the implementation may limit that value.

Sounds like a good suggestion.


I concur – if the hardware/OS doesn't support more.

* * *

However – for completeness and to correct a statement: While num_threads 
specifies the desired number of threads, 'strict' will turn this into 
error termination if the implementation cannot fulfilled the request.


Namely, "if prescriptiveness is specified as 'strict' and Algorithm 11.1 
would result in a number of threads other than the value of the first 
item of the _nthreads_ list then runtime error termination is performed."


Note that 'strict' for num_threads is new in/since the OpenMP 6.0 draft 
(TR11, I think) and not yet implemented in GCC.


However, I guess that the thread limit also affects 'teams' and nested 
parallelization. And for teams 'num_teams(n)' sets lower = upper value 
to 'n' — Thus, this enforces this number of teams. (While 
'num_teams(m:n)' sets both limits and 'omp_set_num_teams(n)' or 
OMP_NUM_TEAMS=n only set the upper bound).


[As far as I can see, OpenACC always permits an implementation to use 
fewer gangs/workers/vectors if the hardware doesn't support the 
requested number.]


Tobias



Re: [PATCH] lto, Darwin: Fix offload section names.

2024-02-29 Thread Tobias Burnus

Hi Iain, hello world,

Thomas Schwinge wrote:

On 2024-01-16T15:00:16+, Iain Sandoe  wrote:

...

diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
index a743deb4efb..1cdadf36ec0 100644
--- a/gcc/lto-section-names.h
+++ b/gcc/lto-section-names.h

...

@@ -35,8 +39,14 @@ extern const char *section_name_prefix;
  
  #define LTO_SEGMENT_NAME "__GNU_LTO"
  
+#if OBJECT_FORMAT_MACHO

+#define OFFLOAD_VAR_TABLE_SECTION_NAME "__GNU_OFFLOAD,__vars"
+#define OFFLOAD_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__funcs"
+#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__ind_fns"
+#else
  #define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"

...

Just to note that, per my understanding, this will require corresponding
changes elsewhere, once you attempt to actually enable offloading
compilation for Darwin (which -- ;-) I suspect -- is not on your agenda
right now):


For instance also in MOLD:

https://github.com/rui314/mold/blob/50bdf39ba57e29386de28bd0c303035e626fa29c/elf/input-files.cc#L244

if ((shdr.sh_flags & SHF_EXCLUDE) &&
name.starts_with(".gnu.offload_lto_.symtab.")) {
  this->is_gcc_offload_obj = true;
  continue;
}

Tobias


[wwwdocs] gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update

2024-02-27 Thread Tobias Burnus

Minor update for older and more recent changes.

Comments?

Tobias
gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update

Update OpenMP for two meanwhile implemented features (lvalue-expr in map,
indirect now also in Fortran).
Update OpenACC for one new feature (Fortran interface to exisiting
C/C++ routines).

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 85ccc54d..1c2059b6 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -79,7 +79,8 @@ a work-in-progress.
 
 OpenMP 5.0: The allocate directive is now
   supported for stack variables in C and Fortran, including the OpenMP 5.1
-  align modifier. For Fortran, OpenMP allocators can now be
+  align modifier. In C and C++, the map clause now
+  accepts lvalue expressions. For Fortran, OpenMP allocators can now be
   used for allocatables and pointers using the allocate
   directive and its OpenMP 5.2 replacement, the allocators
   directive; files using this allocator and all files that might directly
@@ -91,8 +92,8 @@ a work-in-progress.
 
   OpenMP 5.1: Support was added for collapsing imperfectly nested loops and
   using present as map-type modifier and in
-  defaultmap. The indirect clause is now supported
-  for C and C++.  The performance of copying strided data from or to nvptx
+  defaultmap. The indirect clause is now
+  supported. The performance of copying strided data from or to nvptx
   and AMD GPU devices using the OpenMP 5.1 routine
   omp_target_memcpy_rect has been improved.
 
@@ -117,6 +118,14 @@ a work-in-progress.
 OpenACC 2.7: The self clause was added to be used on
   compute constructs and the default clause for data
   constructs.
+OpenACC 3.2: The following API routines are now available in
+  Fortran using the openacc module or the
+  open_lib.h header file: acc_alloc,
+  acc_free, acc_hostptr,
+  acc_deviceptr, acc_memcpy_to_device,
+  acc_memcpy_to_device_async,
+  acc_memcyp_from_device and
+  acc_memcyp_from_device_async.
   
   
   For offload-device code generated via OpenMP and OpenACC, the math
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index bf20bb88..8fdfb95a 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -489,7 +489,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 C/C++'s lvalue expressions in to, from and map clauses
-No
+GCC14
 
   
   
@@ -714,8 +714,8 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Indirect calls to the device version of a procedure or function in target regions
-GCC14
-Only C and C++
+GCC14
+
   
   
 interop directive
@@ -756,8 +756,8 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 indirect clause in declare target
-GCC14
-Only C and C++
+GCC14
+
   
   
 device_type(nohost)/device_type(host) for variables


Re: [patch] OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}

2024-02-27 Thread Tobias Burnus

Hi Thomas,

(Regarding 'call acc_attach(x)' – the problem is that one needs the 
address of '' and 'x'; while 'x' is readily available, for '' no 
temporary variable has to get involved – and there are plenty of ways 
temporaries can get introduced; for most cases, an interface exists that 
prevents this but they are mutually exclusive. Hence, this needs support 
in the FE. The simplest workaround for a user is to use '!$acc attach' 
instead ...)


Thomas Schwinge:

  @table @asis
  @item @emph{Description}
-This function allocates @var{len} bytes of device memory. It returns
+This function allocates @var{bytes} of device memory. It returns

Not '@var{bytes} {+bytes+}' or similar?


I think either works – depending how one parses @var{} mentally, 
one of the variants sounds smooth and the other very odd. But I can/will 
change it.



--- a/libgomp/openacc.f90
+++ b/libgomp/openacc.f90

Assuming that 'module openacc_internal' currently is sorted per
appearance in the OpenACC specification (?), I suggest we continue to do
so.  (..., like in 'openacc_lib.h', too.)
I will check – it looks only block-wise sorted but I might be wrong.I 
followed location of the comments, placing it before the routines that 
followed the comment, assuming that the comments were at the right spot.

@@ -794,6 +881,9 @@ module openacc
...
+  public :: acc_malloc, acc_free, acc_map_data, acc_unmap_data, acc_deviceptr
+  public :: acc_hostptr, acc_memcpy_to_device, acc_memcpy_to_device_async
+  public :: acc_memcpy_from_device, acc_memcpy_from_device_async
  ...
-  ! acc_malloc: Only available in C/C++
-  ! acc_free: Only available in C/C++
-
...
interface acc_is_present
  procedure :: acc_is_present_32_h
  procedure :: acc_is_present_64_h
  procedure :: acc_is_present_array_h
end interface

Is that now a different style that we're not listing the new interfaces
in 'module openacc' here?


As there is no precedent for this type of interface, the style is by 
nature differently. But the question is which style is better. The 
current 'openacc' is very short – and contains not a single specific 
interface, but only generic interfaces. The actual specific-procedure 
declarations are only in 'openacc_internal'.


Those new procedures are the first ones that do not have a generic 
interface and only a specific one. Thus, one can either put the specific 
one into 'openacc_internal' and refer it from 'openacc' (via 'use 
openacc_internal' + 'public :: acc_') – or place the 
interface directly into 'openacc' (and not touching 'openacc_internal' 
at all).


During development, I had a accidentally a mixture between both - and 
then settled for the current variant. – Possibly, moving the interface 
to 'openacc' is clearer?


Thoughts?


--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/acc_host_device_ptr.f90
[...]
+! Fortran version of libgomp.oacc-c-c++-common/lib-59.c

I like to also put a cross reference into the originating C/C++ test
case, so that anyone adjusting either one also is aware that another one
may need adjusting, too.

OK - I will do so.

+  ! The following assumes sizeof(void*) being the same on host and device:

That's generally required anyway.


I have to admit that I don't know OpenACC well enough to see whether 
that's the case or not. And, while I am not very consistent, I do try to 
document stricter requirements / implementation-specific parts in a 
testcases.


I know that OpenMP permits that the pointer size differs and 'void *p = 
omp_target_alloc (...);' might in this case not return the device 
pointer but a handle to the device ptr. (For instance, it could be a 
pointer to an uint128_t variable for a 128bit device pointer; I think 
such a hardware exists in real - and uses several bits for other 
purposes like flags.)


In that case, host-side pointer arithmetic won't work and 
'is_device_ptr' clauses etc. need to do transfer work.


But, admittedly, in GCC there it is assumed at many places that both 
sides use the same pointer size* and also during specification 
development, everyone implicitly assumes that routines and clauses yield 
bare device pointers and not some opaque pointer to the actual data (a 
handle); hence, one has to keep remind oneself that the spec permits 
system where that's not the case.


Tobias

(* There are a few spots which handle a smaller device pointer than the 
host pointer or consider a different size but that's not done very 
consistently and largely lacking.)





[Patch] Fortran/Openmp: Use OPT_Wopenmp for gfc_match_omp_depobj warning

2024-02-23 Thread Tobias Burnus
When checking something else, I noticed that there was one warning in 
openmp.cc that did not use OPT_Wopenmp.


I intent to commit the attached patch later today as obvious.

Tobias
Fortran/Openmp: Use OPT_Wopenmp for gfc_match_omp_depobj warning

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_depobj): Use OPT_Wopenmp
	as warning category in gfc_warning.

 gcc/fortran/openmp.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 77f6e1732f9..38de60238c0 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -4768,8 +4768,8 @@ gfc_match_omp_depobj (void)
   if (gfc_match (" ( %v ) ", ) == MATCH_YES)
 	{
 	  if (destroyobj->symtree != depobj->symtree)
-	gfc_warning (0, "The same depend object should be used as DEPOBJ "
-			 "argument at %L and as DESTROY argument at %L",
+	gfc_warning (OPT_Wopenmp, "The same depend object should be used as"
+			 " DEPOBJ argument at %L and as DESTROY argument at %L",
 			 >where, >where);
 	  gfc_free_expr (destroyobj);
 	}


[Patch] OpenMP/nvptx: support 'arch(nvptx64)' as context selector

2024-02-20 Thread Tobias Burnus

I just encountered 'arch(nvptx64)'. I think it makes sense to support
it as alias for 'nvptx' in the context selector for better compatibility.

Comments, remarks, suggestions?

Tobias

PS: See the LLVM documentation below. I do note that those are not identical
as LLVM uses 'nvptx' for 32bit while we effectively only support 64bit
(at least for offloading). Thus, while 'nvptx' might cause problems, adding
'nvptx64' in addition should be harmless.

* * *
From LLVM's 
https://android.googlesource.com/toolchain/llvm/+/refs/heads/master/docs/NVPTXUsage.rst

"The NVPTX target uses the module triple to select between 32/64-bit code
generation and the driver-compiler interface to use. The triple architecture
can be one of ``nvptx`` (32-bit PTX) or ``nvptx64`` (64-bit PTX). The
operating system should be one of ``cuda`` or ``nvcl``, which determines the
interface used by the generated code to communicate with the driver.  Most
users will want to use ``cuda`` as the operating system, which makes the
generated PTX compatible with the CUDA Driver API.

Example: 32-bit PTX for CUDA Driver API: ``nvptx-nvidia-cuda``
Example: 64-bit PTX for CUDA Driver API: ``nvptx64-nvidia-cuda``"

And usage inside LLVM:

clang/lib/Headers/openmp_wrappers/complex:device = {arch(amdgcn, nvptx, 
nvptx64)},   \
OpenMP/nvptx: support 'arch(nvptx64)' as context selector

The main 'arch' context selector for nvptx is, well, 'nvptx';
however, as 'nvptx64' is used as by LLVM, it makes sense
to support it as well.

Note that LLVM has: "The triple architecture can be one of
``nvptx`` (32-bit PTX) or ``nvptx64`` (64-bit PTX)."
GCC effectively only supports the 64bit variant (at least for
offloading). Thus, GCC's 'nvptx' is not quite the same as LLVM's.

gcc/ChangeLog:

	* config/nvptx/gen-omp-device-properties.sh: Add 'nvptx64' to arch.
	* config/nvptx/nvptx.cc (nvptx_omp_device_kind_arch_isa): Likewise.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Context Selectors): Add 'nvptx64' as additional
	'arch' value for nvptx.

 gcc/config/nvptx/gen-omp-device-properties.sh | 2 +-
 gcc/config/nvptx/nvptx.cc | 2 +-
 libgomp/libgomp.texi  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/gen-omp-device-properties.sh b/gcc/config/nvptx/gen-omp-device-properties.sh
index 95c754a164f..3666f9746d1 100644
--- a/gcc/config/nvptx/gen-omp-device-properties.sh
+++ b/gcc/config/nvptx/gen-omp-device-properties.sh
@@ -23,7 +23,7 @@ nvptx_sm_def="$1/nvptx-sm.def"
 sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 
 echo kind: gpu
-echo arch: nvptx
+echo arch: nvptx nvptx64
 
 isa=""
 for sm in $sms; do
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 9363d3ecc6a..3b46b70fc3b 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -6403,7 +6403,7 @@ nvptx_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
 case omp_device_kind:
   return strcmp (name, "gpu") == 0;
 case omp_device_arch:
-  return strcmp (name, "nvptx") == 0;
+  return strcmp (name, "nvptx") == 0 || strcmp (name, "nvptx64") == 0;
 case omp_device_isa:
 #define NVPTX_SM(XX, SEP)\
   {			\
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index d7da799a922..9de6e15f1c2 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6193,7 +6193,7 @@ on more architectures, GCC currently does not match any @code{arch} or
 @item @code{amdgcn}, @code{gcn}
   @tab See @code{-march=} in ``AMD GCN Options''@footnote{Additionally,
   @code{gfx803} is supported as an alias for @code{fiji}.}
-@item @code{nvptx}
+@item @code{nvptx}, @code{nvptx64}
   @tab See @code{-march=} in ``Nvidia PTX Options''
 @end multitable
 


[patch] OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}

2024-02-19 Thread Tobias Burnus

While waiting for some testing to finish, I got distracted and added the
very low hanging OpenACC 3.3 fruits, i.e. those Fortran routines that directly
map to their C counter part.

Comments, remarks?

Tobias
OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}

These routines map simply to the C counterpart and are meanwhile
defined in OpenACC 3.3. (There are additional routine changes,
including the Fortran addition of acc_attach/acc_detach, that
require more work than a simple addition of an interface and
are therefore excluded.)

libgomp/ChangeLog:

	* libgomp.texi (OpenACC Runtime Library Routines): Document new 3.3
	routines that simply map to their C counterpart.
	* openacc.f90 (openacc_internal, openacc): Add them.
	* openacc_lib.h: Likewise.
	* testsuite/libgomp.fortran/acc_host_device_ptr.f90: New test.
	* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: New test.

 libgomp/libgomp.texi   | 171 -
 libgomp/openacc.f90| 101 ++--
 libgomp/openacc_lib.h  |  94 ++-
 .../libgomp.fortran/acc_host_device_ptr.f90|  43 ++
 .../testsuite/libgomp.oacc-fortran/acc-memcpy.f90  |  47 ++
 5 files changed, 399 insertions(+), 57 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index f57190f203c..d7da799a922 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -2157,8 +2157,6 @@ dimensions.
 Running this routine in a @code{target} region is not supported except on
 the initial device.
 
-
-
 @item @emph{C/C++}
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_target_memcpy_rect_async(void *dst,}
@@ -4684,7 +4682,6 @@ returns @code{false}.
 @item   @tab @code{logical acc_on_device}
 @end multitable
 
-
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
 3.2.17.
@@ -4696,17 +4693,24 @@ returns @code{false}.
 @section @code{acc_malloc} -- Allocate device memory.
 @table @asis
 @item @emph{Description}
-This function allocates @var{len} bytes of device memory. It returns
+This function allocates @var{bytes} of device memory. It returns
 the device address of the allocated memory.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
+@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t bytes);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{type(c_ptr) function acc_malloc(bytes)}
+@item   @tab @code{integer(c_size_t), value :: bytes}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.18.
+3.2.18.  @uref{https://www.openacc.org, openacc specification v3.3}, section
+3.2.16.
 @end table
 
 
@@ -4715,16 +4719,23 @@ the device address of the allocated memory.
 @section @code{acc_free} -- Free device memory.
 @table @asis
 @item @emph{Description}
-Free previously allocated device memory at the device address @code{a}.
+Free previously allocated device memory at the device address @code{data_dev}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
+@item @emph{Prototype}: @tab @code{void acc_free(d_void *data_dev);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_free(data_dev)}
+@item   @tab @code{type(c_ptr), value :: data_dev}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.19.
+3.2.19.  @uref{https://www.openacc.org, openacc specification v3.3}, section
+3.2.17.
 @end table
 
 
@@ -5092,17 +5103,26 @@ array element and @var{len} specifies the length in bytes.
 @table @asis
 @item @emph{Description}
 This function maps previously allocated device and host memory. The device
-memory is specified with the device address @var{d}. The host memory is
-specified with the host address @var{h} and a length of @var{len}.
+memory is specified with the device address @var{data_dev}. The host memory is
+specified with the host address @var{data_arg} and a length of @var{bytes}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
+@item @emph{Prototype}: @tab @code{void acc_map_data(h_void *data_arg, d_void *data_dev, size_t bytes);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_map_data(data_arg, data_dev, bytes)}
+@item   @tab @code{type(*), dimension(*) :: data_arg}
+@item   @tab @code{type(c_ptr), value :: data_dev}
+@item   @tab 

[Patch] libgomp: Device load_image - minor num-funcs/vars check improvement

2024-02-19 Thread Tobias Burnus
When debugging a linker issue, leading to a mismatch in the number of 
host/device functions, I was surprised by seeing one additional entry. 
Well, it turned out to be due to the ICV variable.


This patch makes it more consistent. The "+1" is returned since 
r12-2769-g0bac793ed6bad2 (for the on-device omp_get_device_num), 
extended in r13-2545-g9f2fca56593a2b for a struct to support more ICV 
variables on the devices [to handle OMP_..._DEV environment variables].


As the value is returned unconditionally, it makes sense to use it both 
for the expected-value diagnostic and for the condition further below.


Comments, suggestions, remarks?

Tobias

PS: Alternative would be to make the plugin's value depend on whether 
the data was loaded. But that would make the number-of-entries assert 
weaker and might cause corner-case issues when a slightly older libgomp 
plugin is used with the updated libgomp.so. Thus, I have settled for the 
attached variant.libgomp: Device load_image - improve minor num-funcs/vars check

The run time library loads the offload functions and variable and optionally
the ICV variable and returns the number of loaded items, which has to match
the host side. The plugin returns "+1" (since GCC 12) for the ICV variable
entry, independently whether it was loaded or not, but the var's value
(start == end == 0) can be used to detect when this failed.

Thus, we can tighten the assert check - which this commit does together with
making the output less surprising - and simplify the condition further below.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): If ICV variable
	is is not available, decrement other_count and thus the return value.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_load_image_to_device): Extend fatal-error message;
	simplify a condition.

 libgomp/target.c | 78 +---
 1 file changed, 35 insertions(+), 43 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index 1367e9cce6c..456a9147154 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2355,15 +2355,14 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 num_ind_funcs
   ? (uint64_t *) host_ind_func_table : NULL);
 
-  if (num_target_entries != num_funcs + num_vars
-  /* "+1" due to the additional ICV struct.  */
-  && num_target_entries != num_funcs + num_vars + 1)
+  /* The "+1" is due to the additional ICV struct.  */
+  if (num_target_entries != num_funcs + num_vars + 1)
 {
   gomp_mutex_unlock (>lock);
   if (is_register_lock)
 	gomp_mutex_unlock (_lock);
   gomp_fatal ("Cannot map target functions or variables"
-		  " (expected %u, have %u)", num_funcs + num_vars,
+		  " (expected %u + %u + 1, have %u)", num_funcs, num_vars,
 		  num_target_entries);
 }
 
@@ -2447,48 +2446,41 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   array++;
 }
 
-  /* Last entry is for a ICVs variable.
- Tolerate case where plugin does not return those entries.  */
-  if (num_funcs + num_vars < num_target_entries)
+  /* Last entry is for the ICV struct variable; if absent, start = end = 0.  */
+  struct addr_pair *icv_var = _table[num_funcs + num_vars];
+  if (icv_var->start != 0)
 {
-  struct addr_pair *var = _table[num_funcs + num_vars];
-
-  /* Start address will be non-zero for the ICVs variable if
-	 the variable was found in this image.  */
-  if (var->start != 0)
+  /* The index of the devicep within devices[] is regarded as its
+	 'device number', which is different from the per-device type
+	 devicep->target_id.  */
+  int dev_num = (int) (devicep - [0]);
+  struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num);
+  size_t var_size = icv_var->end - icv_var->start;
+  if (var_size != sizeof (struct gomp_offload_icvs))
 	{
-	  /* The index of the devicep within devices[] is regarded as its
-	 'device number', which is different from the per-device type
-	 devicep->target_id.  */
-	  int dev_num = (int) (devicep - [0]);
-	  struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num);
-	  size_t var_size = var->end - var->start;
-	  if (var_size != sizeof (struct gomp_offload_icvs))
-	{
-	  gomp_mutex_unlock (>lock);
-	  if (is_register_lock)
-		gomp_mutex_unlock (_lock);
-	  gomp_fatal ("offload plugin managed 'icv struct' not of expected "
-			  "format");
-	}
-	  /* Copy the ICVs variable to place on device memory, hereby
-	 actually designating its device number into effect.  */
-	  gomp_copy_host2dev (devicep, NULL, (void *) var->start, icvs,
-			  var_size, false, NULL);
-	  splay_tree_key k = >key;
-	  k->host_start = (uintptr_t) icvs;
-	  k->host_end =
-	k->host_start + (size_mask & sizeof (struct gomp_offload_icvs));
-	  k->tgt = tgt;
-	  k->tgt_offset = var->start;
-	  k->refcount = REFCOUNT_INFINITY;

[Patch] OpenMP/C++: Fix (first)private clause with member variables [PR110347] [was: [RFA/RFC] C++/OpenMP: Supporting (first)private for member variables [PR110347] - or VALUE_EXPR and gimplify]

2024-02-16 Thread Tobias Burnus

Hi,

your suggestion almost did the trick, but caused regressions with
lambda closures in target regions.

Jakub Jelinek wrote:

Ah, and the reason why it doesn't work on target is that it has the
everything is mapped assumption:
   if ((ctx->region_type & ORT_TARGET) != 0)
 {
   if (ctx->region_type & ORT_ACC)
 /* For OpenACC, as remarked above, defer expansion.  */
 shared = false;
   else
 shared = true;
  
   ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);


Perhaps shared = true; should be shared = (flags & GOVD_MAPPED) != 0;
now that we have private/firstprivate clauses on target?


Hence, I now use this code, but also pass a flag to distinguish target
regions (→ map) from shared usage, assuming that it is needed for the
latter (otherwise, there wouldn't be that code).

The issue only showed up for a compile-only testcase, which I have now
turned into a run-time testcase.
In order to do so, I had to fix a bogus test for is mapped (or at least
I think it is bogus) - and for sure it didn't handle shared memory.

I also modified it such that it iterates over devices. Changes to the 
dump: the 'device' clause had to be added (3x) and for the long line: 
'this' and 'iptr' swapped the order and 'map(from:mapped)' became 
'firstprivate(mapped)' due to my changes.
I appended a patch which only shows the test-case differences as "git 
diff" contains all lines as I move it to libgomp/.


Comments, remarks, suggestions?

TobiasOpenMP/C++: Fix (first)private clause with member variables [PR110347]

OpenMP permits '(first)private' for C++ member variables, which GCC handles
by tagging those by DECL_OMP_PRIVATIZED_MEMBER, adding a temporary VAR_DECL
and DECL_VALUE_EXPR pointing to the 'this->member_var' in the C++ front end.

The idea is that in omp-low.cc, the DECL_VALUE_EXPR is used before the
region (for 'firstprivate'; ignored for 'private') while in the region,
the DECL itself is used.

In gimplify, the value expansion is suppressed and deferred if the
  lang_hooks.decls.omp_disregard_value_expr (decl, shared)
returns true - which is never the case if 'shared' is true. In OpenMP 4.5,
only 'map' and 'use_device_ptr' was permitted for the 'target' directive.
And when OpenMP 5.0's 'private'/'firstprivate' clauses was added, the
update that 'shared' is only true for 'map' was missed.

However, just enabling it for all '!shared' will cause issues with
Lambda closures ("__closure->this->...") for which also a DECL_VALUE_EXPR
exists but that is not related to DECL_OMP_PRIVATIZED_MEMBER. Solution:
Update the lang hook to take a Boolean argument, indicating whether it
is called for a target region or not.

2024-02-16  Tobias Burnus  
	Jakub Jelinek  

	PR c++/110347

gcc/cp/ChangeLog:

	* cp-gimplify.cc (cxx_omp_disregard_value_expr): Add new
	Boolean argument and use it.
	* cp-tree.h (cxx_omp_disregard_value_expr): Update prototype.

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_omp_disregard_value_expr): Add
	unused Boolean argument.
	* trans.h (gfc_omp_disregard_value_expr): Update
	prototype.

gcc/ChangeLog:

	* gimplify.cc (omp_notice_variable): Update call to
	lang_hooks.decls.omp_disregard_value_expr.
	(omp_notice_variable): Likewise; fix 'shared' arg for
	(first)private in target regions.
	* hooks.cc (hook_bool_tree_bool_bool_false): New.
	* hooks.h (hook_bool_tree_bool_bool_false): New.
	* langhooks-def.h (LANG_HOOKS_OMP_DISREGARD_VALUE_EXPR):
	Use it.
	* langhooks.h (struct lang_hooks_for_decls): Add second
	Boolean argument.
	* omp-low.cc (omp_member_access_dummy_var): Update
	lang_hooks.decls.omp_disregard_value_expr call.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/target-lambda-3.C: Moved from
	gcc/testsuite/g++.dg/gomp/ and fixed is-mapped handling.
	* testsuite/libgomp.c++/firstprivate-c++-1.C: New test.
	* testsuite/libgomp.c++/firstprivate-c++-2.C: New test.
	* testsuite/libgomp.c++/private-c++-1.C: New test.
	* testsuite/libgomp.c++/private-c++-2.C: New test.
	* testsuite/libgomp.c++/use_device_ptr-c++-1.C: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/target-lambda-1.C: Moved to become a
	run-time test under testsuite/libgomp.c++.

Co-authored-by: Jakub Jelinek 

 gcc/cp/cp-gimplify.cc  |   7 +-
 gcc/cp/cp-tree.h   |   2 +-
 gcc/fortran/trans-openmp.cc|   2 +-
 gcc/fortran/trans.h|   2 +-
 gcc/gimplify.cc|  12 +-
 gcc/hooks.cc   |   6 +
 gcc/hooks.h|   1 +
 gcc/langhooks-def.h|   2 +-
 gcc/langhooks.h|   5 +-
 gcc/omp-low.cc |   2 +-
 gcc/testsuite/g++.dg/gomp/target-lambda-1.C|  94 ---
 libgomp/testsuite/libgomp.c++/firstpriva

[RFA/RFC] C++/OpenMP: Supporting (first)private for member variables [PR110347] - or VALUE_EXPR and gimplify

2024-02-16 Thread Tobias Burnus

The following works with PARALLEL but not with TARGET.

OpenMP states the following is supposed to work:

   A = 5;  // == this->A
   B = 6;  // == this->B
   C[44] = 7; // == this->C; assume 'int C[100]'

   #pragma  firstprivate(A,C) private(B)
   {
 A += 5;  // Now: A is 10.
 B = 7;
 C[44] += 7; // Now C is 14
 // It is unspecified what value this->{A,B,C} has
   }
   // {A,B,C[44]} == this->{A,B,C[44]} are still {5,6,7}

* * *

In the C++ FE, that's handled by creating a temporary variable:
  v = create_temporary_var (TREE_TYPE (m));
with
  SET_DECL_VALUE_EXPR (v, m);
  DECL_OMP_PRIVATIZED_MEMBER(v)
where 'm' is, e.g., 'this->A' - and a bunch of
  'if (DECL_OMP_PRIVATIZED_MEMBER(decl))'
in the g++ FE, only.

For PARALLEL, the VALUE_EXPR survives until omp-low.cc, which handles 
this for (first)privatizing.


But for TARGET, in gimplify.cc, after the following call in 
gimplify_omp_workshare


16813  gimple *g = gimplify_and_return_first (OMP_BODY (expr), );

the 'A' in the body will be turned into 'this->A'.

* * *

Thus, while there is after omplower the expected

  #pragma omp target ... firstprivate(A)

and also

   D.3081 = .omp_data_i->A; A= ...;

what actually gets used is

   D.3084 = .omp_data_i->D.3046;
   this = D.3084;
   D.2996 = this->A;

which unsurprisingly breaks.

* * *

This can be "fixed" by using the following patch.

With that patch, the -fdump-tree-omplower looks fine. But it does then 
fail with:


 during RTL pass: expand
 g2.cpp:11:7: internal compiler error: in make_decl_rtl, at varasm.cc:1443

for the 'A' with 'B = A' (where B is a non-member var) and 'A' is still 
as the value expr 'this->A'.


--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3285,12 +3285,15 @@ gimplify_var_or_parm_decl (tree *expr_p)
   if (gimplify_omp_ctxp && omp_notice_variable (gimplify_omp_ctxp, 
decl, true))

 return GS_ALL_DONE;

+ if (!flag_openmp) // Assume: C++'s DECL_OMP_PRIVATIZED_MEMBER (decl)
+ {
   /* If the decl is an alias for another expression, substitute it. */
   if (DECL_HAS_VALUE_EXPR_P (decl))
 {
   *expr_p = unshare_expr (DECL_VALUE_EXPR (decl));
   return GS_OK;
 }
+ }

   return GS_ALL_DONE;
 }


* * *

Any idea / suggestion how to handle this best?

One way I see would be to add a lang-hook here to check for 
DECL_OMP_PRIVATIZED_MEMBER, similar to the hack above. And

then ensure that the DECL_VALUE_EXPR points to the var decl
in the target region (i.e. some hacking in omp-low.cc).

I have no idea whether that would - nor whether that would be
the way forward. - Thoughts?

Tobias#if TEMPL
template 
#else
#define T int
#endif
#if PRIVATE
#define firstprivate private
#endif
struct t {
  T A;
void f()
{
  T B = 49;
  A = 7;
  #pragma omp parallel firstprivate(A) if(0) shared(B) default(none)
  {
if (A != 7) __builtin_printf("ERROR 1b: %d (!= 7) inside omp parallel\n", A);
A = 5;
B = A;
  }
  if (A != 7) __builtin_printf("ERROR 1: %d (!= 7) omp parallel\n", A);
  if (B != 5) __builtin_printf("ERROR 1a: %d\n", B);
  A = 8; B = 49;
  #pragma omp parallel firstprivate(A)if(0) shared(B) default(none)
  {
if (A != 8) __builtin_printf("ERROR 1b: %d (!= 8) inside omp parallel\n", A);
A = 6;
B = A;
  }
  if (A != 8) __builtin_printf("ERROR 2: %d (!= 8) omp parallel\n", A);
  if (B != 6) __builtin_printf("ERROR 2a: %d\n", B);
  A = 8; B = 49;
  #pragma omp target firstprivate(A) map(from:B) defaultmap(none)
  {
if (A != 7) __builtin_printf("ERROR 2b: %d (!= 7) inside omp target\n", A);
A = 7;
B = A;
  }
  if (A != 8) __builtin_printf("ERROR 3: %d (!= 8) omp target\n", A);
  if (B != 7) __builtin_printf("ERROR 3a: %d\n", B);
  A = 9; B = 49;
  #pragma omp target firstprivate(A) map(from:B) defaultmap(none)
  {
if (A != 7) __builtin_printf("ERROR 3b: %d (!= 7) inside omp target\n", A);
A = 8;
B = A;
  }
  if (A != 9) __builtin_printf("ERROR 4: %d (!= 9) omp target\n", A); else __builtin_printf("OK\n");
  if (B != 8) __builtin_printf("ERROR 4a: %d\n", B);
}
};

void bar() {
#if TEMPL
  struct t x;
#else
  struct t x;
#endif
  x.f();
}

int main()
{
  bar();
}


[RFA/RFC] C++/OpenMP: Supporting (first)private for member variables [PR110347] - or VALUE_EXPR and gimplify

2024-02-16 Thread Tobias Burnus

The following works with PARALLEL but not with TARGET.

OpenMP states the following is supposed to work:

  A = 5;  // == this->A
  B = 6;  // == this->B
  C[44] = 7; // == this->C; assume 'int C[100]'

  #pragma  firstprivate(A,C) private(B)
  {
A += 5;  // Now: A is 10.
B = 7;
C[44] += 7; // Now C is 14
// It is unspecified what value this->{A,B,C} has
  }
  // {A,B,C[44]} == this->{A,B,C[44]} are still {5,6,7}

* * *

In the C++ FE, that's handledby creating a temporary variable:  v = create_temporary_var 
(TREE_TYPE (m)); with  SET_DECL_VALUE_EXPR (v, m);DECL_OMP_PRIVATIZED_MEMBER(v)
where 'm' is, e.g., 'this->A' - and a bunch of 'if 
(DECL_OMP_PRIVATIZED_MEMBER(decl))'
in theg++ FE, only. For PARALLEL, the VALUE_EXPR survives until omp-low.cc, 
which handles this for (first)privatizing. But for TARGET, in 
gimplify.cc, after the following call in gimplify_omp_workshare 16813 
gimple *g = gimplify_and_return_first (OMP_BODY (expr), ); 
will turn the 'A' in the body into 'this->A'.

* * *
Thus, while there is after omplower the expected
#pragma omp target ... firstprivate(A) and also    D.3081 = 
.omp_data_i->A; A= ...; what actually gets used is    D.3084 
= .omp_data_i->D.3046;    this = D.3084; 
   D.2996 = this->A; which unsurprisingly breaks. * * * 
This can be "fixed" by using the following patch. With that patch, the 
-fdump-tree-omplower looks fine. But it does then fail with: during RTL 
pass: expand g2.cpp:11:7: internal compiler error: in make_decl_rtl, at 
varasm.cc:1443
for the 'A' with 'B = A' (where B is a non-member var) and 'A' is still 
as the value expr 'this->A'. --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc 
@@ -3285,12 +3285,15 @@ gimplify_var_or_parm_decl (tree *expr_p) if 
(gimplify_omp_ctxp && omp_notice_variable (gimplify_omp_ctxp, decl, 
true)) return GS_ALL_DONE; + if (!flag_openmp) // Assume: C++'s 
DECL_OMP_PRIVATIZED_MEMBER (decl) + { /* If the decl is an alias for 
another expression, substitute it now. */ if (DECL_HAS_VALUE_EXPR_P 
(decl)) { *expr_p = unshare_expr (DECL_VALUE_EXPR (decl)); return GS_OK; 
} + } return GS_ALL_DONE; } * * * Any idea / suggestion how to handle 
this best? One way I see would be to add a lang-hook here to check for 
DECL_OMP_PRIVATIZED_MEMBER, similar to the hack above. And

then ensure that the DECL_VALUE_EXPR points to the var decl
in the target region (i.e. some hacking in omp-low.cc).

I have no idea whether that would - nor whether that would be
the way forward. - Thoughts?
Tobias
#if TEMPL
template 
#else
#define T int
#endif
#if PRIVATE
#define firstprivate private
#endif
struct t {
  T A;
void f()
{
  T B = 49;
  A = 7;
  #pragma omp parallel firstprivate(A) if(0) shared(B) default(none)
  {
if (A != 7) __builtin_printf("ERROR 1b: %d (!= 7) inside omp parallel\n", A);
A = 5;
B = A;
  }
  if (A != 7) __builtin_printf("ERROR 1: %d (!= 7) omp parallel\n", A);
  if (B != 5) __builtin_printf("ERROR 1a: %d\n", B);
  A = 8; B = 49;
  #pragma omp parallel firstprivate(A)if(0) shared(B) default(none)
  {
if (A != 8) __builtin_printf("ERROR 1b: %d (!= 8) inside omp parallel\n", A);
A = 6;
B = A;
  }
  if (A != 8) __builtin_printf("ERROR 2: %d (!= 8) omp parallel\n", A);
  if (B != 6) __builtin_printf("ERROR 2a: %d\n", B);
  A = 8; B = 49;
  #pragma omp target firstprivate(A) map(from:B) defaultmap(none)
  {
if (A != 7) __builtin_printf("ERROR 2b: %d (!= 7) inside omp target\n", A);
A = 7;
B = A;
  }
  if (A != 8) __builtin_printf("ERROR 3: %d (!= 8) omp target\n", A);
  if (B != 7) __builtin_printf("ERROR 3a: %d\n", B);
  A = 9; B = 49;
  #pragma omp target firstprivate(A) map(from:B) defaultmap(none)
  {
if (A != 7) __builtin_printf("ERROR 3b: %d (!= 7) inside omp target\n", A);
A = 8;
B = A;
  }
  if (A != 9) __builtin_printf("ERROR 4: %d (!= 9) omp target\n", A); else __builtin_printf("OK\n");
  if (B != 8) __builtin_printf("ERROR 4a: %d\n", B);
}
};

void bar() {
#if TEMPL
  struct t x;
#else
  struct t x;
#endif
  x.f();
}

int main()
{
  bar();
}


Re: [Patch] OpenMP: Reject non-const 'condition' trait in Fortran

2024-02-13 Thread Tobias Burnus

Hi Jakub,

Jakub Jelinek wrote:

Of course it makes me wonder to what extent we actually do support the
OpenMP 5.1 target_device device_num trait with constant or non-constant
device num:


Answer: If one removes some early errors such that the compiler
continues a bit further, one gets:

   36 | !$omp  declare variant(variant4) 
match(target_device={device_num(0)})   ! OK
  |   1
sorry, unimplemented: 'target_device' selector set is not supported yet


Hence: Not yet supported, but the Sandra added the basic parsing
support for it to GCC 14 when she improved the handling.

However, the review-pending metadirective patch set
  https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642005.html
has
  [PATCH 3/8] libgomp: runtime support for target_device selector

Thus, once we GCC 15 development work has started, we can look into
this feature and some other patches ...


Other pending patches:https://gcc.gnu.org/wiki/openmpPendingPatches

Tobias


[Patch] OpenMP: Reject non-const 'condition' trait in Fortran (was: [Patch] OpenMP: Handle DECL_ASSEMBLER_NAME with 'declare variant')

2024-02-13 Thread Tobias Burnus

Jakub Jelinek wrote:

Isn't all this caused just by the missing check that condition trait has a
constant expression?

IMHO that is the way to handle it in GCC 14.


Concur – how about the following patch?

Tobias

PS: See PR113904 for follow up tasks. / Instead of '.AND.' etc. I could 
have also used some more '==', '<' etc. expressions in the modified 
examples (as should have Sandra in the initial version), but, 
fortunately, there is at least one '=='.
OpenMP: Reject non-const 'condition' trait in Fortran

OpenMP 5.0 only permits constant expressions for the 'condition' trait
in context selectors; this is relaxed in 5.2 but not implemented. In order
to avoid wrong code, it is now rejected.

Additionally, in Fortran, 'condition' should not accept an integer
expression, which is now ensured. Additionally, as 'device_num' should be
a conforming device number, there is now a check on the value.

	PR middle-end/113904

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_context_selector): Handle splitting of
	OMP_TRAIT_PROPERTY_EXPR into OMP_TRAIT_PROPERTY_{DEV_NUM,BOOL}_EXPR.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_context_selector): Handle splitting of
	OMP_TRAIT_PROPERTY_EXPR into OMP_TRAIT_PROPERTY_{DEV_NUM,BOOL}_EXPR.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_context_selector):
	* trans-openmp.cc (gfc_trans_omp_declare_variant): Handle splitting of
	OMP_TRAIT_PROPERTY_EXPR into OMP_TRAIT_PROPERTY_{DEV_NUM,BOOL}_EXPR.


gcc/ChangeLog:

	* omp-general.cc (struct omp_ts_info): Update for splitting of
	OMP_TRAIT_PROPERTY_EXPR into OMP_TRAIT_PROPERTYDEV_NUM,BOOL}_EXPR.
	* omp-selectors.h (enum omp_tp_type): Replace
	OMP_TRAIT_PROPERTY_EXPR by OMP_TRAIT_PROPERTY_{DEV_NUM,BOOL}_EXPR.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/declare-variant-1.f90: Change 'condition' trait's
	argument from integer to a logical expression.
	* gfortran.dg/gomp/declare-variant-11.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-12.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-13.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-2.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-2a.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-3.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-4.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-6.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-8.f90: Likewise.
	* gfortran.dg/gomp/declare-variant-20.f90: New test.

 gcc/c/c-parser.cc  |  3 +-
 gcc/cp/parser.cc   |  3 +-
 gcc/fortran/openmp.cc  | 30 ++---
 gcc/fortran/trans-openmp.cc|  3 +-
 gcc/omp-general.cc |  4 +-
 gcc/omp-selectors.h|  3 +-
 .../gfortran.dg/gomp/declare-variant-1.f90 |  4 +-
 .../gfortran.dg/gomp/declare-variant-11.f90|  4 +-
 .../gfortran.dg/gomp/declare-variant-12.f90| 12 ++---
 .../gfortran.dg/gomp/declare-variant-13.f90|  2 +-
 .../gfortran.dg/gomp/declare-variant-2.f90 |  8 ++--
 .../gfortran.dg/gomp/declare-variant-20.f90| 51 ++
 .../gfortran.dg/gomp/declare-variant-2a.f90|  4 +-
 .../gfortran.dg/gomp/declare-variant-3.f90 |  8 ++--
 .../gfortran.dg/gomp/declare-variant-4.f90 |  8 ++--
 .../gfortran.dg/gomp/declare-variant-6.f90 | 14 +++---
 .../gfortran.dg/gomp/declare-variant-8.f90 |  2 +-
 17 files changed, 119 insertions(+), 44 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2f..3be91d666a5 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -24656,7 +24656,8 @@ c_parser_omp_context_selector (c_parser *parser, enum omp_tss_code set,
 		}
 	  while (1);
 	  break;
-	case OMP_TRAIT_PROPERTY_EXPR:
+	case OMP_TRAIT_PROPERTY_DEV_NUM_EXPR:
+	case OMP_TRAIT_PROPERTY_BOOL_EXPR:
 	  t = c_parser_expr_no_commas (parser, NULL).value;
 	  if (t != error_mark_node)
 		{
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f0c8f9c4005..68ab74d70b9 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -47984,7 +47984,8 @@ cp_parser_omp_context_selector (cp_parser *parser, enum omp_tss_code set,
 		}
 	  while (1);
 	  break;
-	case OMP_TRAIT_PROPERTY_EXPR:
+	case OMP_TRAIT_PROPERTY_DEV_NUM_EXPR:
+	case OMP_TRAIT_PROPERTY_BOOL_EXPR:
 	  /* FIXME: this is bogus, the expression need
 		 not be constant.  */
 	  t = cp_parser_constant_expression (parser);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0af80d54fad..d8cce6922b0 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -5790,19 +5790,39 @@ gfc_match_omp_context_selector (gfc_omp_set_selector *oss)
 		}
 	  while (1);
 	  break;
-	case OMP_TRAIT_PROPERTY_EXPR:
+	case OMP_TRAIT_PROPERTY_DEV_NUM_EXPR:
+	case OMP_TRAIT_PROPERTY_BOOL_EXPR:
 	  if (gfc_match_expr (>expr) != MATCH_YES)
 		{
 		  

[Patch] OpenMP: Handle DECL_ASSEMBLER_NAME with 'declare variant'

2024-02-13 Thread Tobias Burnus
Inomp_resolve_declare_variant, a code path generates a new decl for the 
base function – in doing so, it ignores the assembler name. As the 
included Fortran example shows, this will lead to a linker error. 
Solution: Also copy over the assembler name. Comments, suggestions, 
remarks before I commit it? Tobias PS: As a fallout of some testing, 
motivated by the original testcase, I have filled a couple of 
declare-variant and context-selector PRs: 113904 (dyn. 
user={condition(...)}), 113905 (multiple users of variant funcs), 113906 
(construct={...} lacks constructs).
OpenMP: Handle DECL_ASSEMBLER_NAME with 'declare variant'

gcc/ChangeLog:

	* omp-general.cc (omp_resolve_declare_variant): When building the decl
	for the base variant, honor also the assembler name.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/declare-variant-20.f90: New test.

 gcc/omp-general.cc |  2 +
 .../gfortran.dg/gomp/declare-variant-20.f90| 62 ++
 2 files changed, 64 insertions(+)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 2e31a3f9290..bc92a170e96 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -2630,6 +2630,8 @@ omp_resolve_declare_variant (tree base)
   (*slot)->variants = entry.variants;
   tree alt = build_decl (DECL_SOURCE_LOCATION (base), FUNCTION_DECL,
 			 DECL_NAME (base), TREE_TYPE (base));
+  if (DECL_ASSEMBLER_NAME_SET_P (base))
+	SET_DECL_ASSEMBLER_NAME (alt, DECL_ASSEMBLER_NAME (base));
   DECL_ARTIFICIAL (alt) = 1;
   DECL_IGNORED_P (alt) = 1;
   TREE_STATIC (alt) = 1;
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-20.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-20.f90
new file mode 100644
index 000..c7050a22365
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-20.f90
@@ -0,0 +1,62 @@
+! { dg-additional-options "-fdump-tree-gimple-asmname" }
+
+! This tests that mangled names, i.e. DECL_NAME != DECL_ASSEMBLER_NAME
+! are properly handled
+
+! This test case failed before with:
+!   undefined reference to `foo'
+! as the actual symbol is __m_MOD_foo
+
+! NOTE 1: This test relies  on late resolution of condition,
+! which is here enforced via the always_false_flag variable.
+!
+! NOTE 2: Using a variable is an OpenMP 5.1 feature that is/was not supported
+! when this test case was created, cf. PR middle-end/113904
+
+module m
+  implicit none (type, external)
+  logical :: always_false_flag = .false.
+contains
+  integer function variant1() result(res)
+res = 1
+  end function
+
+  integer function variant2() result(res)
+res = 2
+  end function
+
+  integer function variant3() result(res)
+res = 3
+  end function
+
+  integer function foo() result(res)
+!$omp  declare variant(variant1) match(construct={teams})
+!$omp  declare variant(variant2) match(construct={parallel})
+!$omp  declare variant(variant3) match(user={condition(always_false_flag)},construct={target})
+res = 99
+  end
+end module m
+
+program main
+  use m
+  implicit none (type, external)
+  integer :: r1, r2, r3
+
+  r1 = foo()
+  if (r1 /= 99) stop 1
+
+  !$omp parallel if (.false.)
+r2 = foo()
+if (r2 /= 2) stop 2
+  !$omp end parallel
+
+  !$omp teams num_teams(1)
+r3 = foo()
+if (r3 /= 1) stop 3
+  !$omp end teams
+
+end program 
+
+! { dg-final { scan-tree-dump-times "r1 = __m_MOD_foo \\(\\);" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "r2 = __m_MOD_variant2 \\(\\);" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "r3 = __m_MOD_variant1 \\(\\);" 1 "gimple" } }


Re: [PATCH v2] openmp, fortran: Add Fortran support for indirect clause on the declare target directive

2024-02-12 Thread Tobias Burnus

Hi Kwok,

Kwok Cheung Yeung wrote:
Oops. I thought exactly the same thing yesterday, but forgot to add 
the changes to my commit! Here is the updated version.


I regard(ed) this change as obvious - hence, I missed to reply.
But for completeness: LGTM.

I think it would be useful to commit this now with an xfail
for the one failing testcase that depends on the review-pending libgomp
patch.

I mean something like:

--- a/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-2.f90
@@ -1,2 +1,3 @@
 ! { dg-do run }
+! { dg-xfail-run-if "Requires libgomp bug fix pending review" { offload_device 
} }

Thanks,

Tobias


On 06/02/2024 9:03 am, Tobias Burnus wrote:
LGTM. I just wonder whether there should be a value test and not just 
a does-not-crash-when-called test for the latter testcase, i.e.




+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
@@ -0,0 +1,25 @@
+! { dg-do run }
+
+! Check that indirect calls work on procedures passed in via a 
dummy argument

+
+module m
+contains
+  subroutine bar
+    !$omp declare target enter(bar) indirect

e.g. "integer function bar()" ... " bar = 42"

+  end subroutine
+
+  subroutine foo(f)
+    procedure(bar) :: f
+
+    !$omp target
+  call f

And then: if (f() /= 42) stop 1

+    !$omp end target
+  end subroutine
+end module


Thanks,

Tobias



Re: [PATCH v2] openmp, fortran: Add Fortran support for indirect clause on the declare target directive

2024-02-06 Thread Tobias Burnus

Kwok Cheung Yeung wrote:
As previously discussed, this version of the patch adds code to emit a 
warning when a directive like this:


!$omp declare target indirect(.true.)

is encountered (i.e. a target directive containing at least one 
clause, but no to/enter clause, which appears to violate the OpenMP 
standard). A test is also added to 
gfortran.dg/gomp/declare-target-indirect-1.f90 to test for this.


Thanks. And indeed, the 5.1 spec requires under "Restrictions to the 
declare target directive are as follows:" "If the directive has a 
clause, it must contain at least one 'to' clause or at least one 'link' 
clause.". [5.2 replaced 'to' by its alias 'enter' and the 6.0 preview 
added 'local' to the list.]



I have also added a declare-target-indirect-3.f90 test to libgomp to 
check that procedures passed via a dummy argument work properly when 
used in an indirect call.


Okay for mainline?


LGTM. I just wonder whether there should be a value test and not just a 
does-not-crash-when-called test for the latter testcase, i.e.




+++ b/libgomp/testsuite/libgomp.fortran/declare-target-indirect-3.f90
@@ -0,0 +1,25 @@
+! { dg-do run }
+
+! Check that indirect calls work on procedures passed in via a dummy argument
+
+module m
+contains
+  subroutine bar
+!$omp declare target enter(bar) indirect

e.g. "integer function bar()" ... " bar = 42"

+  end subroutine
+
+  subroutine foo(f)
+procedure(bar) :: f
+
+!$omp target
+  call f

And then: if (f() /= 42) stop 1

+!$omp end target
+  end subroutine
+end module


Thanks,

Tobias


Re: [v2][patch] plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

2024-01-29 Thread Tobias Burnus

Hi Thomas,

Thomas Schwinge wrote:

On 2024-01-23T10:55:16+0100, Tobias Burnus  wrote:

plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

The following issue was found when running libgomp.c/target-52.c with
nvptx offloading when the dg-set-target-env-var was honored.

Curious, I've never seen this failure mode in my several different
configurations.  :-|


I think we recently fixed a surprisingly high number of issues that we 
didn't see before but were clearly preexisting for quite a while. 
(Mostly for AMDGPU but still.)


But I concur that this one is a more tricky one.


For some unknown reasons, while this does not have an effect on the
order of the called plugin functions for initialization, it changes the
order of function calls for shutting down. Namely, when the two environment
variables are set, GOMP_offload_unregister_ver is called now before
gomp_target_fini.

Re "unknown reasons", isn't that indeed explained by the different
'atexit' function/'__attribute__((destructor))' sequencing, due to
different order of 'atexit'/'__attribute__((constructor))' calls?


Maybe or not. First, it does not seem to occur elsewhere but maybe 
that's because remote setting of environment variables does not work 
with DejaGNU and most code was run such a way. And secondly, I have no 
idea how 'atexit' and destructors are implemented internally.



And it seems as if CUDA regards a call to cuModuleUnload
(or unloading the last module?) as indication that the device context should
be destroyed - or, at least, afterwards calling cuCtxGetDevice will return
CUDA_ERROR_DEINITIALIZED.

However, this I don't understand -- but would like to.  Are you saying
that for:

 --- libgomp/plugin/plugin-nvptx.c
 +++ libgomp/plugin/plugin-nvptx.c
 @@ -1556,8 +1556,16 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned 
version, const void *target_data)
  if (image->target_data == target_data)
{
*prev_p = image->next;
 -  if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
 +  CUresult r;
 +  r = CUDA_CALL_NOCHECK (cuModuleUnload, image->module);
 +  GOMP_PLUGIN_debug (0, "%s: cuModuleUnload: %s\n", __FUNCTION__, 
cuda_error (r));
 +  if (r != CUDA_SUCCESS)
  ret = false;
 +  CUdevice dev_;
 +  r = CUDA_CALL_NOCHECK (cuCtxGetDevice, _);
 +  GOMP_PLUGIN_debug (0, "%s: cuCtxGetDevice: %s\n", __FUNCTION__, 
cuda_error (r));
 +  GOMP_PLUGIN_debug (0, "%s: dev_=%d, dev->dev=%d\n", __FUNCTION__, dev_, 
dev->dev);
 +  assert (dev_ == dev->dev);
free (image->fns);
free (image);
break;

..., you're seeing an error for 'libgomp.c/target-52.c' with
'env OMP_TARGET_OFFLOAD=mandatory OMP_DISPLAY_ENV=true'?  I get:

 GOMP_OFFLOAD_unload_image: cuModuleUnload: no error
 GOMP_OFFLOAD_unload_image: cuCtxGetDevice: no error
 GOMP_OFFLOAD_unload_image: dev_=0, dev->dev=0

Or, is something else happening in between the 'cuModuleUnload' and your
reportedly failing 'cuCtxGetDevice'?


I cluttered the plugin with "printf" debugging; hence, no other code
is calling *into* the run-time library as far as I can see.

But now I will try it with a vanilla code and your patch applied.

Result for the target-52.c with the env vars set:

DEBUG: GOMP_offload_unregister_ver dev=0; state=1
DEBUG: gomp_unload_image_from_device
DEBUG GOMP_OFFLOAD_unload_image, 0, 196609
GOMP_OFFLOAD_unload_image: cuModuleUnload: no error
GOMP_OFFLOAD_unload_image: cuCtxGetDevice: no error
GOMP_OFFLOAD_unload_image: dev_=0, dev->dev=0
DEBUG: gomp_target_fini; dev=0, state=1
DEBUG  0
DEBUG: nvptx_attach_host_thread_to_device - 0
DEBUG: ERROR nvptx_attach_host_thread_to_device - 0

libgomp: cuCtxGetDevice error: unknown cuda error

Hence: The immediately calling cuCtxGetDevice after
the device unloading does not fail.

But calling it soon late via gomp_target_fini
→ GOMP_OFFLOAD_fini_device → nvptx_attach_host_thread_to_device
does fail.

I have attached my printf patch for reference.

* * *


Re your PR113513 details, I don't see how your failure mode could be
related to (a) the PTX code ('--with-arch=sm_80'), or the GPU hardware
("NVIDIA RTX A1000 6GB") (..., unless the Nvidia Driver is doing "funny"
things, of course...), so could this possibly be due to a recent change
in the CUDA Driver/Nvidia Driver?  You say "CUDA Version: 12.3", but
which which Nvidia Driver version?  The latest I've now tested are:

 Driver Version: 525.147.05   CUDA Version: 12.0
 Driver Version: 535.154.05   CUDA Version: 12.2


My laptop has:

NVIDIA-SMI 545.29.06  Driver Version: 545.29.06    CUDA Version: 
12.3


I'd like to please defer that one until we understand the actual origin
of the misbehavior.
(I think that patch makes still sense, but first finding out what goes 
wrong is fine nonetheless

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Tobias Burnus

Andrew Stubbs wrote:

/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
   .amdhsa_next_free_vgpr    516 
^~~ [Obviously, likewise 
forlibgomp.c++/..
Hmm, supposedly there are 768 registers allocated in groups of 12, on 
gfx1100 (8 on other devices), which number you have to double on 
wavefrontsize64 because that field actually counts the number of 
32-lane registers. The ISA can only actually reference 256 registers, 
so the limit here should be 512. (The remaining registers are intended 
for other wavefronts to use.)


But 256 is not divisible by 12, and it looks like we've rounded up. I 
guess we need to set the limit at 252 (504), for gfx1100.


BTW: The LLVM source code has,
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp#L1066

unsigned getTotalNumVGPRs(const MCSubtargetInfo *STI) {
  if (STI->getFeatureBits().test(FeatureGFX90AInsts))
return 512;
  if (!isGFX10Plus(*STI))
return 256;
  bool IsWave32 = STI->getFeatureBits().test(FeatureWavefrontSize32);
  if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
return IsWave32 ? 1536 : 768;
  return IsWave32 ? 1024 : 512;
}


Tobias



[patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Tobias Burnus

Andrew wrote off list:
  "Vector reductions don't work on RDNA, as is, but they're
   supposed to be disabled by the insn condition"

This patch disables "fold_left_plus_", which is about
vectorization and in the code path shown in the backtrace.
I can also confirm manually that it fixes the ICE I saw and
also the ICE for the testfile that Richard's PR shows at the
end of his backtrace.  (-O3 is needed to trigger the ICE.)

OK for mainline?

Tobias

* * *

PS: We could add testcase(s) that is/are explicitly compiled with
gfx1100 and/or gfx1030 + '-O3' to ensure that this gets tested
with AMDGPU enabled, but I am not sure whether it is really worthwhile.


PPS: Running the testsuite, I see the following fails with
gfx1100 offloading:

FAIL: libgomp.c/../libgomp.c-c++-common/for-5.c (test for excess errors)
Excess errors:
/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
  .amdhsa_next_free_vgpr516 
   ^~~ [Obviously, likewise forlibgomp.c++/../libgomp.c-c++-common/for-5.c]
FAIL:libgomp.c/pr104783-2.c execution test FAIL:libgomp.c/pr104783.c 
execution test (The .log unfortunately does not show more details) 
FAIL:libgomp.fortran/optional-map.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for 
excess errors) FAIL:libgomp.fortran/optional-map.f90   -O3 -g  (test for 
excess errors) FAIL: libgomp.fortran/target1.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer 
-finline-functions  (test for excess errors) FAIL: 
libgomp.fortran/target1.f90   -O3 -g  (test for excess errors)Same 'out 
of range' as above. * * * Manual testing shows for the two execution 
fails: Memory access fault by GPU node-1 (Agent handle: 0x8d1aa0) on 
address (nil). Reason: Page not present or supervisor privilege. 
Interestingly, it only fails with -O1 or higher, for -O0 it works. Tobias
gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

gcc/ChangeLog:

	PR target/113615
	* config/gcn/gcn-valu.md (fold_left_plus_): Only
	define for !TARGET_RDNA2_PLUS.

Signed-off-by: Tobias Burnus 

 gcc/config/gcn/gcn-valu.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index cd027f8b369..23b441f8e8b 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -4274,7 +4274,8 @@ (define_expand "fold_left_plus_"
  [(match_operand: 0 "register_operand")
   (match_operand: 1 "gcn_alu_operand")
   (match_operand:V_FP 2 "gcn_alu_operand")]
-  "can_create_pseudo_p ()
+  "!TARGET_RDNA2_PLUS
+   && can_create_pseudo_p ()
&& (flag_openacc || flag_openmp
|| flag_associative_math)"
   {


[committed] libgomp.c/declare-variant-4.h: Fix used variant function for gfx1030/gfx1100

2024-01-29 Thread Tobias Burnus

This fixes an obvious and stupid copy'n'paste bug of mine in
the OpenMP declare variant used for two testcases, fixing:
FAIL: libgomp.c/declare-variant-4-gfx1030.c 
scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx1030 \\(\\);" FAIL: 
libgomp.c/declare-variant-4-gfx1100.c 
scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx1100 \\(\\);" 
Committed as obvious as r14-8488-gcb366731e767e2

Tobias
commit cb366731e767e2dec158c8c4a495fe2ccbd550ff
Author: Tobias Burnus 
Date:   Mon Jan 29 11:06:15 2024 +0100

libgomp.c/declare-variant-4.h: Fix used variant function for gfx1030/gfx1100

libgomp/ChangeLog:

* testsuite/libgomp.c/declare-variant-4.h: Use gfx1100/gfx1030
function not gfx90a for gfx1100/gfx1030 context selector.

Signed-off-by: Tobias Burnus 

diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4.h b/libgomp/testsuite/libgomp.c/declare-variant-4.h
index 393a5e295cc..d2e9194bf5b 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-4.h
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4.h
@@ -58,8 +58,8 @@ gfx1100 (void)
 #pragma omp declare variant(gfx906) match(device = {isa("gfx906")})
 #pragma omp declare variant(gfx908) match(device = {isa("gfx908")})
 #pragma omp declare variant(gfx90a) match(device = {isa("gfx90a")})
-#pragma omp declare variant(gfx90a) match(device = {isa("gfx1030")})
-#pragma omp declare variant(gfx90a) match(device = {isa("gfx1100")})
+#pragma omp declare variant(gfx1030) match(device = {isa("gfx1030")})
+#pragma omp declare variant(gfx1100) match(device = {isa("gfx1100")})
 __attribute__ ((noipa))
 int
 f (void)


[wwwdocs][patch] gcc-14/changes.html (amdgcn): Update for gfx1030/gfx1100

2024-01-26 Thread Tobias Burnus

Mention that gfx1030/gfx1100 are now supported.

As noted in another thread, LLVM 15's assembler is now required, before 
LLVM 13.0.1 would do. (Alternatively, disabling gfx1100 support would 
do.) Hence, the added link to the install documentation.


Comments, suggestions?

Tobias
gcc-14/changes.html (amdgcn): Update for gfx1030/gfx1100

Signed-off-by: Tobias Burnus 

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index a04b62ff..2d777f52 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -329,6 +329,11 @@ a work-in-progress.
 AMD Radeon (GCN)
 
 
+  Initial support for the AMD Radeon gfx1030 (RDNA2) and
+gfx1100 (RDNA3) devices has been added, which required an
+update of the default
+https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa;>build
+requirements for the build.
   Improved register usage and performance on CDNA Instinct MI100
 and MI200 series devices.
   The default device architecture is now gfx900 (Vega).


[patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs)

2024-01-26 Thread Tobias Burnus

Hi,

Thomas Schwinge wrote:
amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to 
the docs

...
Further down in that file, we state:
 @anchor{amdgcn-x-amdhsa}
 @heading amdgcn-*-amdhsa
 AMD GCN GPU target.
 
 Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...]


LLVM 13.0.1 may still be fine for gfx1030
('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
tested), but it's not sufficient for gfx1100 anymore:


Testing with the system compilers here, llvm-mc-14.0.6 also fails while 
llvm-mc-15.0.7 accepts it.



Which version of LLVM should we be recommending?


>= LLVM 15, I think. How about the following wording? It still mentions 
LLVM 13.0.1 for those that really need it but with for the default 
setup, it requires 15+.


Tobias
install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled

gcc/ChangeLog:

	* doc/install.texi (amdgcn): Recommend LLVM 15+ and newlib 4.4+,
	but keep requiring only newlib 4.3+ and, if gfx1100 is disabled,
	LLVM 13.0.1+.

Signed-off-by: Tobias Burnus 

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 5747b5a12fe..c7794439107 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3927,14 +3927,15 @@ This is a synonym for @samp{x86_64-*-solaris2*}.
 @heading amdgcn-*-amdhsa
 AMD GCN GPU target.
 
-Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, and copy
+Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 @file{bin/llvm-mc} to @file{amdgcn-amdhsa/bin/as},
 @file{bin/lld} to @file{amdgcn-amdhsa/bin/ld},
 @file{bin/llvm-nm} to @file{amdgcn-amdhsa/bin/nm}, and
 @file{bin/llvm-ar} to both @file{bin/amdgcn-amdhsa-ar} and
-@file{bin/amdgcn-amdhsa-ranlib}.
+@file{bin/amdgcn-amdhsa-ranlib}.  Note that LLVM 13.0.1 or LLVM 14 can be used
+by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}.
 
-Use Newlib (4.3.0 or newer).
+Use Newlib (4.3.0 or newer; 4.4.0 or later is recommended).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use


Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

2024-01-26 Thread Tobias Burnus

Hi Richard,

Richard Biener wrote:

Looks good to me.
Thanks - I will commit it after lunch to see whether someone else has 
additional comments.

+@item gfx1030
+Compile for RDNA2 gfx1030 devices (GFX10 series).
+
+@item gfx1100
+Compile for RDNA3 gfx1100 devices (GFX11 series).

Btw, "GFX10" series isn't precise as it's only the high-end parts
that are covered by gfx1030, there's gfx103[0-6] where hopefully
at least gfx1031, gfx1032 and gfx1034 (the dGPU variants) are
trivial to support as well(?).

Using gfx103x might be better
that way, OTOH if APU vs dGPU will make a compilation target
difference then gfx103d vs gfx103a maybe?  "GFX10" series might


On the LLVM side (and also for the llvm-mc assembler), they distinguish 
all of the gfx* for the -mcpu= argument. See 
https://llvm.org/docs/AMDGPUUsage.html#id26 for that list.


Thus, I think it makes sense to do the same here.  The last column on 
that page lists the supported hardware but is it neither really up to 
date nor complete.


Thus, I found it easier to just mention gfx1100 as that's unique. On the 
ROCm side, AMD has:


https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html

Tobias


  1   2   3   4   5   6   7   8   9   10   >