On Sat, Dec 10, 2022 at 09:18:26AM +0100, Tobias Burnus wrote:
> libgomp.texi: Reverse-offload updates
>
> libgomp/
> * libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
> (GCN): Add item about 'omp requires'.
> (nvptx): Likewise; add item about reverse offload.
>
> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi
> @@ -192,8 +192,8 @@ The OpenMP 4.5 specification is fully supported.
> env variable @tab Y @tab
> @item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
> @item @code{requires} directive @tab P
> - @tab complete but no non-host devices provides @code{unified_address},
> - @code{unified_shared_memory} or @code{reverse_offload}
> + @tab complete but no non-host devices provides @code{unified_address} or
> + @code{unified_shared_memory}
> @item @code{teams} construct outside an enclosing target region @tab Y @tab
> @item Non-rectangular loop nests @tab Y @tab
> @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
> @@ -228,7 +228,7 @@ The OpenMP 4.5 specification is fully supported.
> @item @code{allocate} clause @tab P @tab Initial support
> @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
> @item @code{ancestor} modifier on @code{device} clause
> - @tab Y @tab See comment for @code{requires}
> + @tab Y @tab Host fallback with GCN devices
> @item Implicit declare target directive @tab Y @tab
> @item Discontiguous array section with @code{target update} construct
> @tab N @tab
> @@ -288,7 +288,7 @@ The OpenMP 4.5 specification is fully supported.
> @code{append_args} @tab N @tab
> @item @code{dispatch} construct @tab N @tab
> @item device-specific ICV settings with environment variables @tab Y @tab
> -@item @code{assume} directive @tab Y @tab
> +@item @code{assume} and @code{assumes} directives @tab Y @tab
> @item @code{nothing} directive @tab Y @tab
> @item @code{error} directive @tab Y @tab
> @item @code{masked} construct @tab Y @tab
> @@ -4456,6 +4456,9 @@ The implementation remark:
> @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
> using the C library @code{printf} functions and the Fortran
> @code{print}/@code{write} statements.
> +@item OpenMP code that has a requires directive with @code{unified_address},
> + @code{unified_shared_memory} or @code{reverse_offload} will remove
> + any GCN device from the list of available devices (``host fallback'').
> @end itemize
>
>
> @@ -4507,6 +4510,15 @@ The implementation remark:
> @item Compilation OpenMP code that contains @code{requires reverse_offload}
> requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
> is not supported.
> +@item For code containing reverse offload (i.e. @code{target} regions with
> + @code{device(ancestor:1)}), there is a slight performance penality
> + for @emph{all} target regions, consisting mostly of shutdown delay
> + Per device, reverse offload regions are processed serial such that

s/serial/serially/ ?

> + the next reverse offload region is only executed after the previous
> + one returns.
> +@item OpenMP code that has a requires directive with @code{unified_address}
> + or @code{unified_shared_memory} will remove any nvptx device from the
> + list of available devices (``host fallback'').
> @end itemize

Otherwise LGTM

	Jakub