Hi!

On 01/11/2016 11:35 AM, Jakub Jelinek wrote:
On Tue, Jan 05, 2016 at 09:47:59AM -0600, James Norris wrote:
I've updated the original patch after some very helpful
comments from Sandra (thank you, thank you).

I'd prefer if OpenMP
* Enabling OpenMP::            How to enable OpenMP for your applications.
* Runtime Library Routines::   The OpenMP runtime application programming
                                interface.
* Environment Variables::      Influencing runtime behavior with environment
                                variables.
chapters precede the OpenACC chapters, most libgomp users are not really
using any offloading, which is new, but using OpenMP for host
parallelization, and only far fewer users are actually trying some
acceleration, whether OpenACC or OpenMP offloading parts.

OpenACC content has been moved after the OpenMP content.


As Bernd found, there are some UTF-8 quotes or what in the patch, those
need to be replaced by some texinfo markup, say

+sections 4.1 and 4.2 of the ???The OpenACC
+Application Programming Interface???, Version 2.0, June, 2013.}.

@uref{http://www.openacc.org/, OpenACC Application Programming Interface, 
Version 2.0, June, 2013}
or something similar.

Those were double quotes and have been changed to @uref's.

Patch commited to trunk

Thanks for taking time for the review.

Jim


Index: ChangeLog
===================================================================
--- ChangeLog	(revision 232278)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2016-01-12  James Norris  <jnor...@codesourcery.com>
+
+	* libgomp.texi: Updates for OpenACC.
+
 2016-01-11  Alexander Monakov  <amona...@ispras.ru>
 
 	* plugin/plugin-nvptx.c (link_ptx): Do not set CU_JIT_TARGET.
Index: libgomp.texi
===================================================================
--- libgomp.texi	(revision 232278)
+++ libgomp.texi	(working copy)
@@ -99,6 +99,16 @@
                                interface.
 * Environment Variables::      Influencing runtime behavior with environment 
                                variables.
+* Enabling OpenACC::           How to enable OpenACC for your
+                               applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+                               programming interface.
+* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
+                               environment variables.
+* CUDA Streams Usage::         Notes on the implementation of
+                               asynchronous operations.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+                               NVIDIA CUBLAS library.
 * The libgomp ABI::            Notes on the external ABI presented by libgomp.
 * Reporting Bugs::             How to report bugs in the GNU Offloading and
                                Multi Processing Runtime Library.
@@ -1790,6 +1800,1272 @@
 
 
 @c ---------------------------------------------------------------------
+@c Enabling OpenACC
+@c ---------------------------------------------------------------------
+
+@node Enabling OpenACC
+@chapter Enabling OpenACC
+
+To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
+flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
+@code{#pragma acc} in C/C++ and @code{!$accp} directives in free form,
+@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
+@code{!$} conditional compilation sentinels in free form and @code{c$},
+@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
+arranges for automatic linking of the OpenACC runtime library 
+(@ref{OpenACC Runtime Library Routines}).
+
+A complete description of all OpenACC directives accepted may be found in 
+the @uref{http://www.openacc.org/, OpenACC} Application Programming
+Interface manual, version 2.0.
+
+Note that this is an experimental feature and subject to
+change in future versions of GCC.  See
+@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Runtime Library Routines
+@c ---------------------------------------------------------------------
+
+@node OpenACC Runtime Library Routines
+@chapter OpenACC Runtime Library Routines
+
+The runtime routines described here are defined by section 3 of the OpenACC
+specifications in version 2.0.
+They have C linkage, and do not throw exceptions.
+Generally, they are available only for the host, with the exception of
+@code{acc_on_device}, which is available for both the host and the
+acceleration device.
+
+@menu
+* acc_get_num_devices::         Get number of devices for the given device
+                                type.
+* acc_set_device_type::         Set type of device accelerator to use.
+* acc_get_device_type::         Get type of device accelerator to be used.
+* acc_set_device_num::          Set device number to use.
+* acc_get_device_num::          Get device number to be used.
+* acc_async_test::              Tests for completion of a specific asynchronous
+                                operation.
+* acc_async_test_all::          Tests for completion of all asychronous
+                                operations.
+* acc_wait::                    Wait for completion of a specific asynchronous
+                                operation.
+* acc_wait_all::                Waits for completion of all asyncrhonous
+                                operations.
+* acc_wait_all_async::          Wait for completion of all asynchronous
+                                operations.
+* acc_wait_async::              Wait for completion of asynchronous operations.
+* acc_init::                    Initialize runtime for a specific device type.
+* acc_shutdown::                Shuts down the runtime for a specific device
+                                type.
+* acc_on_device::               Whether executing on a particular device
+* acc_malloc::                  Allocate device memory.
+* acc_free::                    Free device memory.
+* acc_copyin::                  Allocate device memory and copy host memory to
+                                it.
+* acc_present_or_copyin::       If the data is not present on the device,
+                                allocate device memory and copy from host
+                                memory.
+* acc_create::                  Allocate device memory and map it to host
+                                memory.
+* acc_present_or_create::       If the data is not present on the device,
+                                allocate device memory and map it to host
+                                memory.
+* acc_copyout::                 Copy device memory to host memory.
+* acc_delete::                  Free device memory.
+* acc_update_device::           Update device memory from mapped host memory.
+* acc_update_self::             Update host memory from mapped device memory.
+* acc_map_data::                Map previously allocated device memory to host
+                                memory.
+* acc_unmap_data::              Unmap device memory from host memory.
+* acc_deviceptr::               Get device pointer associated with specific
+                                host address.
+* acc_hostptr::                 Get host pointer associated with specific
+                                device address.
+* acc_is_present::              Indiciate whether host variable / array is
+                                present on device.
+* acc_memcpy_to_device::        Copy host memory to device memory.
+* acc_memcpy_from_device::      Copy device memory to host memory.
+
+API routines for target platforms.
+
+* acc_get_current_cuda_device:: Get CUDA device handle.
+* acc_get_current_cuda_context::Get CUDA context handle.
+* acc_get_cuda_stream::         Get CUDA stream handle.
+* acc_set_cuda_stream::         Set CUDA stream handle.
+@end menu
+
+
+
+@node acc_get_num_devices
+@section @code{acc_get_num_devices} -- Get number of devices for given device type
+@table @asis
+@item @emph{Description}
+This function returns a value indicating the number of devices available
+for the device type specified in @var{devicetype}. 
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
+@item                  @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.1.
+@end table
+
+
+
+@node acc_set_device_type
+@section @code{acc_set_device_type} -- Set type of device accelerator to use.
+@table @asis
+@item @emph{Description}
+This function indicates to the runtime library which device typr, specified
+in @var{devicetype}, to use when executing a parallel or kernels region. 
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
+@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.2.
+@end table
+
+
+
+@node acc_get_device_type
+@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
+@table @asis
+@item @emph{Description}
+This function returns what device type will be used when executing a
+parallel or kernels region.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_get_device_type(void)}
+@item                  @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.3.
+@end table
+
+
+
+@node acc_set_device_num
+@section @code{acc_set_device_num} -- Set device number to use.
+@table @asis
+@item @emph{Description}
+This function will indicate to the runtime which device number,
+specified by @var{num}, associated with the specifed device
+type @var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_device_num(int num, acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
+@item                   @tab @code{integer devicenum}
+@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.4.
+@end table
+
+
+
+@node acc_get_device_num
+@section @code{acc_get_device_num} -- Get device number to be used.
+@table @asis
+@item @emph{Description}
+This function returns which device number associated with the specified device
+type @var{devicetype}, will be used when executing a parallel or kernels
+region.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
+@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
+@item                   @tab @code{integer acc_get_device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.5.
+@end table
+
+
+
+@node acc_async_test
+@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
+@table @asis
+@item @emph{Description}
+This function tests for completion of the asynchrounous operation specified
+in @var{arg}. In C/C++, a non-zero value will be returned to indicate
+the specified asynchronous operation has completed. While Fortran will return
+a @code{true}. If the asynchrounous operation has not completed, C/C++ returns
+a zero and Fortran returns a @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
+@item                   @tab @code{integer(kind=acc_handle_kind) arg}
+@item                   @tab @code{logical acc_async_test}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.6.
+@end table
+
+
+
+@node acc_async_test_all
+@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function tests for completion of all asynchrounous operations.
+In C/C++, a non-zero value will be returned to indicate all asynchronous
+operations have completed. While Fortran will return a @code{true}. If
+any asynchronous operation has not completed, C/C++ returns a zero and
+Fortran returns a @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_async_test()}
+@item                   @tab @code{logical acc_get_device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.7.
+@end table
+
+
+
+@node acc_wait
+@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
+@table @asis
+@item @emph{Description}
+This function waits for completion of the asynchronous operation
+specified in @var{arg}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait(arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
+@item                   @tab @code{integer(acc_handle_kind) arg}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.8.
+@end table
+
+
+
+@node acc_wait_all
+@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function waits for the completion of all asynchronous operations.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_async()}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.10.
+@end table
+
+
+
+@node acc_wait_all_async
+@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function enqueues a wait operation on the queue @var{async} for any
+and all asynchronous operations that have been previously enqueued on
+any queue.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
+@item                   @tab @code{integer(acc_handle_kind) async}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.11.
+@end table
+
+
+
+@node acc_wait_async
+@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function enqueues a wait operation on queue @var{async} for any and all
+asynchronous operations enqueued on queue @var{arg}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
+@item                   @tab @code{integer(acc_handle_kind) arg, async}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.9.
+@end table
+
+
+
+@node acc_init
+@section @code{acc_init} -- Initialize runtime for a specific device type.
+@table @asis
+@item @emph{Description}
+This function initializes the runtime for the device type specified in
+@var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
+@item                   @tab @code{integer(acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.12.
+@end table
+
+
+
+@node acc_shutdown
+@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
+@table @asis
+@item @emph{Description}
+This function shuts down the runtime for the device type specified in
+@var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
+@item                   @tab @code{integer(acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.13.
+@end table
+
+
+
+@node acc_on_device
+@section @code{acc_on_device} -- Whether executing on a particular device
+@table @asis
+@item @emph{Description}:
+This function returns whether the program is executing on a particular
+device specified in @var{devicetype}. In C/C++ a non-zero value is
+returned to indicate the device is execiting on the specified device type.
+In Fortran, @code{true} will be returned. If the program is not executing
+on the specified device type C/C++ will return a zero, while Fortran will
+return @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_on_device(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
+@item                   @tab @code{integer(acc_device_kind) devicetype}
+@item                   @tab @code{logical acc_on_device}
+@end multitable
+
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.14.
+@end table
+
+
+
+@node acc_malloc
+@section @code{acc_malloc} -- Allocate device memory.
+@table @asis
+@item @emph{Description}
+This function allocates @var{len} bytes of device memory. It returns
+the device address of the allocated memory.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.15.
+@end table
+
+
+
+@node acc_free
+@section @code{acc_free} -- Free device memory.
+@table @asis
+@item @emph{Description}
+Free previously allocated device memory at the device address @code{a}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.16.
+@end table
+
+
+
+@node acc_copyin
+@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
+@table @asis
+@item @emph{Description}
+In C/C++, this function allocates @var{len} bytes of device memory
+and maps it to the specified host address in @var{a}. The device
+address of the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a
+variable or array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.17.
+@end table
+
+
+
+@node acc_present_or_copyin
+@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
+@table @asis
+@item @emph{Description}
+This function tests if the host data specifed by @var{a} and of length
+@var{len} is present or not. If it is not present, then device memory
+will be allocated and the host memory copied. The device address of
+the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
+@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.18.
+@end table
+
+
+
+@node acc_create
+@section @code{acc_create} -- Allocate device memory and map it to host memory.
+@table @asis
+@item @emph{Description}
+This function allocates device memory and maps it to host memory specified
+by the host address @var{a} with a length of @var{len} bytes. In C/C++,
+the function returns the device address of the allocated device memory.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.19.
+@end table
+
+
+
+@node acc_present_or_create
+@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
+@table @asis
+@item @emph{Description}
+This function tests if the host data specifed by @var{a} and of length
+@var{len} is present or not. If it is not present, then device memory
+will be allocated and mapped to host memory. In C/C++, the device address
+of the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
+@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.20.
+@end table
+
+
+
+@node acc_copyout
+@section @code{acc_copyout} -- Copy device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function copies mapped device memory to host memory which is specified
+by host address @var{a} for a length @var{len} bytes in C/C++.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.21.
+@end table
+
+
+
+@node acc_delete
+@section @code{acc_delete} -- Free device memory.
+@table @asis
+@item @emph{Description}
+This function frees previously allocated device memory specified by
+the device address @var{a} and the length of @var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.22.
+@end table
+
+
+
+@node acc_update_device
+@section @code{acc_update_device} -- Update device memory from mapped host memory.
+@table @asis
+@item @emph{Description}
+This function updates the device copy from the previously mapped host memory.
+The host memory is specified with the host address @var{a} and a length of
+@var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.23.
+@end table
+
+
+
+@node acc_update_self
+@section @code{acc_update_self} -- Update host memory from mapped device memory.
+@table @asis
+@item @emph{Description}
+This function updates the host copy from the previously mapped device memory.
+The host memory is specified with the host address @var{a} and a length of
+@var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.24.
+@end table
+
+
+
+@node acc_map_data
+@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function maps previously allocated device and host memory. The device
+memory is specified with the device address @var{d}. The host memory is
+specified with the host address @var{h} and a length of @var{len}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.25.
+@end table
+
+
+
+@node acc_unmap_data
+@section @code{acc_unmap_data} -- Unmap device memory from host memory.
+@table @asis
+@item @emph{Description}
+This function unmaps previously mapped device and host memory. The latter
+specified by @var{h}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.26.
+@end table
+
+
+
+@node acc_deviceptr
+@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
+@table @asis
+@item @emph{Description}
+This function returns the device address that has been mapped to the
+host address specified by @var{h}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.27.
+@end table
+
+
+
+@node acc_hostptr
+@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
+@table @asis
+@item @emph{Description}
+This function returns the host address that has been mapped to the
+device address specified by @var{d}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.28.
+@end table
+
+
+
+@node acc_is_present
+@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
+@table @asis
+@item @emph{Description}
+This function indicates whether the specified host address in @var{a} and a
+length of @var{len} bytes is present on the device. In C/C++, a non-zero
+value is returned to indicate the presence of the mapped memory on the
+device. A zero is returned to indicate the memory is not mapped on the
+device.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes. If the host
+memory is mapped to device memory, then a @code{true} is returned. Otherwise,
+a @code{false} is return to indicate the mapped memory is not present.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_is_present(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{logical acc_is_present}
+@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@item                   @tab @code{logical acc_is_present}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.29.
+@end table
+
+
+
+@node acc_memcpy_to_device
+@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
+@table @asis
+@item @emph{Description}
+This function copies host memory specified by host address of @var{src} to
+device memory specified by the device address @var{dest} for a length of
+@var{bytes} bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.30.
+@end table
+
+
+
+@node acc_memcpy_from_device
+@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function copies host memory specified by host address of @var{src} from
+device memory specified by the device address @var{dest} for a length of
+@var{bytes} bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(d_void *dest, h_void *src, size_t bytes);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3.2.31.
+@end table
+
+
+
+@node acc_get_current_cuda_device
+@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA device handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.1.
+@end table
+
+
+
+@node acc_get_current_cuda_context
+@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA context handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_get_current_cuda_context(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.2.
+@end table
+
+
+
+@node acc_get_cuda_stream
+@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA stream handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_get_cuda_stream(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.3.
+@end table
+
+
+
+@node acc_set_cuda_stream
+@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
+@table @asis
+@item @emph{Description}
+This function associates the stream handle specified by @var{stream} with
+the asynchronous value specified by @var{async}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_cuda_stream(int async void *stream);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+A.2.1.4.
+@end table
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Environment Variables
+@c ---------------------------------------------------------------------
+
+@node OpenACC Environment Variables
+@chapter OpenACC Environment Variables
+
+The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
+are defined by section 4 of the OpenACC specification in version 2.0.
+The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
+
+@menu
+* ACC_DEVICE_TYPE::
+* ACC_DEVICE_NUM::
+* GCC_ACC_NOTIFY::
+@end menu
+
+
+
+@node ACC_DEVICE_TYPE
+@section @code{ACC_DEVICE_TYPE}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+4.1.
+@end table
+
+
+
+@node ACC_DEVICE_NUM
+@section @code{ACC_DEVICE_NUM}
+@table @asis
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+4.2.
+@end table
+
+
+
+@node GCC_ACC_NOTIFY
+@section @code{GCC_ACC_NOTIFY}
+@table @asis
+@item @emph{Description}:
+Print debug information pertaining to the accelerator.
+@end table
+
+
+
+@c ---------------------------------------------------------------------
+@c CUDA Streams Usage
+@c ---------------------------------------------------------------------
+
+@node CUDA Streams Usage
+@chapter CUDA Streams Usage
+
+This applies to the @code{nvptx} plugin only.
+
+The library provides elements that perform asynchronous movement of
+data and asynchronous operation of computing constructs.  This
+asynchronous functionality is implemented by making use of CUDA
+streams@footnote{See "Stream Management" in "CUDA Driver API",
+TRM-06703-001, Version 5.5, for additional information}.
+
+The primary means by that the asychronous functionality is accessed
+is through the use of those OpenACC directives which make use of the
+@code{async} and @code{wait} clauses.  When the @code{async} clause is
+first used with a directive, it creates a CUDA stream.  If an
+@code{async-argument} is used with the @code{async} clause, then the
+stream is associated with the specified @code{async-argument}.
+
+Following the creation of an association between a CUDA stream and the
+@code{async-argument} of an @code{async} clause, both the @code{wait}
+clause and the @code{wait} directive can be used.  When either the
+clause or directive is used after stream creation, it creates a
+rendezvous point whereby execution waits until all operations
+associated with the @code{async-argument}, that is, stream, have
+completed.
+
+Normally, the management of the streams that are created as a result of
+using the @code{async} clause, is done without any intervention by the
+caller.  This implies the association between the @code{async-argument}
+and the CUDA stream will be maintained for the lifetime of the program.
+However, this association can be changed through the use of the library
+function @code{acc_set_cuda_stream}.  When the function
+@code{acc_set_cuda_stream} is called, the CUDA stream that was
+originally associated with the @code{async} clause will be destroyed.
+Caution should be taken when changing the association as subsequent
+references to the @code{async-argument} refer to a different
+CUDA stream.
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Library Interoperability
+@c ---------------------------------------------------------------------
+
+@node OpenACC Library Interoperability
+@chapter OpenACC Library Interoperability
+
+@section Introduction
+
+The OpenACC library uses the CUDA Driver API, and may interact with
+programs that use the Runtime library directly, or another library
+based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
+"Interactions with the CUDA Driver API" in
+"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
+Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
+for additional information on library interoperability.}.
+This chapter describes the use cases and what changes are
+required in order to use both the OpenACC library and the CUBLAS and Runtime
+libraries within a program.
+
+@section First invocation: NVIDIA CUBLAS library API
+
+In this first use case (see below), a function in the CUBLAS library is called
+prior to any of the functions in the OpenACC library. More specifically, the
+function @code{cublasCreate()}.
+
+When invoked, the function initializes the library and allocates the
+hardware resources on the host and the device on behalf of the caller. Once
+the initialization and allocation has completed, a handle is returned to the
+caller. The OpenACC library also requires initialization and allocation of
+hardware resources. Since the CUBLAS library has already allocated the
+hardware resources for the device, all that is left to do is to initialize
+the OpenACC library and acquire the hardware resources on the host.
+
+Prior to calling the OpenACC function that initializes the library and
+allocate the host hardware resources, you need to acquire the device number
+that was allocated during the call to @code{cublasCreate()}. The invoking of the
+runtime library function @code{cudaGetDevice()} accomplishes this. Once
+acquired, the device number is passed along with the device type as
+parameters to the OpenACC library function @code{acc_set_device_num()}.
+
+Once the call to @code{acc_set_device_num()} has completed, the OpenACC
+library uses the  context that was created during the call to
+@code{cublasCreate()}. In other words, both libraries will be sharing the
+same context.
+
+@smallexample
+    /* Create the handle */
+    s = cublasCreate(&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    @{
+        fprintf(stderr, "cublasCreate failed %d\n", s);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Get the device number */
+    e = cudaGetDevice(&dev);
+    if (e != cudaSuccess)
+    @{
+        fprintf(stderr, "cudaGetDevice failed %d\n", e);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Initialize OpenACC library and use device 'dev' */
+    acc_set_device_num(dev, acc_device_nvidia);
+
+@end smallexample
+@center Use Case 1 
+
+@section First invocation: OpenACC library API
+
+In this second use case (see below), a function in the OpenACC library is
+called prior to any of the functions in the CUBLAS library. More specificially,
+the function @code{acc_set_device_num()}.
+
+In the use case presented here, the function @code{acc_set_device_num()}
+is used to both initialize the OpenACC library and allocate the hardware
+resources on the host and the device. In the call to the function, the
+call parameters specify which device to use and what device
+type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
+is but one method to initialize the OpenACC library and allocate the
+appropriate hardware resources. Other methods are available through the
+use of environment variables and these will be discussed in the next section.
+
+Once the call to @code{acc_set_device_num()} has completed, other OpenACC
+functions can be called as seen with multiple calls being made to
+@code{acc_copyin()}. In addition, calls can be made to functions in the
+CUBLAS library. In the use case a call to @code{cublasCreate()} is made
+subsequent to the calls to @code{acc_copyin()}.
+As seen in the previous use case, a call to @code{cublasCreate()}
+initializes the CUBLAS library and allocates the hardware resources on the
+host and the device.  However, since the device has already been allocated,
+@code{cublasCreate()} will only initialize the CUBLAS library and allocate
+the appropriate hardware resources on the host. The context that was created
+as part of the OpenACC initialization is shared with the CUBLAS library,
+similarly to the first use case.
+
+@smallexample
+    dev = 0;
+
+    acc_set_device_num(dev, acc_device_nvidia);
+
+    /* Copy the first set to the device */
+    d_X = acc_copyin(&h_X[0], N * sizeof (float));
+    if (d_X == NULL)
+    @{ 
+        fprintf(stderr, "copyin error h_X\n");
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Copy the second set to the device */
+    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
+    if (d_Y == NULL)
+    @{ 
+        fprintf(stderr, "copyin error h_Y1\n");
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Create the handle */
+    s = cublasCreate(&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    @{
+        fprintf(stderr, "cublasCreate failed %d\n", s);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Perform saxpy using CUBLAS library function */
+    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    @{
+        fprintf(stderr, "cublasSaxpy failed %d\n", s);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Copy the results from the device */
+    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
+
+@end smallexample
+@center Use Case 2
+
+@section OpenACC library and environment variables
+
+There are two environment variables associated with the OpenACC library
+that may be used to control the device type and device number:
+@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respecively. These two
+environement variables can be used as an alternative to calling
+@code{acc_set_device_num()}. As seen in the second use case, the device
+type and device number were specified using @code{acc_set_device_num()}.
+If however, the aforementioned environment variables were set, then the
+call to @code{acc_set_device_num()} would not be required.
+
+
+The use of the environment variables is only relevant when an OpenACC function
+is called prior to a call to @code{cudaCreate()}. If @code{cudaCreate()}
+is called prior to a call to an OpenACC function, then you must call
+@code{acc_set_device_num()}@footnote{More complete information
+about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
+sections 4.1 and 4.2 of the @uref{http://www.openacc.org/, OpenACC}
+Application Programming Interface”, Version 2.0.}
+
+
+
+@c ---------------------------------------------------------------------
 @c The libgomp ABI
 @c ---------------------------------------------------------------------
 
@@ -1814,6 +3090,7 @@
 * Implementing ORDERED construct::
 * Implementing SECTIONS construct::
 * Implementing SINGLE construct::
+* Implementing OpenACC's PARALLEL construct::
 @end menu
 
 
@@ -2178,6 +3455,15 @@
 
 
 
+@node Implementing OpenACC's PARALLEL construct
+@section Implementing OpenACC's PARALLEL construct
+
+@smallexample
+  void GOACC_parallel ()
+@end smallexample
+
+
+
 @c ---------------------------------------------------------------------
 @c Reporting Bugs
 @c ---------------------------------------------------------------------

Reply via email to