Hi!

On Thu, 21 Apr 2016 12:19:31 -0600, Sandra Loosemore <san...@codesourcery.com> wrote:
> On 04/21/2016 10:21 AM, Thomas Schwinge wrote:
> > +	  <li>Code will be offloaded onto multiple gangs, but executes with
> > +	    just one worker, and a vector length of 1.</li>
>
> "will be" (future) vs "executes" (present).  Assuming this is all
> supposed to describe current behavior, please write consistently in the
> present tense.

Thanks for that.  I keep getting that wrong...

> My only comment on the rest of the patch is that "a kernels region"
> sounds like a mistake but I think that is the official terminology?

Correct: it's an "OpenACC kernels construct/directive/region".

> -Sandra the nit-picky

Thanks for the review; OK to commit as follows?

And then, should something be added to the "News" section on
<https://gcc.gnu.org/> itself, too?  (I don't know the policy for that.
We didn't suggest that for GCC 5, because at that time we described the
support as a "preliminary implementation of the OpenACC 2.0a
specification"; now it's much more complete and usable.)

Index: htdocs/gcc-6/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.75
diff -u -p -r1.75 changes.html
--- htdocs/gcc-6/changes.html	21 Apr 2016 15:57:43 -0000	1.75
+++ htdocs/gcc-6/changes.html	22 Apr 2016 09:22:19 -0000
@@ -124,6 +124,52 @@ For more information, see the
 <!-- .................................................................. -->
 <h2 id="languages">New Languages and Language specific improvements</h2>
 
+<!-- <ul>
+  <li> -->Compared to GCC 5, the GCC 6 release series includes a much improved
+    implementation of the <a href="http://www.openacc.org/">OpenACC 2.0a
+    specification</a>.  Highlights are:
+    <ul>
+      <li>In addition to single-threaded host-fallback execution, offloading is
+        supported for nvptx (Nvidia GPUs) on x86_64 and PowerPC 64-bit
+        little-endian GNU/Linux host systems.  For nvptx offloading, with the
+        OpenACC parallel construct, the execution model allows for an arbitrary
+        number of gangs, up to 32 workers, and 32 vectors.</li>
+      <li>Initial support for parallelized execution of OpenACC kernels
+        constructs:
+        <ul>
+          <li>Parallelization of a kernels region is switched on
+            by <code>-fopenacc</code> combined with <code>-O2</code> or
+            higher.</li>
+          <li>Code is offloaded onto multiple gangs, but executes with just one
+            worker, and a vector length of 1.</li>
+          <li>Directives inside a kernels region are not supported.</li>
+          <li>Loops with reductions can be parallelized.</li>
+          <li>Only kernels regions with one loop nest are parallelized.</li>
+          <li>Only the outer-most loop of a loop nest can be parallelized.</li>
+          <li>Loop nests containing sibling loops are not parallelized.</li>
+        </ul>
+        Typically, using the OpenACC parallel construct gives much better
+        performance, compared to the initial support of the OpenACC kernels
+        construct.
+      <li>The <code>device_type</code> clause is not supported.
+        The <code>bind</code> and <code>nohost</code> clauses are not
+        supported.  The <code>host_data</code> directive is not supported in
+        Fortran.</li>
+      <li>Nested parallelism (cf. CUDA dynamic parallelism) is not
+        supported.</li>
+      <li>Usage of OpenACC constructs inside multithreaded contexts (such as
+        created by OpenMP, or pthread programming) is not supported.</li>
+      <li>If a call to the <code>acc_on_device</code> function has a
+        compile-time constant argument, the function call evaluates to a
+        compile-time constant value only for C and C++ but not for
+        Fortran.</li>
+    </ul>
+    See the <a href="https://gcc.gnu.org/wiki/OpenACC">OpenACC</a>
+    and <a href="https://gcc.gnu.org/wiki/Offloading">Offloading</a> wiki pages
+    for further information.
+  <!-- </li>
+</ul> -->
+
 <!-- <h3 id="ada">Ada</h3> -->
 
 <h3 id="c-family">C family</h3>


Grüße
 Thomas
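
For illustration only (this is not part of the patch): a minimal C sketch
of the two constructs the release notes contrast.  The function names
sum_parallel and sum_kernels are made up for this example.  The first
function marks the loop explicitly with the OpenACC parallel construct;
the second wraps a single loop nest, with no directives inside, in a
kernels region, which is the shape the notes say can be parallelized.
Whether the compiler actually parallelizes the kernels region depends on
its analysis, so treat this as a sketch under those assumptions.

```c
/* Sum an array using the OpenACC parallel construct.  The loop is
   explicitly marked as parallel, with a sum reduction.  */
double
sum_parallel (const double *a, int n)
{
  double sum = 0.0;
#pragma acc parallel loop reduction(+:sum) copyin(a[0:n])
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Sum an array using the OpenACC kernels construct.  The region holds
   exactly one loop nest and no further directives, so the compiler is
   left to decide whether and how to parallelize the reduction loop.  */
double
sum_kernels (const double *a, int n)
{
  double sum = 0.0;
#pragma acc kernels copyin(a[0:n]) copy(sum)
  {
    for (int i = 0; i < n; i++)
      sum += a[i];
  }
  return sum;
}
```

Built with -fopenacc combined with -O2 or higher, both functions are
candidates for offloading.  Built without -fopenacc, the pragmas are
ignored and both loops simply run serially on the host, which makes it
easy to compare results.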