Hi! On Mon, 17 Oct 2016 15:38:50 +0200, I wrote: > On Mon, 17 Oct 2016 14:08:44 +0200, Richard Biener > <richard.guent...@gmail.com> wrote: > > On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge > > <tho...@codesourcery.com> wrote: > > > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener > > > <richard.guent...@gmail.com> wrote: > > >> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge > > >> <tho...@codesourcery.com> wrote: > > >> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener > > >> > <richard.guent...@gmail.com> wrote: > > >> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell <nat...@acm.org> > > >> >> wrote: > > >> >> > On 10/14/16 05:28, Richard Biener wrote: > > >> >> > > > >> >> >> The BB_VISITED flag has indetermined state at the beginning of a > > >> >> >> pass. > > >> >> >> You have to ensure it is cleared yourself. > > >> >> > > > >> >> > > > >> >> > In that case the openacc (&nvptx?) passes should be modified to > > >> >> > clear the > > >> >> > flags at their start, rather than at their end.
This already is a "conceptual acknowledgement" of my patch, so... > > >> > OK to commit the following? Is such a test case appropriate (which > > >> > would > > >> > have caught this issue right away), in particular the dg-final > > >> > scan-tree-dump line? > > >> > > >> Ugh. Not worse to what we do in various dwarf scanning I guess. > > > > > > ;-| > > > > > >> Doesn't failure lead to a miscompile eventually? So you could formulate > > >> this as a dg-do run test with a check for the desired outcome? > > > > > > No, unfortunately. In this case the error is "benign" such that the > > > OpenACC loop processing machinery will decide to not parallelize loops > > > that ought to be parallelized. > > > > So you can scan for "loop parallelized" instead? > > The dump would still contain the outer loop's "Loop 0(0)" marker, so I'd > have to scan for "Head"/"Tail"/"UNIQUE" or similar instead -- but that > seems likewise fragile (for false negatives), and less useful than > scanning for the complete pattern. > > > I fear your pattern > > is quite fragile > > to maintain over time. > > Agreed -- but then, that's intentional: my idea for this new test case > has been to have it actually verify the expected OpenACC loop processing, > so it's clear that this pattern will need to be adjusted if changing the > OpenACC loop processing. > > > > This won't generally cause any problem > > > (apart from performance regression, obviously); it just caused problems > > > in a few libgomp test cases that actually at run time test for > > > parallelized execution -- which will/did trigger only with nvptx > > > offloading enabled, which not too many people are testing. The test case > > > I propose below will trigger also for non-offloading configurations. > > On IRC, Segher suggested to 'use {} instead of "" to avoid [all those > backslashes]' -- thanks, done. If you don't like the test case as-is (do we need multi-line tree dump scanning, just like we recently got for compiler diagnostics?), can I at least commit the OpenACC loops processing fix? Here is the latest version, simplified after your r241296 IRA vs. BB_VISITED fixes: commit 766cf9959b15a17e17e89a50e905b4c546893823 Author: Thomas Schwinge <tho...@codesourcery.com> Date: Mon Oct 17 15:33:09 2016 +0200 Clear basic block flags before using BB_VISITED for OpenACC loops processing gcc/ * omp-low.c (oacc_loop_discovery): Call clear_bb_flags before, and don't clear BB_VISITED after processing. gcc/testsuite/ * gcc.dg/goacc/loop-processing-1.c: New file. --- gcc/omp-low.c | 8 +++----- gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++++++++++++++++++ 2 files changed, 21 insertions(+), 5 deletions(-) diff --git gcc/omp-low.c gcc/omp-low.c index 77f89d5..3ef796f 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -19236,7 +19236,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop) static oacc_loop * oacc_loop_discovery () { - basic_block bb; + /* Clear basic block flags, in particular BB_VISITED which we're going to use + in the following. */ + clear_bb_flags (); oacc_loop *top = new_oacc_loop_outer (current_function_decl); oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun)); @@ -19245,10 +19247,6 @@ oacc_loop_discovery () that diagnostics come out in an unsurprising order. */ top = oacc_loop_sibling_nreverse (top); - /* Reset the visited flags. */ - FOR_ALL_BB_FN (bb, cfun) - bb->flags &= ~BB_VISITED; - return top; } diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c gcc/testsuite/gcc.dg/goacc/loop-processing-1.c new file mode 100644 index 0000000..619576a --- /dev/null +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c @@ -0,0 +1,18 @@ +/* Make sure that OpenACC loop processing happens. */ +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */ + +extern int place (); + +void vector_1 (int *ary, int size) +{ +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size) + { +#pragma acc loop gang + for (int jx = 0; jx < 1; jx++) +#pragma acc loop auto + for (int ix = 0; ix < size; ix++) + ary[ix] = place (); + } +} + +/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 14\(1\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 20\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 20\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(4\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 6\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);} "oaccdevlow" } } */ Grüße Thomas