Hi!

On Mon, 17 Oct 2016 14:08:44 +0200, Richard Biener <richard.guent...@gmail.com> 
wrote:
> On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
> <tho...@codesourcery.com> wrote:
> > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener 
> > <richard.guent...@gmail.com> wrote:
> >> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> >> <tho...@codesourcery.com> wrote:
> >> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
> >> > <richard.guent...@gmail.com> wrote:
> >> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell <nat...@acm.org> wrote:
> >> >> > On 10/14/16 05:28, Richard Biener wrote:
> >> >> >
> >> >> >> The BB_VISITED flag has indetermined state at the beginning of a 
> >> >> >> pass.
> >> >> >> You have to ensure it is cleared yourself.
> >> >> >
> >> >> >
> >> >> > In that case the openacc (&nvptx?) passes should be modified to clear 
> >> >> > the
> >> >> > flags at their start, rather than at their end.

> >> > OK to commit the following?  Is such a test case appropriate (which would
> >> > have caught this issue right away), in particular the dg-final
> >> > scan-tree-dump line?
> >>
> >> Ugh.  Not worse to what we do in various dwarf scanning I guess.
> >
> > ;-|
> >
> >> Doesn't failure lead to a miscompile eventually?  So you could formulate
> >> this as a dg-do run test with a check for the desired outcome?
> >
> > No, unfortunately.  In this case the error is "benign" such that the
> > OpenACC loop processing machinery will decide to not parallelize loops
> > that ought to be parallelized.
> 
> So you can scan for "loop parallelized" instead?

The dump would still contain the outer loop's "Loop 0(0)" marker, so I'd
have to scan for "Head"/"Tail"/"UNIQUE" or similar instead -- but that
seems likewise fragile (for false negatives), and less useful than
scanning for the complete pattern.

> I fear your pattern
> is quite fragile
> to maintain over time.

Agreed -- but then, that's intentional: my idea for this new test case
has been to have it actually verify the expected OpenACC loop processing,
so it's clear that this pattern will need to be adjusted if changing the
OpenACC loop processing.

> >  This won't generally cause any problem
> > (apart from performance regression, obviously); it just caused problems
> > in a few libgomp test cases that actually at run time test for
> > parallelized execution -- which will/did trigger only with nvptx
> > offloading enabled, which not too many people are testing.  The test case
> > I propose below will trigger also for non-offloading configurations.

On IRC, Segher suggested to 'use {} instead of "" to avoid [all those
backslashes]' -- thanks, done.

commit 88260dc23e752c3e05c6644ee3b653a947714440
Author: Thomas Schwinge <tho...@codesourcery.com>
Date:   Mon Oct 17 15:33:09 2016 +0200

    Clear basic block flags before using BB_VISITED for OpenACC loops processing
    
        gcc/
        * omp-low.c (oacc_loop_discovery): Call clear_bb_flags.
    
        gcc/testsuite/
        * gcc.dg/goacc/loop-processing-1.c: New file.
---
 gcc/omp-low.c                                  |  9 +++++----
 gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 213bf8c..5257d21 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
 static oacc_loop *
 oacc_loop_discovery ()
 {
-  basic_block bb;
+  /* Clear basic block flags, in particular BB_VISITED which we're going to use
+     in the following.  */
+  clear_bb_flags ();
   
   oacc_loop *top = new_oacc_loop_outer (current_function_decl);
   oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
@@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
      that diagnostics come out in an unsurprising order.  */
   top = oacc_loop_sibling_nreverse (top);
 
-  /* Reset the visited flags.  */
-  FOR_ALL_BB_FN (bb, cfun)
-    bb->flags &= ~BB_VISITED;
+  /* Clear basic block flags again, as otherwise IRA will explode later on.  */
+  clear_bb_flags ();
 
   return top;
 }
diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c 
gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
new file mode 100644
index 0000000..619576a
--- /dev/null
+++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -0,0 +1,18 @@
+/* Make sure that OpenACC loop processing happens.  */
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
+
+extern int place ();
+
+void vector_1 (int *ary, int size)
+{
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
firstprivate (size)
+  {
+#pragma acc loop gang
+    for (int jx = 0; jx < 1; jx++)
+#pragma acc loop auto
+      for (int ix = 0; ix < size; ix++)
+       ary[ix] = place ();
+  }
+}
+
+/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 
14\(1\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 
20\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 
20\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 
0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK, 
\.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_JOIN, 
\.data_dep\.[0-9_]+, 0\);.*Loop 6\(4\).*\.data_dep\.[0-9_]+ = UNIQUE 
\(OACC_HEAD_MARK, 0, 1, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE 
\(OACC_HEAD_MARK, 0, 1, 6\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, 
\.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE 
\(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = UNIQUE 
\(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);} "oaccdevlow" } } */


Grüße
 Thomas

Reply via email to