Hi!

On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener <richard.guent...@gmail.com> 
wrote:
> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> <tho...@codesourcery.com> wrote:
> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
> > <richard.guent...@gmail.com> wrote:
> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell <nat...@acm.org> wrote:
> >> > On 10/14/16 05:28, Richard Biener wrote:
> >> >
> >> >> The BB_VISITED flag has indetermined state at the beginning of a pass.
> >> >> You have to ensure it is cleared yourself.
> >> >
> >> >
> >> > In that case the openacc (&nvptx?) passes should be modified to clear the
> >> > flags at their start, rather than at their end.
> >
> > The gcc/config/nvptx/nvptx.c handling seems fine -- it explicitly clears
> > BB_VISITED for all basic block it works on.
> >
> >> Yes.  But as I said, I ran into IRA ICEs (somewhere in the testsuite) when 
> >> not
> >> cleaning up after tree-ssa-propagate.c.  So somebody has to fix IRA first.
> >
> > Is there a GCC PR for that, or where are you tracking such issues?
> 
> No, just tracking in my head.

Tsk, tsk...  ;-)


> > OK to commit the following?  Is such a test case appropriate (which would
> > have caught this issue right away), in particular the dg-final
> > scan-tree-dump line?
> 
> Ugh.  Not worse to what we do in various dwarf scanning I guess.

;-|

> Doesn't failure lead to a miscompile eventually?  So you could formulate
> this as a dg-do run test with a check for the desired outcome?

No, unfortunately.  In this case the error is "benign" such that the
OpenACC loop processing machinery will decide to not parallelize loops
that ought to be parallelized.  This won't generally cause any problem
(apart from performance regression, obviously); it just caused problems
in a few libgomp test cases that actually at run time test for
parallelized execution -- which will/did trigger only with nvptx
offloading enabled, which not too many people are testing.  The test case
I propose below will trigger also for non-offloading configurations.

> >     Clear basic block flags before using BB_VISITED for OpenACC loops 
> > processing
> >
> >         gcc/
> >         * omp-low.c (oacc_loop_discovery): Call clear_bb_flags.
> >
> >         gcc/testsuite/
> >         * gcc.dg/goacc/loop-processing-1.c: New file.
> > ---
> >  gcc/omp-low.c                                  |  9 +++++----
> >  gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++++++++++++++++++
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > --- gcc/omp-low.c
> > +++ gcc/omp-low.c
> > @@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
> >  static oacc_loop *
> >  oacc_loop_discovery ()
> >  {
> > -  basic_block bb;
> > +  /* Clear basic block flags, in particular BB_VISITED which we're going 
> > to use
> > +     in the following.  */
> > +  clear_bb_flags ();
> >
> >    oacc_loop *top = new_oacc_loop_outer (current_function_decl);
> >    oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > @@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
> >       that diagnostics come out in an unsurprising order.  */
> >    top = oacc_loop_sibling_nreverse (top);
> >
> > -  /* Reset the visited flags.  */
> > -  FOR_ALL_BB_FN (bb, cfun)
> > -    bb->flags &= ~BB_VISITED;
> > +  /* Clear basic block flags again, as otherwise IRA will explode later 
> > on.  */
> > +  clear_bb_flags ();
> >
> >    return top;
> >  }
> > --- /dev/null
> > +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
> > @@ -0,0 +1,18 @@
> > +/* Make sure that OpenACC loop processing happens.  */
> > +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
> > +
> > +extern int place ();
> > +
> > +void vector_1 (int *ary, int size)
> > +{
> > +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
> > firstprivate (size)
> > +  {
> > +#pragma acc loop gang
> > +    for (int jx = 0; jx < 1; jx++)
> > +#pragma acc loop auto
> > +      for (int ix = 0; ix < size; ix++)
> > +       ary[ix] = place ();
> > +  }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "OpenACC loops.*Loop 0\\\(0\\\).*Loop 
> > 14\\\(1\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 
> > 20\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 
> > 1, 20\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, 
> > \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = 
> > UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 
> > 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, 
> > \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Loop 
> > 6\\\(4\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 
> > 6\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 
> > 1, 6\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, 
> > \\\.data_dep\\\.\[0-9_\]+, 2\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = 
> > UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 
> > 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, 
> > \\\.data_dep\\\.\[0-9_\]+, 2\\\);" "oaccdevlow" } } */


Grüße
 Thomas

Reply via email to