On Fri, Nov 22, 2013 at 8:48 AM, Jeff Law <l...@redhat.com> wrote:
>
> Right now jump threads from within a loop which cross the loop header, then
> terminate within the loop are ignored as this may create irreducible loops.
>
> This patch gets the CFG/SSA updating code in good enough shape to handle the
> case that the embedded guys care about.  ie, the coremark FSA/FSM benchmark.
>
> Basically we still ignore most of those threads -- the exception is when we
> find the loop header has a multi-way branch.  In that case we go ahead and
> thread the jump, throw away and rebuild the loop information.  Again, we
> only do this when we're able to remove a multi-way jump at the top of a
> loop.
>
> During a GCC bootstrap I found two instances where this triggers, once in
> GCC itself, one in the java runtime libraries.  I've manually verified that
> we do the right thing in those two instances as well as the FSA/FSM
> optimization.  However, the reality is this code isn't being well exercised.
>
> So (for testing purposes only) I removed the test that the loop header has
> to be a multi-way jump and enabled a follow-on patch which allows us to
> identify many more threads through the loop header to points within the
> current loop.  Not surprisingly the code triggered more often and uncovered
> a couple minor issues that I've addressed.
>
> The bootstrap/regression test for the patch ran fine, as did the regression
> and bootstrap test with various pass limitations removed (with the exception
> of tests which explicitly test that we don't thread in situations where
> it'll create irreducible loops).  This is good.
>
> Installed on the trunk.  One more patch to go before calling and end to my
> own development work for 4.9 :-)
>
>
>
>
>
>
>         * tree-ssa-threadupdate.c: Include tree-cfg.h and tree-pass.h
>         (thread_block_1): Do not cancel jump threads which go from
>         inside a loop, through the header, then back inside the loop.
>         (bb_ends_with_multiway_branch): New function.
>         (thread_through_all_blocks): Handle threading cases which start
>         in a loop through the loop header to a point in the loop.
>
>
>
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index 777fe41..38d4f69 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgloop.h"
>  #include "hash-table.h"
>  #include "dbgcnt.h"
> +#include "tree-cfg.h"
> +#include "tree-pass.h"
>
>  /* Given a block B, update the CFG and SSA graph to reflect redirecting
>     one or more in-edges to B to instead reach the destination of an
> @@ -815,29 +817,15 @@ thread_block_1 (basic_block bb, bool noloop_only, bool
> joiners)
>             }
>
>           /* Another case occurs when trying to thread through our
> -            own loop header, possibly from inside the loop.
> -
> -            If our loop header is buried in the path, then go ahead
> -            and cancel the jump threading request here.  This likely
> -            will need updating for the FSA/FSM coremark case.
> -
> -            Other cases (BB is the loop header) are handled elsewhere.  */
> +            own loop header, possibly from inside the loop.  We will
> +            thread these later.  */
>           unsigned int i;
>           for (i = 1; i < path->length (); i++)
>             {
>               if ((*path)[i]->e->src == bb->loop_father->header
>                   && (!loop_exit_edge_p (bb->loop_father, e2)
>                       || (*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK))
> -               {
> -                 /* If i != 1, then it's a buried header that will not
> -                    be handled elsehwere.  */
> -                 if (i != 1)
> -                   {
> -                     delete_jump_thread_path (path);
> -                     e->aux = NULL;
> -                   }
> -                 break;
> -               }
> +               break;
>             }
>
>           if (i != path->length ())
> @@ -1554,6 +1542,20 @@ mark_threaded_blocks (bitmap threaded_blocks)
>  }
>
>
> +/* Return TRUE if BB ends with a switch statement or a computed goto.
> +   Otherwise return false.  */
> +static bool
> +bb_ends_with_multiway_branch (basic_block bb ATTRIBUTE_UNUSED)
> +{
> +  gimple stmt = last_stmt (bb);
> +  if (stmt && gimple_code (stmt) == GIMPLE_SWITCH)
> +    return true;
> +  if (stmt && gimple_code (stmt) == GIMPLE_GOTO
> +      && TREE_CODE (gimple_goto_dest (stmt)) == SSA_NAME)
> +    return true;
> +  return false;
> +}
> +
>  /* Walk through all blocks and thread incoming edges to the appropriate
>     outgoing edge for each edge pair recorded in THREADED_EDGES.
>
> @@ -1573,6 +1575,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
>    bitmap_iterator bi;
>    bitmap threaded_blocks;
>    struct loop *loop;
> +  bool totally_clobbered_loops = false;
>
>    /* We must know about loops in order to preserve them.  */
>    gcc_assert (current_loops != NULL);
> @@ -1609,35 +1612,79 @@ thread_through_all_blocks (bool
> may_peel_loop_headers)
>        retval |= thread_through_loop_header (loop, may_peel_loop_headers);
>      }
>
> -  /* Assume we had a jump thread path which went from the latch to the exit
> -     and a path which goes from outside to inside the same loop.
> -
> -     If the latch to exit was handled first, we will thread it and clear
> -     loop->header.
> -
> -     The second path will be ignored by thread_block because we're going
> -     through a loop header.  It will also be ignored by the loop above
> -     because loop->header is NULL.
> -
> -     This results in the second path never being threaded.  The failure
> -     mode is a dangling AUX field.
> -
> -     This is inherently a bit of a pain to fix, so we just walk all the
> -     blocks and all the incoming edges to those blocks and clear their
> -     AUX fields.  */
> +  /* Any jump threading paths that are still attached to edges at this
> +     point must be one of two cases.
> +
> +     First, we could have a jump threading path which went from outside
> +     a loop to inside a loop that was ignored because a prior jump thread
> +     across a backedge was realized (which indirectly causes the loop
> +     above to ignore the latter thread).  We can detect these because the
> +     loop structures will be different and we do not currently try to
> +     optimize this case.
> +
> +     Second, we could be threading across a backedge to a point within the
> +     same loop.  This occurrs for the FSA/FSM optimization and we would
> +     like to optimize it.  However, we have to be very careful as this
> +     may completely scramble the loop structures, with the result being
> +     irreducible loops causing us to throw away our loop structure.
> +
> +     As a compromise for the latter case, if the thread path ends in
> +     a block where the last statement is a multiway branch, then go
> +     ahead and thread it, else ignore it.  */
>    basic_block bb;
> -  edge_iterator ei;
>    edge e;
>    FOR_EACH_BB (bb)
>      {
> -      FOR_EACH_EDGE (e, ei, bb->preds)
> +      /* If we do end up threading here, we can remove elements from
> +        BB->preds.  Thus we can not use the FOR_EACH_EDGE iterator.  */
> +      for (edge_iterator ei = ei_start (bb->preds);
> +          (e = ei_safe_edge (ei));)
>         if (e->aux)
>           {
>             vec<jump_thread_edge *> *path = THREAD_PATH (e);
>
> -           delete_jump_thread_path (path);
> -           e->aux = NULL;
> +           /* Case 1, threading from outside to inside the loop
> +              after we'd already threaded through the header.  */
> +           if ((*path)[0]->e->dest->loop_father
> +               != path->last ()->e->src->loop_father)
> +             {
> +               delete_jump_thread_path (path);
> +               e->aux = NULL;
> +               ei_next (&ei);
> +             }
> +          else if (bb_ends_with_multiway_branch (path->last ()->e->src))
> +             {
> +               /* The code to thread through loop headers may have
> +                  split a block with jump threads attached to it.
> +
> +                  We can identify this with a disjoint jump threading
> +                  path.  If found, just remove it.  */
> +               for (unsigned int i = 0; i < path->length () - 1; i++)
> +                 if ((*path)[i]->e->dest != (*path)[i + 1]->e->src)
> +                   {
> +                     delete_jump_thread_path (path);
> +                     e->aux = NULL;
> +                     ei_next (&ei);
> +                     break;
> +                   }
> +
> +               /* Our path is still valid, thread it.  */
> +               if (e->aux)
> +                 {
> +                   totally_clobbered_loops
> +                     |= thread_block ((*path)[0]->e->dest, false);
> +                   e->aux = NULL;
> +                 }
> +             }
> +          else
> +             {
> +               delete_jump_thread_path (path);
> +               e->aux = NULL;
> +               ei_next (&ei);
> +             }
>           }
> +       else
> +         ei_next (&ei);
>      }
>
>    statistics_counter_event (cfun, "Jumps threaded",
> @@ -1649,7 +1696,32 @@ thread_through_all_blocks (bool
> may_peel_loop_headers)
>    threaded_blocks = NULL;
>    paths.release ();
>
> -  if (retval)
> +  /* If we made changes to the CFG that might have totally messed
> +     up the loop structure, then drop the old loop structure and
> +     rebuild.  */
> +  if (totally_clobbered_loops)
> +    {
> +      /* Release the current loop structures, they are totally
> +        clobbered at this point.  */
> +      loop_optimizer_finalize ();
> +      current_loops = NULL;

This is definitely a no-go and should be immediately reverted.  If you
wreck a particular loop simply mark it for removal by setting
->header and ->latch to NULL.  Loop fixup will then re-discover
it (or really remove it).

With the code above you discard all loops in the function including
meta-information on openmp loops, #pragma ivdeps info, etc.

Richard.

> +      /* Similarly for dominance information.  */
> +      free_dominance_info (CDI_DOMINATORS);
> +      free_dominance_info (CDI_POST_DOMINATORS);
> +
> +      /* Before we can rebuild the loop structures, we need dominators,
> +        which requires no unreachable code.  So remove unreachable code.
> */
> +      delete_unreachable_blocks ();
> +
> +      /* Now rebuild the loop structures.  */
> +      cfun->curr_properties &= ~PROP_loops;
> +      loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> +      cfun->curr_properties |= PROP_loops;
> +      retval = 1;
> +    }

> +
> +  if (retval && current_loops)
>      loops_state_set (LOOPS_NEED_FIXUP);
>
>    return retval;
>

Reply via email to