Gentle ping and is this patch still suitable for stage 3?  Thanks.

[PATCH v7 2/2] Don't move cold code out of loop by checking bb count

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583911.html



On 2021/11/10 11:08, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/11/4 21:00, Richard Biener wrote:
>> On Wed, Nov 3, 2021 at 2:29 PM Xionghu Luo <luo...@linux.ibm.com> wrote:
>>>
>>>
>>>> +  while (outmost_loop != loop)
>>>> +    {
>>>> +      if (bb_colder_than_loop_preheader (loop_preheader_edge
>>>> (outmost_loop)->src,
>>>> +                                        loop_preheader_edge 
>>>> (cold_loop)->src))
>>>> +       cold_loop = outmost_loop;
>>>> +      outmost_loop = superloop_at_depth (loop, loop_depth (outmost_loop) 
>>>> + 1);
>>>> +    }
>>>>
>>>> could be instead written as
>>>>
>>>>   coldest_loop = coldest_outermost_loop[loop->num];
>>>>   if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
>>>>     return outermost_loop;
>>>>   return coldest_loop;
>>>>
>>>> ?  And in the usual case coldest_outermost_loop[L] would be the loop tree 
>>>> root.
>>>> It should be possible to compute such cache in a DFS walk of the loop tree
>>>> (the loop iterator by default visits in such order).
>>>
>>>
>>> Thanks.  Updated the patch with your suggestion.  Not sure whether it 
>>> strictly
>>> conforms to your comments.  Though the patch passed all my added 
>>> tests(coverage not enough),
>>> I am still a bit worried if pre-computed coldest_loop is outside of 
>>> outermost_loop, but
>>> outermost_loop is not the COLDEST LOOP, i.e. (outer->inner)
>>>
>>>  [loop tree root, coldest_loop, outermost_loop,..., second_coldest_loop, 
>>> ..., loop],
>>>
>>> then function find_coldest_out_loop will return a loop NOT accord with our
>>> expectation, that should return second_coldest_loop instead of 
>>> outermost_loop?
>> Hmm, interesting - yes.  I guess the common case will be that the 
>> pre-computed
>> outermost loop will be the loop at depth 1 since outer loops tend to
>> be colder than
>> inner loops?  That would then defeat the whole exercise.
> 
> It is not easy to construct such cases, But finally I got below results,
> 
> 1) many cases inner loop is hotter than outer loop, for example:
> 
> loop 1's coldest_outermost_loop is 1, colder_than_inner_loop is NULL
> loop 2's coldest_outermost_loop is 1, colder_than_inner_loop is 1
> loop 3's coldest_outermost_loop is 1, colder_than_inner_loop is 2
> loop 4's coldest_outermost_loop is 1, colder_than_inner_loop is 2
> 
> 
> 2) But there are also cases inner loop is colder than outer loop, like:
> 
> loop 1's coldest outermost loop is 1, colder_than_inner_loop is NULL
> loop 2's coldest outermost loop is 2, colder_than_inner_loop is NULL
> loop 3's coldest outermost loop is 3, colder_than_inner_loop is NULL
> 
> 
>>
>> To optimize the common case but not avoiding iteration in the cases we care
>> about we could instead cache the next outermost loop that is _not_ colder
>> than loop.  So for your [ ... ] example above we'd have> 
>> hotter_than_inner_loop[loop] == outer (second_coldest_loop), where the
>> candidate would then be 'second_coldest_loop' and we'd then iterate
>> to hotter_than_inner_loop[hotter_than_inner_loop[loop]] to find the next
>> cold candidate we can compare against?  For the common case we'd
>> have hotter_than_inner_loop[looo] == NULL (no such loop) and we then
>> simply pick 'outermost_loop'.
> 
> Thanks.  It was difficult to understand, but finally I got to know what you
> want to express :)
> 
> We should cache the next loop that is *colder* than loop instead of '_not_ 
> colder
> than loop', and 'hotter_than_inner_loop' should be 'colder_than_inner_loop',
> then it makes sense if the coldest loop is outside of outermost loop, 
> continue to
> find a colder loop between outermost loop and current loop in
> colder_than_inner_loop[loop->num]?  Hope I understood you correctly...
> 
>>
>> One comment on the patch itself below.
>>
> 
> The loop in fill_cold_out_loop is also removed in the updated v7 patch.
> 
> 
> 
> [PATCH v7 2/2] Don't move cold code out of loop by checking bb count
> 
> From: Xiong Hu Luo <luo...@linux.ibm.com>
> 
> v7 changes:
> 1. Refine get_coldest_out_loop to replace loop with checking
> pre-computed coldest_outermost_loop and colder_than_inner_loop.
> 2. Add function fill_cold_out_loop, compute coldest_outermost_loop and
> colder_than_inner_loop recursively without loop.
> 
> v6 changes:
> 1. Add function fill_coldest_out_loop to pre compute the coldest
> outermost loop for each loop.
> 2. Rename find_coldest_out_loop to get_coldest_out_loop.
> 3. Add testcase ssa-lim-22.c to differentiate with ssa-lim-19.c.
> 
> v5 changes:
> 1. Refine comments for new functions.
> 2. Use basic_block instead of count in bb_colder_than_loop_preheader
> to align with function name.
> 3. Refine with simpler implementation for get_coldest_out_loop and
> ref_in_loop_hot_body::operator for better understanding.
> 
> v4 changes:
> 1. Sort out profile_count comparision to function bb_cold_than_loop_preheader.
> 2. Update ref_in_loop_hot_body::operator () to find cold_loop before compare.
> 3. Split RTL invariant motion part out.
> 4. Remove aux changes.
> 
> v3 changes:
> 1. Handle max_loop in determine_max_movement instead of 
> outermost_invariant_loop.
> 2. Remove unnecessary changes.
> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
> 4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused
> infinite loop when implementing v1 and the iteration is missed to be
> updated actually.
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
> v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580211.html
> v4: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581231.html
> v5: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581961.html
> 
> There was a patch trying to avoid move cold block out of loop:
> 
> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
> 
> Richard suggested to "never hoist anything from a bb with lower execution
> frequency to a bb with higher one in LIM invariantness_dom_walker
> before_dom_children".
> 
> In gimple LIM analysis, add get_coldest_out_loop to move invariants to
> expected target loop, if profile count of the loop bb is colder
> than target loop preheader, it won't be hoisted out of loop.
> Likely for store motion, if all locations of the REF in loop is cold,
> don't do store motion of it.
> 
> SPEC2017 performance evaluation shows 1% performance improvement for
> intrate GEOMEAN and no obvious regression for others.  Especially,
> 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
> largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
> on P8LE.
> 
> gcc/ChangeLog:
> 
>       * tree-ssa-loop-im.c (bb_colder_than_loop_preheader): New
>       function.
>       (get_coldest_out_loop): New function.
>       (determine_max_movement): Use get_coldest_out_loop.
>       (move_computations_worker): Adjust and fix iteration udpate.
>       (class ref_in_loop_hot_body): New functor.
>       (ref_in_loop_hot_body::operator): New.
>       (can_sm_ref_p): Use for_all_locs_in_loop.
>       (fill_cold_out_loop): New.
>       (tree_ssa_lim_finalize): Free coldest_outermost_loop and
>       colder_than_inner_loop.
>       (loop_invariant_motion_in_fun): Call fill_cold_out_loop.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/tree-ssa/recip-3.c: Adjust.
>       * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
>       * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>       * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
>       * gcc.dg/tree-ssa/ssa-lim-21.c: New test.
>       * gcc.dg/tree-ssa/ssa-lim-22.c: New test.
> ---
>  gcc/tree-ssa-loop-im.c                     | 140 ++++++++++++++++++++-
>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c    |   2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c |  20 +++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c |  29 +++++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c |  25 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c |  35 ++++++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c |  32 +++++
>  7 files changed, 280 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
> 
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index 4b187c2cdaf..e9b3d0cba93 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -146,6 +146,11 @@ public:
>  enum dep_kind { lim_raw, sm_war, sm_waw };
>  enum dep_state { dep_unknown, dep_independent, dep_dependent };
> 
> +/* coldest outermost loop for given loop.  */
> +class loop **coldest_outermost_loop;
> +/* colder outer loop nerest to given loop.  */
> +class loop **colder_than_inner_loop;
> +
>  /* Populate the loop dependence cache of REF for LOOP, KIND with STATE.  */
> 
>  static void
> @@ -417,6 +422,53 @@ movement_possibility (gimple *stmt)
>    return ret;
>  }
> 
> +/* Compare the profile count inequality of bb and preheader, it is 
> three-state
> +   as stated in profile-count.h, FALSE is returned if inequality cannot be
> +   decided.  */
> +bool bb_colder_than_loop_preheader (basic_block bb, basic_block preheader)
> +{
> +  gcc_assert (bb && preheader);
> +  return bb->count < preheader->count;
> +}
> +
> +/* Check coldest loop between OUTERMOST_LOOP and LOOP by comparing profile
> +   count.
> +  It does three steps check:
> +  1) Check whether CURR_BB is cold in it's own loop_father, if it is cold, 
> just
> +  return NULL which means it should not be moved out at all;
> +  2) CURR_BB is NOT cold, check if pre-computed COLDEST_LOOP is outside of
> +  OUTERMOST_LOOP, if it is inside of OUTERMOST_LOOP, return the COLDEST_LOOP;
> +  3) If COLDEST_LOOP is outside of OUTERMOST_LOOP, check whether there is a
> +  colder loop between OUTERMOST_LOOP and loop in pre-computed
> +  COLDER_THAN_INNER_LOOP, return it, otherwise return OUTERMOST_LOOP.  */
> +
> +static class loop *
> +get_coldest_out_loop (class loop *outermost_loop, class loop *loop,
> +                   basic_block curr_bb)
> +{
> +  gcc_assert (outermost_loop == loop
> +           || flow_loop_nested_p (outermost_loop, loop));
> +
> +  /* If bb_colder_than_loop_preheader returns false due to three-state
> +    comparision, OUTERMOST_LOOP is returned finally to preserve the behavior.
> +    Otherwise, return the coldest loop between OUTERMOST_LOOP and LOOP.  */
> +  if (curr_bb
> +      && bb_colder_than_loop_preheader (curr_bb,
> +                                     loop_preheader_edge (loop)->src))
> +    return NULL;
> +
> +  class loop *coldest_loop = coldest_outermost_loop[loop->num];
> +  if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
> +    {
> +      if (colder_than_inner_loop[loop->num] != NULL
> +       && loop_depth (outermost_loop)
> +            < loop_depth (colder_than_inner_loop[loop->num]))
> +     return colder_than_inner_loop[loop->num];
> +      return outermost_loop;
> +    }
> +  return coldest_loop;
> +}
> +
>  /* Suppose that operand DEF is used inside the LOOP.  Returns the outermost
>     loop to that we could move the expression using DEF if it did not have
>     other operands, i.e. the outermost loop enclosing LOOP in that the value
> @@ -685,7 +737,9 @@ determine_max_movement (gimple *stmt, bool 
> must_preserve_exec)
>      level = ALWAYS_EXECUTED_IN (bb);
>    else
>      level = superloop_at_depth (loop, 1);
> -  lim_data->max_loop = level;
> +  lim_data->max_loop = get_coldest_out_loop (level, loop, bb);
> +  if (!lim_data->max_loop)
> +    return false;
> 
>    if (gphi *phi = dyn_cast <gphi *> (stmt))
>      {
> @@ -1221,7 +1275,10 @@ move_computations_worker (basic_block bb)
>        /* We do not really want to move conditionals out of the loop; we just
>        placed it here to force its operands to be moved if necessary.  */
>        if (gimple_code (stmt) == GIMPLE_COND)
> -     continue;
> +     {
> +       gsi_next (&bsi);
> +       continue;
> +     }
> 
>        if (dump_file && (dump_flags & TDF_DETAILS))
>       {
> @@ -2887,6 +2944,26 @@ ref_indep_loop_p (class loop *loop, im_mem_ref *ref, 
> dep_kind kind)
>    return indep_p;
>  }
> 
> +class ref_in_loop_hot_body
> +{
> +public:
> +  ref_in_loop_hot_body (loop *loop_) : l (loop_) {}
> +  bool operator () (mem_ref_loc *loc);
> +  class loop *l;
> +};
> +
> +/* Check the coldest loop between loop L and innermost loop.  If there is one
> +   cold loop between L and INNER_LOOP, store motion can be performed, 
> otherwise
> +   no cold loop means no store motion.  get_coldest_out_loop also handles 
> cases
> +   when l is inner_loop.  */
> +bool
> +ref_in_loop_hot_body::operator () (mem_ref_loc *loc)
> +{
> +  basic_block curr_bb = gimple_bb (loc->stmt);
> +  class loop *inner_loop = curr_bb->loop_father;
> +  return get_coldest_out_loop (l, inner_loop, curr_bb);
> +}
> +
> 
>  /* Returns true if we can perform store motion of REF from LOOP.  */
> 
> @@ -2941,6 +3018,12 @@ can_sm_ref_p (class loop *loop, im_mem_ref *ref)
>    if (!ref_indep_loop_p (loop, ref, sm_war))
>      return false;
> 
> +  /* Verify whether the candidate is hot for LOOP.  Only do store motion if 
> the
> +    candidate's profile count is hot.  Statement in cold BB shouldn't be 
> moved
> +    out of it's loop_father.  */
> +  if (!for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body (loop)))
> +    return false;
> +
>    return true;
>  }
> 
> @@ -3153,6 +3236,48 @@ fill_always_executed_in (void)
>      fill_always_executed_in_1 (loop, contains_call);
>  }
> 
> +/* Find the coldest loop preheader for LOOP, also find the nearest colder 
> loop
> +   to LOOP.  Then recursively iterate each inner loop.  */
> +
> +void
> +fill_cold_out_loop (class loop *coldest_loop, class loop *outer_loop,
> +                 class loop *loop)
> +{
> +  class loop *colder_loop = NULL;
> +  if (outer_loop)
> +    {
> +      if (bb_colder_than_loop_preheader (loop_preheader_edge 
> (outer_loop)->src,
> +                                      loop_preheader_edge (loop)->src))
> +     colder_loop = outer_loop;
> +      else if (colder_than_inner_loop[outer_loop->num]
> +            && bb_colder_than_loop_preheader (
> +              loop_preheader_edge (colder_than_inner_loop[outer_loop->num])
> +                ->src,
> +              loop_preheader_edge (loop)->src))
> +     colder_loop = colder_than_inner_loop[outer_loop->num];
> +    }
> +
> +  if (bb_colder_than_loop_preheader (loop_preheader_edge (loop)->src,
> +                                  loop_preheader_edge (coldest_loop)->src))
> +    coldest_loop = loop;
> +
> +  coldest_outermost_loop[loop->num] = coldest_loop;
> +  colder_than_inner_loop[loop->num] = colder_loop;
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf (MSG_NOTE, "loop %d's coldest_outermost_loop is %d, ",
> +                loop->num, coldest_loop->num);
> +      if (colder_loop)
> +     dump_printf (MSG_NOTE, "colder_than_inner_loop is %d\n",
> +                  colder_loop->num);
> +      else
> +     dump_printf (MSG_NOTE, "colder_than_inner_loop is NULL\n");
> +    }
> +
> +  class loop *inner_loop;
> +  for (inner_loop = loop->inner; inner_loop; inner_loop = inner_loop->next)
> +    fill_cold_out_loop (coldest_loop, loop, inner_loop);
> +}
> 
>  /* Compute the global information needed by the loop invariant motion pass.  
> */
> 
> @@ -3237,6 +3362,9 @@ tree_ssa_lim_finalize (void)
>      free_affine_expand_cache (&memory_accesses.ttae_cache);
> 
>    free (bb_loop_postorder);
> +
> +  free (coldest_outermost_loop);
> +  free (colder_than_inner_loop);
>  }
> 
>  /* Moves invariants from loops.  Only "expensive" invariants are moved out --
> @@ -3256,6 +3384,14 @@ loop_invariant_motion_in_fun (function *fun, bool 
> store_motion)
>    /* Fills ALWAYS_EXECUTED_IN information for basic blocks.  */
>    fill_always_executed_in ();
> 
> +  /* Pre-compute coldest outermost loop and nearest colder loop  of each 
> loop.
> +   */
> +  class loop *loop;
> +  coldest_outermost_loop = XNEWVEC (class loop *, number_of_loops (cfun));
> +  colder_than_inner_loop = XNEWVEC (class loop *, number_of_loops (cfun));
> +  for (loop = current_loops->tree_root->inner; loop != NULL; loop = 
> loop->next)
> +    fill_cold_out_loop (loop, NULL, loop);
> +
>    int *rpo = XNEWVEC (int, last_basic_block_for_fn (fun));
>    int n = pre_and_rev_post_order_compute_fn (fun, NULL, rpo, false);
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> index 638bf38db8c..641c91e719e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> @@ -23,4 +23,4 @@ float h ()
>       F[0] += E / d;
>  }
> 
> -/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
> +/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> new file mode 100644
> index 00000000000..7326a230b3f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +volatile int x;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int k)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i++)
> +    {
> +      if (__builtin_expect (x, 0))
> +     bar (k / 5, "one", "two");
> +      a[i] = k;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "out of loop 1" "lim2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> new file mode 100644
> index 00000000000..51c1913d003
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +volatile int x;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int m, int s, int t)
> +{
> +  int i;
> +  int j;
> +  int k;
> +
> +  for (i = 0; i < m; i++) // Loop 1
> +    {
> +      if (__builtin_expect (x, 0))
> +     for (j = 0; j < n; j++) // Loop 2
> +       for (k = 0; k < n; k++) // Loop 3
> +         {
> +           bar (s / 5, "one", "two");
> +           a[t] = s;
> +         }
> +      a[t] = t;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "out of loop 2" 4 "lim2" } } */
> +/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> new file mode 100644
> index 00000000000..bc60a040a70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile  } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +/* Test that `count' is not hoisted out of loop when bb is cold.  */
> +
> +int count;
> +volatile int x;
> +
> +struct obj {
> +  int data;
> +  struct obj *next;
> +
> +} *q;
> +
> +void
> +func (int m)
> +{
> +  struct obj *p;
> +  for (int i = 0; i < m; i++)
> +    if (__builtin_expect (x, 0))
> +      count++;
> +
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2"  }  } 
> */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> new file mode 100644
> index 00000000000..ffe6f8f699d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile  } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +/* Test that `data' and 'data1' is not hoisted out of inner loop and outer 
> loop
> +   when it is in cold loop.  */
> +
> +int count;
> +volatile int x;
> +
> +struct obj {
> +  int data;
> +  int data1;
> +  struct obj *next;
> +};
> +
> +void
> +func (int m, int n, int k, struct obj *a)
> +{
> +  struct obj *q = a;
> +  for (int j = 0; j < m; j++)
> +    if (__builtin_expect (m, 0))
> +      for (int i = 0; i < m; i++)
> +     {
> +       if (__builtin_expect (x, 0))
> +         {
> +           count++;
> +           q->data += 3; /* Not hoisted out to inner loop. */
> +         }
> +       count += n;
> +       q->data1 += k; /* Not hoisted out to outer loop. */
> +     }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2"  }  } 
> */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
> new file mode 100644
> index 00000000000..16ba4ceb8ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +volatile int x;
> +volatile int y;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int m, int s, int t)
> +{
> +  int i;
> +  int j;
> +  int k;
> +
> +  for (i = 0; i < m; i++) // Loop 1
> +    {
> +      if (__builtin_expect (x, 0))
> +     for (j = 0; j < n; j++) // Loop 2
> +       if (__builtin_expect (y, 0))
> +         for (k = 0; k < n; k++) // Loop 3
> +           {
> +             bar (s / 5, "one", "two");
> +             a[t] = s;
> +           }
> +      a[t] = t;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "out of loop 3" 4 "lim2" } } */
> +/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
> +
> +
> 

-- 
Thanks,
Xionghu

Reply via email to