Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

Kyrill Tkachov Fri, 24 Jul 2015 02:32:20 -0700


On 23/07/15 21:38, Jeff Law wrote:

On 07/13/2015 08:03 AM, Kyrill Tkachov wrote:

2015-07-13  Kyrylo Tkachov <kyrylo.tkac...@arm.com>


      * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
      then_cost, else_cost fields.  Change branch_cost field to unsigned
int.
      (end_ifcvt_sequence): Call set_used_flags on each insn in the
      sequence.
      (noce_simple_bbs): New function.
      (noce_try_move): Bail if basic blocks are not simple.
      (noce_try_store_flag): Likewise.
      (noce_try_store_flag_constants): Likewise.
      (noce_try_addcc): Likewise.
      (noce_try_store_flag_mask): Likewise.
      (noce_try_cmove): Likewise.
      (noce_try_minmax): Likewise.
      (noce_try_abs): Likewise.
      (noce_try_sign_mask): Likewise.
      (noce_try_bitop): Likewise.
      (bbs_ok_for_cmove_arith): New function.
      (noce_emit_all_but_last): Likewise.
      (noce_emit_insn): Likewise.
      (noce_emit_bb): Likewise.
      (noce_try_cmove_arith): Handle non-simple basic blocks.
      (insn_valid_noce_process_p): New function.
      (bb_valid_for_noce_process_p): Likewise.
      (noce_process_if_block): Allow non-simple basic blocks
      where appropriate.


2015-07-13  Kyrylo Tkachov <kyrylo.tkac...@arm.com>

      * gcc.dg/ifcvt-1.c: New test.
      * gcc.dg/ifcvt-2.c: Likewise.
      * gcc.dg/ifcvt-3.c: Likewise.

Thanks,
Kyrill

cheers,


ifcvt.patch


commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd
Author: Kyrylo Tkachov<kyrylo.tkac...@arm.com>
Date:   Wed Jul 8 15:45:04 2015 +0100

      [PATCH][ifcvt] Make non-conditional execution if-conversion more 
aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 31849ee..2f0a228 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1584,6 +1630,96 @@ noce_try_cmove (struct noce_if_info *if_info)
     return FALSE;
   }

+/* Return true iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */
+
+static bool
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+{
+  rtx_insn *a_insn;
+  FOR_BB_INSNS (bb_a, a_insn)
+    {
+      if (!active_insn_p (a_insn))
+       continue;
+
+      rtx sset_a = single_set (a_insn);
+
+      if (!sset_a)
+       return false;
+
+      rtx dest_reg = SET_DEST (sset_a);
+      rtx_insn *b_insn;
+
+      FOR_BB_INSNS (bb_b, b_insn)
+       {
+         if (!active_insn_p (b_insn))
+           continue;
+
+         rtx sset_b = single_set (b_insn);
+
+         if (!sset_b)
+           return false;
+
+         if (reg_referenced_p (dest_reg, sset_b))
+           return false;
+       }
+    }
+
+  return true;
+}

Doesn't this have the potential to get very expensive since you end up
looking at every insn in BB_B for every insn in BB_A?


Yes, the saving grace is that bbs_ok_for_cmove_arith is called after the costs 
checks,
so that the costs in practice reject blocks more than a few instructions long.


Wouldn't it be better to walk BB_A, gathering the set of all the
registers modified, then do a single walk through BB testing for uses of
those registers?


I think so, yes. I'll try that.

Don't you have to be careful with MEMs in both blocks?


bb_valid_for_noce_process_p called earlier in the chain should have
rejected memory operands already.

+
+/* Emit copies of all the active instructions in BB except the last.
+   This is a helper for noce_try_cmove_arith.  */
+
+static void
+noce_emit_all_but_last (basic_block bb)
+{
+  rtx_insn *last = last_active_insn (bb, FALSE);
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (insn != last && active_insn_p (insn))
+       {
+         rtx_insn *to_emit = as_a <rtx_insn *> (copy_rtx (insn));
+
+         emit_insn (PATTERN (to_emit));
+       }
+    }
+}

Won't this create invalid RTL sharing?  RTL has strict rules about nodes
can and can not be shared and I'm pretty sure this blindly shares
everything.


All the if-conversion functions end up calling end_ifcvt_sequence
at the end which seems to unshare the stuff that needs to be unshared.
Is that not enough? (I don't have much experience with this part)


Now we may get away with that because you're going to delete all the
insns from BB.  But that begs the question why not just move the insns
from BB to their new location rather than re-emiting them?


I'll try it out.

+
+/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
+   the resulting insn or NULL if it's not a valid insn.  */
+
+static rtx_insn *
+noce_emit_insn (rtx to_emit)
+{
+  gcc_assert (to_emit);
+  rtx_insn *insn = emit_insn (to_emit);
+
+  if (recog_memoized (insn) < 0)
+    return NULL;
+
+  return insn;
+}
+
+/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
+   and including the penultimate one in BB if it is not simple
+   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
+   insn in the block.  The reason for that is that LAST_INSN may
+   have been modified by the preparation in noce_try_cmove_arith.  */
+
+static bool
+noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
+{
+  if (bb && !simple)
+    noce_emit_all_but_last (bb);
+
+  if (last_insn && !noce_emit_insn (last_insn))
+    return false;
+
+  return true;
+}

Under what conditions can noce_emit_insn fail and what happens to the
insn stream if it does?  It seems to me like the insn stream would be
bogus and we should stop compilation.  Which argues that rather than
returning a bool, we should just assert that the insn is memoized and
remove the check in noce_emit_bb.

Or is it the case that we're emitting onto a sequence that we can just
throw away in the event of a failure?


It fails when the last insn is not recognised, because
noce_try_cmove_arith can modify the last insn, but I have
not seen it cause any trouble. If it fails then back in
noce_try_cmove_arith we goto end_seq_and_fail which ends
the sequence and throws it away (and cancels if-conversion
down that path), so it should be safe.


What is the meaning of the return value from noce_emit_bb?

+
+  /* We're going to execute one of the basic blocks anyway, so
+     bail out if the most expensive of the two blocks is unacceptable.  */
+  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
+    return FALSE;

I believe you or someone else mentioned that this may not be the right
heuristic since you're paying for one arm regardless.  But we can refine
as necessary.


The current code compares the sum of both the arms against the
branch cost. Now that I think about it a bit more, I think the most
 accurate heuristic would be to compare
the cost of both blocks (in the event we do if-conversion) against the
cost of doing the branch plus the cost of the block that is predicted to
be taken, but we can't get that information accurately, I think.
But you're right, it's easy to implement the heuristic once we decide on one.


   +  FOR_BB_INSNS (test_bb, insn)

+    {
+      if (insn != last_insn)
+       {
+         if (!active_insn_p (insn))
+           continue;
+
+         if (!insn_valid_noce_process_p (insn, cc))
+           goto free_bitmap_and_fail;
+
+         rtx sset = single_set (insn);
+         gcc_assert (sset);
+
+         if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset)))
+           goto free_bitmap_and_fail;

So don't you need to look deeper into SET_SRC to see if it's a mem?   It
might be something like (plus (reg) (mem) no?


I think you're right (though this didn't cause me any problems in testing or 
benchmarking).
I'll recurse into it looking for a MEM.

Thanks for the feedback, I'll respin the patch.

Kyrill


Jeff

Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

Reply via email to