On 2020/12/23 00:53, Richard Biener wrote:
On December 21, 2020 10:03:43 AM GMT+01:00, Xiong Hu Luo <luo...@linux.ibm.com> wrote:
Here is another case that requires running a pass once more.  As this is
not the commonly suggested way to solve problems, I am not quite sure
whether it is still a reasonable fix here.  The source code is something
like:

ref = ip + *hslot;
while (ip < in_end - 2) {
  unsigned int len = 2;
  len++;
    for (;;) {
      do len++;
      while (len < maxlen && ref[len] == ip[len]); //sink code here.
      break;
    }
  len -= 2;
  ip++;
  ip += len + 1;
  if (ip >= in_end - 2)
    break;
}

Before ivopts, the gimple for the inner while loop (xxx.c.172t.slp1) is:

  <bb 31> [local count: 75120046]:
  # len_160 = PHI <len_161(30), len_189(58)>
  len_189 = len_160 + 1;
  _423 = (sizetype) len_189;
  _424 = ip_229 + _423;
  if (maxlen_186 > len_189)
    goto <bb 32>; [94.50%]
  else
    goto <bb 33>; [5.50%]

  <bb 32> [local count: 70988443]:
  _84 = *_424;
  _86 = ref_182 + _423;
  _87 = *_86;
  if (_84 == _87)
    goto <bb 58>; [94.50%]
  else
    goto <bb 33>; [5.50%]

  <bb 58> [local count: 67084079]:
  goto <bb 31>; [100.00%]

  <bb 33> [local count: 14847855]:
  # len_263 = PHI <len_160(32), len_160(31)>
  # _262 = PHI <_423(32), _423(31)>
  # _264 = PHI <_424(32), _424(31)>
  len_190 = len_263 + 4294967295;
  if (len_190 <= 6)
    goto <bb 34>; [0.00%]
  else
    goto <bb 36>; [100.00%]

Then in ivopts, the instructions are updated to (xxx.c.174t.ivopts):

  <bb 31> [local count: 75120046]:
  # ivtmp.30_29 = PHI <ivtmp.30_32(30), ivtmp.30_31(58)>
  _34 = (unsigned int) ivtmp.30_29;
  len_160 = _34 + 4294967295;
  _423 = ivtmp.30_29;
  _35 = (unsigned long) ip_229;
  _420 = ivtmp.30_29 + _35;
  _419 = (uint8_t *) _420;
  _424 = _419;
  len_418 = (unsigned int) ivtmp.30_29;
  if (maxlen_186 > len_418)
    goto <bb 32>; [94.50%]
  else
    goto <bb 33>; [5.50%]

  <bb 32> [local count: 70988443]:
  _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_29 * 1];
  ivtmp.30_31 = ivtmp.30_29 + 1;
  _417 = ref_182 + 18446744073709551615;
  _87 = MEM[(uint8_t *)_417 + ivtmp.30_31 * 1];
  if (_84 == _87)
    goto <bb 58>; [94.50%]
  else
    goto <bb 33>; [5.50%]

  <bb 58> [local count: 67084079]:
  goto <bb 31>; [100.00%]

  <bb 33> [local count: 14847855]:
  # len_263 = PHI <len_160(32), len_160(31)>
  # _262 = PHI <_423(32), _423(31)>
  # _264 = PHI <_424(32), _424(31)>
  len_190 = len_263 + 4294967295;
  if (len_190 <= 6)
    goto <bb 34>; [0.00%]
  else
    goto <bb 36>; [100.00%]

The results of some instructions in BB 31 are not used in the loop and
could be sunk out of the loop to reduce computation, but none of the
later passes sinks them.  Running the sink_code pass once more, at least
after fre5, improves the performance of this typical case by 23% because
fewer instructions are executed in the loop.
xxx.c.209t.sink2:

Sinking _419 = (uint8_t *) _420;
from bb 31 to bb 89
Sinking _420 = ivtmp.30_29 + _35;
from bb 31 to bb 89
Sinking _35 = (unsigned long) ip_229;
from bb 31 to bb 89
Sinking len_160 = _34 + 4294967295;
from bb 31 to bb 33
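
To make the effect of these sinking decisions concrete at the source level, here is a minimal C sketch.  It is only an illustration: the function names mismatch_in_loop and mismatch_sunk are hypothetical and do not appear in the test case or the patch.  The first function recomputes an address in every iteration although the value is only needed after the loop; the second computes it once on the exit path, which is roughly what sinking _419/_420 achieves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not code from the patch.  The address of the
   first mismatching byte is recomputed on every iteration although it is
   only needed after the loop.  */
static const uint8_t *
mismatch_in_loop (const uint8_t *ip, const uint8_t *ref, unsigned int maxlen)
{
  unsigned int len = 2;
  const uint8_t *p;

  do
    {
      len++;
      p = ip + len;	/* Computed each iteration, used only after the loop.  */
    }
  while (len < maxlen && ref[len] == ip[len]);

  return p;
}

/* Same result with the address computation sunk to the loop exit.  */
static const uint8_t *
mismatch_sunk (const uint8_t *ip, const uint8_t *ref, unsigned int maxlen)
{
  unsigned int len = 2;

  do
    len++;
  while (len < maxlen && ref[len] == ip[len]);

  return ip + len;	/* Computed once, on the exit path.  */
}
```

Both functions return the same pointer for the same inputs; only the amount of work done inside the loop differs.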

I also tested SPEC2017 performance on P8LE: 544.nab_r improves by
2.43%, with no big changes in the other cases; the GEOMEAN improves
only slightly, by 0.25%.

The reason it should be run after fre5 is that fre does some phi
optimization that exposes the sinking opportunity.  The patch puts it
after pass_modref based on my guess that some gimple optimizations like
thread_jumps, dse, dce etc. could provide more opportunities for
sinking code.  I am not sure this is the correct place to put it.  I
also verified that this issue exists on both X86 and ARM64.
Any comments?  Thanks.

It definitely should be before uncprop (but context stops there). And yes, 
re-running passes isn't the very, very best thing to do without explaining it 
cannot be done in other ways. Not for late stage 3 anyway.

Richard.


Thanks.  I also tried to implement this in a separate RTL pass; which
would be better?  I guess this would also be a stage1 issue...


Xionghu
From ac6e161a592ea259f2807f40d1021ccbfc3a965f Mon Sep 17 00:00:00 2001
From: Xiong Hu Luo <luo...@linux.ibm.com>
Date: Thu, 4 Mar 2021 05:05:19 -0600
Subject: [PATCH] RTL loop sink pass

This is an RTL loop sink pass that checks the instructions in the loop
header.  If an instruction's dest is not used in the loop and its source
is not updated after the current instruction, then the result is computed
in every iteration without being used there; sinking it out of the loop
should theoretically reduce the number of executed instructions (register
pressure is not considered yet).
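
The two conditions above can be sketched on a toy model.  This is a simplified illustration under stated assumptions, not the logic in sink.c: the loop body is a single straight-line block, struct toy_insn and toy_can_sink are hypothetical names, and register numbers are plain non-negative ints with -1 meaning "no operand".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy three-address instruction: dest = op (src0, src1).  Hypothetical
   model, not GCC RTL.  dest is always >= 0; an unused source is -1.  */
struct toy_insn
{
  int dest;
  int src0;
  int src1;
};

/* Return true if INSNS[IDX] may be sunk to the loop exit, where
   INSNS[0..N-1] is the whole (single-block) loop body.  */
static bool
toy_can_sink (const struct toy_insn *insns, size_t n, size_t idx)
{
  const struct toy_insn *cand = &insns[idx];

  for (size_t i = 0; i < n; i++)
    {
      /* Condition 1: the dest must not be read anywhere in the loop,
	 including by the candidate itself.  */
      if (insns[i].src0 == cand->dest || insns[i].src1 == cand->dest)
	return false;

      /* A second definition of the dest in the loop also blocks sinking.  */
      if (i != idx && insns[i].dest == cand->dest)
	return false;

      /* Condition 2: no source may be redefined after the candidate.  */
      if (i > idx
	  && (insns[i].dest == cand->src0 || insns[i].dest == cand->src1))
	return false;
    }

  return true;
}
```

For a body such as { r3 = r0 + r1; r4 = f (r0); r5 = r4 + r2; r2 = r2 + 1 }, only the first instruction passes: r4 is read in the loop, r5 depends on r2 which is redefined later, and the r2 update reads its own dest.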

Number of instructions sunk out of loops when running on P8LE:
1. SPEC2017 int: 371
2. SPEC2017 float: 949
3. bootstrap: 402
4. stage1 libraries: 115
5. regression tests: 4533

Though there is no obvious performance change for SPEC2017, rtl-sink-2.c
achieves a 10% performance improvement by sinking 4 instructions out of
the loop header.  This is a bit like running the gimple sink pass twice,
which I posted several months ago; is implementing this in RTL better?

https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562352.html

gcc/ChangeLog:

        * Makefile.in:
        * dbgcnt.def (DEBUG_COUNTER):
        * passes.def:
        * timevar.def (TV_TREE_SINK):
        (TV_SINK):
        * tree-pass.h (make_pass_rtl_sink):
        * sink.c: New file.

gcc/testsuite/ChangeLog:

        * gcc.dg/rtl-sink-1.c: New test.
        * gcc.dg/rtl-sink-2.c: New test.
---
 gcc/Makefile.in                   |   1 +
 gcc/dbgcnt.def                    |   1 +
 gcc/passes.def                    |   1 +
 gcc/sink.c                        | 350 ++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/rtl-sink-1.c |  19 ++
 gcc/testsuite/gcc.dg/rtl-sink-2.c | 200 +++++++++++++++++
 gcc/timevar.def                   |   3 +-
 gcc/tree-pass.h                   |   1 +
 8 files changed, 575 insertions(+), 1 deletion(-)
 create mode 100644 gcc/sink.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl-sink-1.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl-sink-2.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a63c5d9cab6..b7ec7970aac 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1334,6 +1334,7 @@ OBJS = \
        cppbuiltin.o \
        cppdefault.o \
        cprop.o \
+       sink.o \
        cse.o \
        cselib.o \
        data-streamer.o \
diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 93e7b4fd30e..c0702650ad3 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -197,6 +197,7 @@ DEBUG_COUNTER (sched_region)
 DEBUG_COUNTER (sel_sched_cnt)
 DEBUG_COUNTER (sel_sched_insn_cnt)
 DEBUG_COUNTER (sel_sched_region_cnt)
+DEBUG_COUNTER (sink)
 DEBUG_COUNTER (sms_sched_loop)
 DEBUG_COUNTER (split_for_sched2)
 DEBUG_COUNTER (store_merging)
diff --git a/gcc/passes.def b/gcc/passes.def
index e9ed3c7bc57..820b54155c5 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -436,6 +436,7 @@ along with GCC; see the file COPYING3.  If not see
       /* Perform loop optimizations.  It might be better to do them a bit
         sooner, but we want the profile feedback to work more
         efficiently.  */
+      NEXT_PASS (pass_rtl_sink);
       NEXT_PASS (pass_loop2);
       PUSH_INSERT_PASSES_WITHIN (pass_loop2)
          NEXT_PASS (pass_rtl_loop_init);
diff --git a/gcc/sink.c b/gcc/sink.c
new file mode 100644
index 00000000000..0c33f0e7c8b
--- /dev/null
+++ b/gcc/sink.c
@@ -0,0 +1,350 @@
+/* Code sink for RTL.
+   Copyright (C) 1997-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "cfghooks.h"
+#include "df.h"
+#include "insn-config.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "recog.h"
+#include "diagnostic-core.h"
+#include "toplev.h"
+#include "cfgrtl.h"
+#include "cfganal.h"
+#include "lcm.h"
+#include "cfgcleanup.h"
+#include "cselib.h"
+#include "intl.h"
+#include "tree-pass.h"
+#include "dbgcnt.h"
+#include "cfgloop.h"
+#include "gcse.h"
+#include "loop-unroll.h"
+
+/* Check whether the instruction could be sunk out of loop by checking dest's
+   uses and source's def.  */
+
+static bool
+can_sink_reg_in_loop (loop *loop, rtx_insn *insn, rtx dest, rtx reg)
+{
+  df_ref def, use;
+  unsigned int src_regno, dest_regno, defs_in_loop_count = 0;
+  basic_block bb = BLOCK_FOR_INSN (insn);
+  basic_block use_bb;
+
+  while (GET_CODE (reg) == ZERO_EXTEND || GET_CODE (reg) == SIGN_EXTEND)
+    reg = XEXP (reg, 0);
+
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+
+  if (UNARY_P (reg))
+    return can_sink_reg_in_loop (loop, insn, dest, XEXP (reg, 0));
+
+  if (BINARY_P (reg))
+    {
+      rtx src0 = XEXP (reg, 0);
+      rtx src1 = XEXP (reg, 1);
+
+      if (CONST_INT_P (src1))
+       return can_sink_reg_in_loop (loop, insn, dest, src0);
+      else
+       return can_sink_reg_in_loop (loop, insn, dest, src0)
+              && can_sink_reg_in_loop (loop, insn, dest, src1);
+    }
+
+  if (!REG_P (reg) || HARD_REGISTER_P (reg))
+    return false;
+
+  src_regno = REGNO (reg);
+  dest_regno = REGNO (dest);
+  rtx_insn *use_insn;
+
+  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
+    {
+      if (!DF_REF_INSN_INFO (use))
+       continue;
+
+      use_insn = DF_REF_INSN (use);
+      use_bb = BLOCK_FOR_INSN (use_insn);
+
+      /* Ignore instruction considered for moving.  */
+      if (use_insn == insn)
+       return false;
+
+      /* Don't consider uses in loop.  */
+      if (!use_bb->loop_father
+         || (NONDEBUG_INSN_P (use_insn)
+             && flow_bb_inside_loop_p (loop, use_bb)))
+       return false;
+    }
+
+  rtx_insn *def_insn;
+  basic_block def_bb;
+  /* Check for other defs.  Any other def in the loop might reach a use
+     currently reached by the def in insn.  */
+  for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = DF_REF_NEXT_REG (def))
+    {
+      def_bb = DF_REF_BB (def);
+
+      /* Defs in exit block cannot reach a use they weren't already.  */
+      if (single_succ_p (def_bb))
+       {
+         basic_block def_bb_succ;
+
+         def_bb_succ = single_succ (def_bb);
+         if (!flow_bb_inside_loop_p (loop, def_bb_succ))
+           continue;
+       }
+
+      if (flow_bb_inside_loop_p (loop, def_bb) && ++defs_in_loop_count > 1)
+       return false;
+    }
+
+  for (def = DF_REG_DEF_CHAIN (src_regno); def; def = DF_REF_NEXT_REG (def))
+    {
+      def_bb = DF_REF_BB (def);
+      def_insn = DF_REF_INSN (def);
+      if (!flow_bb_inside_loop_p (loop, def_bb))
+       continue;
+
+      if (def_bb == bb && DF_INSN_LUID (insn) <= DF_INSN_LUID (def_insn))
+       return false;
+
+      if (def_bb != bb && def_bb != loop->latch)
+       return false;
+    }
+
+  auto_vec<edge> edges = get_loop_exit_edges (loop);
+  unsigned j;
+  edge e;
+
+  FOR_EACH_VEC_ELT (edges, j, e)
+    if (!dominated_by_p (CDI_DOMINATORS, e->src, bb))
+      return false;
+    else
+      continue;
+  return true;
+}
+
+/* If the instruction's dest is only used by debug instruction in the loop, it
+   need also be sunk out of loop to preserve the debug information.  */
+
+rtx_insn *
+sink_dest_in_debug_insn (rtx dest, rtx_insn *insn_sink_to, basic_block prev_bb)
+{
+  df_ref use;
+  rtx_insn *use_insn;
+  rtx pat;
+  basic_block use_bb;
+
+  unsigned int dest_regno = REGNO (dest);
+  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
+    {
+      if (!DF_REF_INSN_INFO (use))
+       continue;
+
+      use_insn = DF_REF_INSN (use);
+
+      if (NONDEBUG_INSN_P (use_insn))
+       continue;
+
+      use_bb = BLOCK_FOR_INSN (use_insn);
+
+      if (use_bb != prev_bb)
+       continue;
+
+      pat = PATTERN (use_insn);
+      emit_debug_insn_after_noloc (copy_insn (pat), insn_sink_to);
+      return use_insn;
+    }
+  return NULL;
+}
+
+/* If the instructions' dest is not used in loop and the source is not
+   re-defined after current instructions, sink it from loop header to every loop
+   exits.  */
+
+static void
+sink_set_reg_in_loop (loop *loop)
+{
+  unsigned i,j;
+  basic_block *body = get_loop_body_in_dom_order (loop);
+  rtx_insn *insn, *insn_sink_to, *dbg_insn;
+  edge e;
+  basic_block bb, new_bb;
+  rtx set, src, dest, pat;
+  auto_vec<edge> edges = get_loop_exit_edges (loop);
+
+  FOR_EACH_VEC_ELT (edges, j, e)
+    if (has_abnormal_or_eh_outgoing_edge_p (e->src))
+      return;
+
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      bb = body[i];
+      if (bb->loop_father == loop && bb_loop_header_p (bb))
+       {
+         FOR_BB_INSNS_REVERSE (bb, insn)
+           {
+             if (!NONDEBUG_INSN_P (insn))
+               continue;
+
+             if (CALL_P (insn))
+               continue;
+
+             set = single_set (insn);
+             if (!set || side_effects_p (set))
+               continue;
+
+             src = SET_SRC (set);
+             dest = SET_DEST (set);
+             pat = PATTERN (insn);
+
+             if (!REG_P (dest))
+               continue;
+
+             if (!can_sink_reg_in_loop (loop, insn, dest, src))
+               continue;
+
+             FOR_EACH_VEC_ELT (edges, j, e)
+               {
+                 new_bb = e->dest;
+                 if (!single_pred_p (new_bb))
+                   new_bb = split_edge (e);
+
+                 if (dump_file)
+                   {
+                     fprintf (dump_file, "Loop %d: sinking ", loop->num);
+                     print_rtl (dump_file, set);
+                     fprintf (dump_file, " from bb %d to bb %d \n", bb->index,
+                              new_bb->index);
+                   }
+
+                 insn_sink_to = BB_HEAD (new_bb);
+                 while (!NOTE_INSN_BASIC_BLOCK_P (insn_sink_to))
+                   insn_sink_to = NEXT_INSN (insn_sink_to);
+                 emit_insn_after_noloc (copy_rtx_if_shared (pat), insn_sink_to,
+                                        new_bb);
+
+                 dbg_insn = sink_dest_in_debug_insn (dest, NEXT_INSN (insn_sink_to), bb);
+               }
+
+#if 1
+             static unsigned long counts = 0;
+             FILE *f = fopen ("/home3/luoxhu/workspace/gcc-git/gcc-master_build/sink.out", "a");
+             fprintf (f, " %ld \n", counts);
+             counts++;
+             fclose (f);
+#endif
+             delete_insn (insn);
+             if (dbg_insn)
+               {
+                 delete_insn (dbg_insn);
+                 dbg_insn = NULL;
+               }
+           }
+       }
+    }
+}
+
+/* Sink the set instructions out of the loops.  */
+
+static unsigned int
+sink_insn_in_loop (void)
+{
+  class loop *loop;
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      if (loop->num_nodes <= 1000)
+       sink_set_reg_in_loop (loop);
+    }
+  return 0;
+}
+
+static unsigned int
+execute_rtl_sink (void)
+{
+  delete_unreachable_blocks ();
+  df_set_flags (DF_LR_RUN_DCE);
+  df_note_add_problem ();
+  df_analyze ();
+
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+  calculate_dominance_info (CDI_DOMINATORS);
+
+  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
+
+  if (dump_file)
+    fprintf (dump_file, "rtl sink begin:\n");
+
+  if (number_of_loops (cfun) > 1)
+    sink_insn_in_loop ();
+
+  if (dump_file)
+    fprintf (dump_file, "rtl sink end:\n");
+
+  loop_optimizer_finalize ();
+  free_dominance_info (CDI_DOMINATORS);
+  free_dominance_info (CDI_POST_DOMINATORS);
+
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_rtl_sink = {
+  RTL_PASS,      /* type */
+  "sink",        /* name */
+  OPTGROUP_NONE,  /* optinfo_flags */
+  TV_SINK,       /* tv_id */
+  PROP_cfglayout, /* properties_required */
+  0,             /* properties_provided */
+  0,             /* properties_destroyed */
+  0,             /* todo_flags_start */
+  TODO_df_finish, /* todo_flags_finish */
+};
+
+class pass_rtl_sink : public rtl_opt_pass
+{
+public:
+  pass_rtl_sink (gcc::context *ctxt) : rtl_opt_pass (pass_data_rtl_sink, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  opt_pass *clone () { return new pass_rtl_sink (m_ctxt); }
+  virtual bool gate (function *) { return optimize > 0 && dbg_cnt (sink); }
+
+  virtual unsigned int execute (function *) { return execute_rtl_sink (); }
+
+}; // class pass_rtl_sink
+
+} // namespace
+
+rtl_opt_pass *
+make_pass_rtl_sink (gcc::context *ctxt)
+{
+  return new pass_rtl_sink (ctxt);
+}
diff --git a/gcc/testsuite/gcc.dg/rtl-sink-1.c b/gcc/testsuite/gcc.dg/rtl-sink-1.c
new file mode 100644
index 00000000000..0065c874cef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl-sink-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile  } */
+/* { dg-additional-options " -O2 -fdump-rtl-sink"  } */
+
+int
+liveloop (int n, int *x, int *y)
+{
+  int i;
+  int ret;
+
+  for (i = 0; i < n; ++i)
+  {
+    ret = x[i] + 5;
+    y[i] = ret;
+  }
+  return ret;
+}
+
+/* { dg-final { scan-rtl-dump-times "sinking" 1 "sink" } } */
+
diff --git a/gcc/testsuite/gcc.dg/rtl-sink-2.c b/gcc/testsuite/gcc.dg/rtl-sink-2.c
new file mode 100644
index 00000000000..9888070e8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl-sink-2.c
@@ -0,0 +1,200 @@
+/* { dg-do compile  } */
+/* { dg-additional-options " -O3 -fdump-rtl-sink"  } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+
+# define HLOG 16
+#define        MAX_LIT        (1 <<  5)
+typedef uint8_t *LZF_HSLOT;
+typedef LZF_HSLOT LZF_STATE[1 << (HLOG)];
+
+int
+compute_on_bytes (uint8_t *in_data, int in_len, uint8_t *out_data, int out_len)
+{
+  LZF_STATE htab;
+
+  uint8_t *ip = in_data;
+  uint8_t *op = out_data;
+  uint8_t *in_end = ip + in_len;
+  uint8_t *out_end = op + out_len;
+  uint8_t *ref;
+
+  unsigned long off;
+  unsigned int hval;
+  int lit;
+
+  if (!in_len || !out_len)
+    return 0;
+
+  lit = 0;
+  op++;
+  hval = (((ip[0]) << 8) | ip[1]);
+
+  while (ip < in_end - 2)
+    {
+      uint8_t *hslot;
+
+      hval = (((hval) << 8) | ip[2]);
+      hslot = (uint8_t *)(htab + (((hval >> (3 * 8 - 16)) - hval * 5) & ((1 << (16)) - 1)));
+
+      ref = *hslot + in_data;
+      *hslot = ip - in_data;
+
+      if (1 && (off = ip - ref - 1) < (1 << 13) && ref > in_data
+         && ref[2] == ip[2]
+         && ((ref[1] << 8) | ref[0]) == ((ip[1] << 8) | ip[0]))
+       {
+#if 1
+          unsigned int len = 2;
+#else
+          unsigned long len = 2;
+#endif
+          unsigned int maxlen = in_end - ip - len;
+          maxlen
+            = maxlen > ((1 << 8) + (1 << 3)) ? ((1 << 8) + (1 << 3)) : maxlen;
+
+          if ((op + 3 + 1 >= out_end) != 0)
+            if (op - !lit + 3 + 1 >= out_end)
+              return 0;
+
+          op[-lit - 1] = lit - 1;
+          op -= !lit;
+
+          for (;;)
+            {
+              if (maxlen > 16)
+                {
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                  len++;
+                  if (ref[len] != ip[len])
+                    break;
+                }
+              do
+                {
+                  len++;
+                }
+              while (len < maxlen && ip[len] == ref[len]);
+              break;
+            }
+
+          len -= 2;
+          ip++;
+
+          if (len < 7)
+            {
+              *op++ = (off >> 8) + (len << 5);
+            }
+          else
+            {
+              *op++ = (off >> 8) + (7 << 5);
+              *op++ = len - 7;
+            }
+          *op++ = off;
+          lit = 0;
+          op++;
+          ip += len + 1;
+
+          if (ip >= in_end - 2)
+            break;
+
+          --ip;
+          --ip;
+
+          hval = (((ip[0]) << 8) | ip[1]);
+          hval = (((hval) << 8) | ip[2]);
+          htab[(((hval >> (3 * 8 - 16)) - hval * 5) & ((1 << (16)) - 1))]
+            = (LZF_HSLOT)(ip - in_data);
+          ip++;
+
+          hval = (((hval) << 8) | ip[2]);
+          htab[(((hval >> (3 * 8 - 16)) - hval * 5) & ((1 << (16)) - 1))]
+            = (LZF_HSLOT)(ip - in_data);
+          ip++;
+        }
+       else
+        {
+          if (op >= out_end)
+            return 0;
+
+          lit++;
+          *op++ = *ip++;
+
+          if (lit == (1 << 5))
+            {
+              op[-lit - 1] = lit - 1;
+              lit = 0;
+              op++;
+            }
+        }
+     }
+   if (op + 3 > out_end) /* at most 3 bytes can be missing here */
+     return 0;
+
+   while (ip < in_end)
+     {
+       lit++;
+       *op++ = *ip++;
+       if (lit == MAX_LIT)
+        {
+          op[-lit - 1] = lit - 1; /* stop run */
+          lit = 0;
+          op++; /* start run */
+        }
+     }
+
+   op[-lit - 1] = lit - 1; /* end run */
+   op -= !lit;            /* undo run if length is zero */
+
+   return op - out_data;
+}
+
+/* { dg-final { scan-rtl-dump-times "sinking" 8 "sink" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 63c0b3306de..d6854a299ab 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -176,7 +176,7 @@ DEFTIMEVAR (TV_TREE_SPLIT_EDGES      , "tree split crit edges")
 DEFTIMEVAR (TV_TREE_REASSOC          , "tree reassociation")
 DEFTIMEVAR (TV_TREE_PRE                     , "tree PRE")
 DEFTIMEVAR (TV_TREE_FRE                     , "tree FRE")
-DEFTIMEVAR (TV_TREE_SINK             , "tree code sinking")
+DEFTIMEVAR (TV_TREE_SINK            , "tree code sinking")
 DEFTIMEVAR (TV_TREE_PHIOPT          , "tree linearize phis")
 DEFTIMEVAR (TV_TREE_BACKPROP        , "tree backward propagate")
 DEFTIMEVAR (TV_TREE_FORWPROP        , "tree forward propagate")
@@ -248,6 +248,7 @@ DEFTIMEVAR (TV_LOOP_UNROLL           , "loop unrolling")
 DEFTIMEVAR (TV_LOOP_DOLOOP           , "loop doloop")
 DEFTIMEVAR (TV_LOOP_FINI            , "loop fini")
 DEFTIMEVAR (TV_CPROP                 , "CPROP")
+DEFTIMEVAR (TV_SINK                  , "SINK")
 DEFTIMEVAR (TV_PRE                   , "PRE")
 DEFTIMEVAR (TV_HOIST                 , "code hoisting")
 DEFTIMEVAR (TV_LSM                   , "LSM")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 15693fee150..fb4a5aaefda 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -548,6 +548,7 @@ extern rtl_opt_pass *make_pass_rtl_dse1 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_dse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_dse3 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_cprop (gcc::context *ctxt);
+extern rtl_opt_pass *make_pass_rtl_sink (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_pre (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_hoist (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_store_motion (gcc::context *ctxt);
-- 
2.27.0.90.geebb51ba8c
