Re: Add a new combine pass

Andrew Pinski Sun, 17 Nov 2019 16:47:06 -0800

On Sun, Nov 17, 2019 at 3:35 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> (It's 23:35 local time, so it's still just about stage 1. :-))
>
> While working on SVE, I've noticed several cases in which we fail
> to combine instructions because the combined form would need to be
> placed earlier in the instruction stream than the last of the
> instructions being combined.  This includes one very important
> case in the handling of the first fault register (FFR).
>
> Combine currently requires the combined instruction to live at the same
> location as i3.  I thought about trying to relax that restriction, but it
> would be difficult to do with the current pass structure while keeping
> everything linear-ish time.
>
> So this patch instead goes for an option that has been talked about
> several times over the years: writing a new combine pass that just
> does instruction combination, and not all the other optimisations
> that have been bolted onto combine over time.  E.g. it deliberately
> doesn't do things like nonzero-bits tracking, since that really ought
> to be a separate, more global, optimisation.
>
> This is still far from being a realistic replacement for the even
> the combine parts of the current combine pass.  E.g.:
>
> - it only handles combinations that can be built up from individual
>   two-instruction combinations.
>
> - it doesn't allow new hard register clobbers to be added.
>
> - it doesn't have the special treatment of CC operations.
>
> - etc.
>
> But we have to start somewhere.
>
> On a more positive note, the pass handles things that the current
> combine pass doesn't:
>
> - the main motivating feature mentioned above: it works out where
>   the combined instruction could validly live and moves it there
>   if necessary.  If there are a range of valid places, it tries
>   to pick the best one based on register pressure (although only
>   with a simple heuristic for now).
>
> - once it has combined two instructions, it can try combining the
>   result with both later and earlier code, i.e. it can combine
>   in both directions.
>
> - it tries using REG_EQUAL notes for the final instruction.
>
> - it can parallelise two independent instructions that both read from
>   the same register or both read from memory.
>
> This last feature is useful for generating more load-pair combinations
> on AArch64.  In some cases it can also produce more store-pair combinations,
> but only for consecutive stores.  However, since the pass currently does
> this in a very greedy, peephole way, it only allows load/store-pair
> combinations if the first memory access has a higher alignment than
> the second, i.e. if we can be sure that the combined access is naturally
> aligned.  This should help it to make better decisions than the post-RA
> peephole pass in some cases while not being too aggressive.
>
> The pass is supposed to be linear time without debug insns.
> It only tries a constant number C of combinations per instruction
> and its bookkeeping updates are constant-time.  Once it has combined two
> instructions, it'll try up to C combinations on the result, but this can
> be counted against the instruction that was deleted by the combination
> and so effectively just doubles the constant.  (Note that C depends
> on MAX_RECOG_OPERANDS and the new NUM_RANGE_USERS constant.)
>
> Unfortunately, debug updates via propagate_for_debug are more expensive.
> This could probably be fixed if the pass did more to track debug insns
> itself, but using propagate_for_debug matches combine's behaviour.
>
> The patch adds two instances of the new pass: one before combine and
> one after it.  By default both are disabled, but this can be changed
> using the new 3-bit run-combine param, where:
>
> - bit 0 selects the new pre-combine pass
> - bit 1 selects the main combine pass
> - bit 2 selects the new post-combine pass
>
> The idea is that run-combine=3 can be used to see which combinations
> are missed by the new pass, while run-combine=6 (which I hope to be
> the production setting for AArch64 at -O2+) just uses the new pass
> to mop up cases that normal combine misses.  Maybe in some distant
> future, the pass will be good enough for run-combine=[14] to be a
> realistic option.
>
> I ended up having to add yet another validate_simplify_* routine,
> this time to do the equivalent of:
>
>    newx = simplify_replace_rtx (*loc, old_rtx, new_rtx);
>    validate_change (insn, loc, newx, 1);
>
> but in a more memory-efficient way.  validate_replace_rtx isn't suitable
> because it deliberately only tries simplifications in limited cases:
>
>   /* Do changes needed to keep rtx consistent.  Don't do any other
>      simplifications, as it is not our job.  */
>
> And validate_simplify_insn isn't useful for this case because it works
> on patterns that have already had changes made to them and expects
> those patterns to be valid rtxes.  simplify-replace operations instead
> need to simplify as they go, when the original modes are still to hand.
>
> As far as compile-time goes, I tried compiling optabs.ii at -O2
> with an --enable-checking=release compiler:
>
> run-combine=2 (normal combine):  100.0% (baseline)
> run-combine=4 (new pass only)     98.0%
> run-combine=6 (both passes)      100.3%
>
> where the results are easily outside the noise.  So the pass on
> its own is quicker than combine, but that's not a fair comparison
> when it doesn't do everything combine does.  Running both passes
> only has a slight overhead.
>
> To get a feel for the effect on multiple targets, I did my usual
> bogo-comparison of number of lines of asm for gcc.c-torture, gcc.dg
> and g++.dg, this time comparing run-combine=2 and run-combine=6
> using -O2 -ftree-vectorize:
>
> Target                 Tests   Delta    Best   Worst  Median
> ======                 =====   =====    ====   =====  ======
> aarch64-linux-gnu       3974  -39393   -2275      90      -2
> aarch64_be-linux-gnu    3389  -36683   -2275     165      -2
> alpha-linux-gnu         4154  -62860   -2132     335      -2
> amdgcn-amdhsa           4818    9079   -7987   51850      -2
> arc-elf                 2868  -63710  -18998     286      -1
> arm-linux-gnueabi       4053  -80404  -10019     605      -2
> arm-linux-gnueabihf     4053  -80404  -10019     605      -2
> avr-elf                 3620   38513   -2386   23364       2
> bfin-elf                2691  -32973   -1483    1127      -2
> bpf-elf                 5581  -78105  -11064     113      -3
> c6x-elf                 3915  -31710   -2441    1560      -2
> cr16-elf                6030  192102   -1757   60009      12
> cris-elf                2217  -30794   -1716     294      -2
> csky-elf                2003  -24989   -9999    1468      -2
> epiphany-elf            3345  -19416   -1803    4594      -2
> fr30-elf                3562  -15077   -1921    2334      -1
> frv-linux-gnu           2423  -16589   -1736     999      -1
> ft32-elf                2246  -46337  -15988     433      -2
> h8300-elf               2581  -33553   -1403     168      -2
> hppa64-hp-hpux11.23     3926 -120876  -50134    1056      -2
> i686-apple-darwin       3562  -46851   -1764     310      -2
> i686-pc-linux-gnu       2902   -3639   -4809    6848      -2
> ia64-linux-gnu          2900 -158870  -14006     428      -7
> iq2000-elf              2929  -54690   -2904    2576      -3
> lm32-elf                5265  162519   -1918    8004       5
> m32r-elf                1861  -25296   -2713    1004      -2
> m68k-linux-gnu          2520 -241573  -21879     200      -3
> mcore-elf               2378  -28532   -1810    1635      -2
> microblaze-elf          2782 -137363   -9516    1986      -2
> mipsel-linux-gnu        2443  -38422   -8331     458      -1
> mipsisa64-linux-gnu     2287  -60294  -12214     432      -2
> mmix                    4910 -136549  -13616     599      -2
> mn10300-elf             2944  -29151   -2488     132      -1
> moxie-rtems             1935  -12364   -1002     125      -1
> msp430-elf              2379  -37007   -2163     176      -2
> nds32le-elf             2356  -27551   -2126     163      -1
> nios2-linux-gnu         1572  -44828  -23613      92      -2
> nvptx-none              1014  -17337   -1590      16      -3
> or1k-elf                2724  -92816  -14144      56      -3
> pdp11                   1897  -27296   -1370     534      -2
> powerpc-ibm-aix7.0      2909  -58829  -10026    2001      -2
> powerpc64-linux-gnu     3685  -60551  -12158    2001      -1
> powerpc64le-linux-gnu   3501  -61846  -10024     765      -2
> pru-elf                 1574  -29734  -19998    1718      -1
> riscv32-elf             2357  -22506  -10002   10175      -1
> riscv64-elf             3320  -56777  -10002     226      -2
> rl78-elf                2113 -232328  -18607    4065      -3
> rx-elf                  2800  -38515    -896     491      -2
> s390-linux-gnu          3582  -75626  -12098    3999      -2
> s390x-linux-gnu         3761  -73473  -13748    3999      -2
> sh-linux-gnu            2350  -26401   -1003     522      -2
> sparc-linux-gnu         3279  -49518   -2175    2223      -2
> sparc64-linux-gnu       3849 -123084  -30200    2141      -2
> tilepro-linux-gnu       2737  -35562   -3458    2848      -2
> v850-elf                9002 -169126  -49996      76      -4
> vax-netbsdelf           3325  -57734  -10000    1989      -2
> visium-elf              1860  -17006   -1006    1066      -2
> x86_64-darwin           3278  -48933   -9999    1408      -2
> x86_64-linux-gnu        3008  -43887   -9999    3248      -2
> xstormy16-elf           2497  -26569   -2051      89      -2
> xtensa-elf              2161  -31231   -6910     138      -2
>
> So running both passes does seem to have a significant benefit
> on most targets, but there are some nasty-looking outliers.
> The usual caveat applies: number of lines is a very poor measurement,
> it's just to get a feel.
>
> Bootstrapped & regression-tested on aarch64-linux-gnu and
> x86_64-linux-gnu with both run-combine=3 as the default (so that the new
> pass runs first) and with run-combine=6 as the default (so that the new
> pass runs second).  There were no new execution failures.  A couple of
> guality.exp tests that already failed for most options started failing
> for a couple more.  Enabling the pass fixes the XFAILs in:
>
> gcc.target/aarch64/sve/acle/general/ptrue_pat_[234].c
>
> Inevitably there was some scan-assembler fallout for other tests.
> E.g. in gcc.target/aarch64/vmov_n_1.c:
>
> #define INHIB_OPTIMIZATION asm volatile ("" : : : "memory")
>   ...
>   INHIB_OPTIMIZATION;                                                   \
>   (a) = TEST (test, data_len);                                          \
>   INHIB_OPTIMIZATION;                                                   \
>   (b) = VMOV_OBSCURE_INST (reg_len, data_len, data_type) (&(a));        \
>
> is no longer effective for preventing move (a) from being merged
> into (b), because the pass can merge at the point of (a).  I think
> this is a valid thing to do -- the asm semantics are still satisfied,
> and asm volatile ("" : : : "memory") never acted as a register barrier.
> But perhaps we should deal with this as a special case?


Not really.  I think the testcase should be changed to:
INHIB_OPT_VAR(a)

instead.

Where INHIB_OPT_VAR should be:
#define INHIB_OPT_VAR(a) asm("":"+X"(a));

Since it is obviously not doing the correct testing in the first
place.  Even then, this testcase is huge and really should be broken
up into different testcases.


Thanks,
Andrew

>
> Richard
>
>
> 2019-11-17  Richard Sandiford  <richard.sandif...@arm.com>
>
> gcc/
>         * Makefile.in (OBJS): Add combine2.o
>         * params.opt (--param=run-combine): New option.
>         * doc/invoke.texi: Document it.
>         * tree-pass.h (make_pass_combine2_before): Declare.
>         (make_pass_combine2_after): Likewise.
>         * passes.def: Add them.
>         * timevar.def (TV_COMBINE2): New timevar.
>         * cfgrtl.h (update_cfg_for_uncondjump): Declare.
>         * combine.c (update_cfg_for_uncondjump): Move to...
>         * cfgrtl.c (update_cfg_for_uncondjump): ...here.
>         * simplify-rtx.c (simplify_truncation): Handle comparisons.
>         * recog.h (validate_simplify_replace_rtx): Declare.
>         * recog.c (validate_simplify_replace_rtx_1): New function.
>         (validate_simplify_replace_rtx_uses): Likewise.
>         (validate_simplify_replace_rtx): Likewise.
>         * combine2.c: New file.
>
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     2019-11-14 14:34:27.599783740 +0000
> +++ gcc/Makefile.in     2019-11-17 23:15:31.188500613 +0000
> @@ -1261,6 +1261,7 @@ OBJS = \
>         cgraphunit.o \
>         cgraphclones.o \
>         combine.o \
> +       combine2.o \
>         combine-stack-adj.o \
>         compare-elim.o \
>         context.o \
> Index: gcc/params.opt
> ===================================================================
> --- gcc/params.opt      2019-11-14 14:34:26.339792215 +0000
> +++ gcc/params.opt      2019-11-17 23:15:31.200500531 +0000
> @@ -768,6 +768,10 @@ Use internal function id in profile look
>  Common Joined UInteger Var(param_rpo_vn_max_loop_depth) Init(7) 
> IntegerRange(2, 65536) Param
>  Maximum depth of a loop nest to fully value-number optimistically.
>
> +-param=run-combine=
> +Target Joined UInteger Var(param_run_combine) Init(2) IntegerRange(0, 7) 
> Param
> +Choose which of the 3 available combine passes to run: bit 1 for the main 
> combine pass, bit 0 for an earlier variant of the combine pass, and bit 2 for 
> a later variant of the combine pass.
> +
>  -param=sccvn-max-alias-queries-per-access=
>  Common Joined UInteger Var(param_sccvn_max_alias_queries_per_access) 
> Init(1000) Param
>  Maximum number of disambiguations to perform per memory access.
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi 2019-11-16 10:43:45.597105823 +0000
> +++ gcc/doc/invoke.texi 2019-11-17 23:15:31.200500531 +0000
> @@ -11807,6 +11807,11 @@ in combiner for a pseudo register as las
>  @item max-combine-insns
>  The maximum number of instructions the RTL combiner tries to combine.
>
> +@item run-combine
> +Choose which of the 3 available combine passes to run: bit 1 for the main
> +combine pass, bit 0 for an earlier variant of the combine pass, and bit 2
> +for a later variant of the combine pass.
> +
>  @item integer-share-limit
>  Small integer constants can use a shared data structure, reducing the
>  compiler's memory usage and increasing its speed.  This sets the maximum
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h     2019-10-29 08:29:03.096444049 +0000
> +++ gcc/tree-pass.h     2019-11-17 23:15:31.204500501 +0000
> @@ -562,7 +562,9 @@ extern rtl_opt_pass *make_pass_reginfo_i
>  extern rtl_opt_pass *make_pass_inc_dec (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_stack_ptr_mod (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_initialize_regs (gcc::context *ctxt);
> +extern rtl_opt_pass *make_pass_combine2_before (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_combine (gcc::context *ctxt);
> +extern rtl_opt_pass *make_pass_combine2_after (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_if_after_combine (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_jump_after_combine (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_ree (gcc::context *ctxt);
> Index: gcc/passes.def
> ===================================================================
> --- gcc/passes.def      2019-10-29 08:29:03.224443133 +0000
> +++ gcc/passes.def      2019-11-17 23:15:31.200500531 +0000
> @@ -437,7 +437,9 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_inc_dec);
>        NEXT_PASS (pass_initialize_regs);
>        NEXT_PASS (pass_ud_rtl_dce);
> +      NEXT_PASS (pass_combine2_before);
>        NEXT_PASS (pass_combine);
> +      NEXT_PASS (pass_combine2_after);
>        NEXT_PASS (pass_if_after_combine);
>        NEXT_PASS (pass_jump_after_combine);
>        NEXT_PASS (pass_partition_blocks);
> Index: gcc/timevar.def
> ===================================================================
> --- gcc/timevar.def     2019-10-11 15:43:53.403498517 +0100
> +++ gcc/timevar.def     2019-11-17 23:15:31.204500501 +0000
> @@ -251,6 +251,7 @@ DEFTIMEVAR (TV_AUTO_INC_DEC          , "
>  DEFTIMEVAR (TV_CSE2                  , "CSE 2")
>  DEFTIMEVAR (TV_BRANCH_PROB           , "branch prediction")
>  DEFTIMEVAR (TV_COMBINE               , "combiner")
> +DEFTIMEVAR (TV_COMBINE2              , "second combiner")
>  DEFTIMEVAR (TV_IFCVT                , "if-conversion")
>  DEFTIMEVAR (TV_MODE_SWITCH           , "mode switching")
>  DEFTIMEVAR (TV_SMS                  , "sms modulo scheduling")
> Index: gcc/cfgrtl.h
> ===================================================================
> --- gcc/cfgrtl.h        2019-03-08 18:15:39.320730391 +0000
> +++ gcc/cfgrtl.h        2019-11-17 23:15:31.192500584 +0000
> @@ -47,6 +47,7 @@ extern void fixup_partitions (void);
>  extern bool purge_dead_edges (basic_block);
>  extern bool purge_all_dead_edges (void);
>  extern bool fixup_abnormal_edges (void);
> +extern void update_cfg_for_uncondjump (rtx_insn *);
>  extern rtx_insn *unlink_insn_chain (rtx_insn *, rtx_insn *);
>  extern void relink_block_chain (bool);
>  extern rtx_insn *duplicate_insn_chain (rtx_insn *, rtx_insn *);
> Index: gcc/combine.c
> ===================================================================
> --- gcc/combine.c       2019-11-13 08:42:45.537368745 +0000
> +++ gcc/combine.c       2019-11-17 23:15:31.192500584 +0000
> @@ -2530,42 +2530,6 @@ reg_subword_p (rtx x, rtx reg)
>          && GET_MODE_CLASS (GET_MODE (x)) == MODE_INT;
>  }
>
> -/* Delete the unconditional jump INSN and adjust the CFG correspondingly.
> -   Note that the INSN should be deleted *after* removing dead edges, so
> -   that the kept edge is the fallthrough edge for a (set (pc) (pc))
> -   but not for a (set (pc) (label_ref FOO)).  */
> -
> -static void
> -update_cfg_for_uncondjump (rtx_insn *insn)
> -{
> -  basic_block bb = BLOCK_FOR_INSN (insn);
> -  gcc_assert (BB_END (bb) == insn);
> -
> -  purge_dead_edges (bb);
> -
> -  delete_insn (insn);
> -  if (EDGE_COUNT (bb->succs) == 1)
> -    {
> -      rtx_insn *insn;
> -
> -      single_succ_edge (bb)->flags |= EDGE_FALLTHRU;
> -
> -      /* Remove barriers from the footer if there are any.  */
> -      for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
> -       if (BARRIER_P (insn))
> -         {
> -           if (PREV_INSN (insn))
> -             SET_NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn);
> -           else
> -             BB_FOOTER (bb) = NEXT_INSN (insn);
> -           if (NEXT_INSN (insn))
> -             SET_PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn);
> -         }
> -       else if (LABEL_P (insn))
> -         break;
> -    }
> -}
> -
>  /* Return whether PAT is a PARALLEL of exactly N register SETs followed
>     by an arbitrary number of CLOBBERs.  */
>  static bool
> @@ -15096,7 +15060,10 @@ const pass_data pass_data_combine =
>    {}
>
>    /* opt_pass methods: */
> -  virtual bool gate (function *) { return (optimize > 0); }
> +  virtual bool gate (function *)
> +    {
> +      return optimize > 0 && (param_run_combine & 2) != 0;
> +    }
>    virtual unsigned int execute (function *)
>      {
>        return rest_of_handle_combine ();
> Index: gcc/cfgrtl.c
> ===================================================================
> --- gcc/cfgrtl.c        2019-10-17 14:22:55.523309009 +0100
> +++ gcc/cfgrtl.c        2019-11-17 23:15:31.188500613 +0000
> @@ -3409,6 +3409,42 @@ fixup_abnormal_edges (void)
>    return inserted;
>  }
>
> +/* Delete the unconditional jump INSN and adjust the CFG correspondingly.
> +   Note that the INSN should be deleted *after* removing dead edges, so
> +   that the kept edge is the fallthrough edge for a (set (pc) (pc))
> +   but not for a (set (pc) (label_ref FOO)).  */
> +
> +void
> +update_cfg_for_uncondjump (rtx_insn *insn)
> +{
> +  basic_block bb = BLOCK_FOR_INSN (insn);
> +  gcc_assert (BB_END (bb) == insn);
> +
> +  purge_dead_edges (bb);
> +
> +  delete_insn (insn);
> +  if (EDGE_COUNT (bb->succs) == 1)
> +    {
> +      rtx_insn *insn;
> +
> +      single_succ_edge (bb)->flags |= EDGE_FALLTHRU;
> +
> +      /* Remove barriers from the footer if there are any.  */
> +      for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
> +       if (BARRIER_P (insn))
> +         {
> +           if (PREV_INSN (insn))
> +             SET_NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn);
> +           else
> +             BB_FOOTER (bb) = NEXT_INSN (insn);
> +           if (NEXT_INSN (insn))
> +             SET_PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn);
> +         }
> +       else if (LABEL_P (insn))
> +         break;
> +    }
> +}
> +
>  /* Cut the insns from FIRST to LAST out of the insns stream.  */
>
>  rtx_insn *
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c  2019-11-16 15:33:36.642840131 +0000
> +++ gcc/simplify-rtx.c  2019-11-17 23:15:31.204500501 +0000
> @@ -851,6 +851,12 @@ simplify_truncation (machine_mode mode,
>        && trunc_int_for_mode (INTVAL (XEXP (op, 1)), mode) == -1)
>      return constm1_rtx;
>
> +  /* (truncate:A (cmp X Y)) is (cmp:A X Y): we can compute the result
> +     in a narrower mode if useful.  */
> +  if (COMPARISON_P (op))
> +    return simplify_gen_relational (GET_CODE (op), mode, VOIDmode,
> +                                   XEXP (op, 0), XEXP (op, 1));
> +
>    return NULL_RTX;
>  }
>
> Index: gcc/recog.h
> ===================================================================
> --- gcc/recog.h 2019-09-09 18:58:28.860430363 +0100
> +++ gcc/recog.h 2019-11-17 23:15:31.204500501 +0000
> @@ -111,6 +111,7 @@ extern int validate_replace_rtx_part_nos
>  extern void validate_replace_rtx_group (rtx, rtx, rtx_insn *);
>  extern void validate_replace_src_group (rtx, rtx, rtx_insn *);
>  extern bool validate_simplify_insn (rtx_insn *insn);
> +extern bool validate_simplify_replace_rtx (rtx_insn *, rtx *, rtx, rtx);
>  extern int num_changes_pending (void);
>  extern int next_insn_tests_no_inequality (rtx_insn *);
>  extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode);
> Index: gcc/recog.c
> ===================================================================
> --- gcc/recog.c 2019-10-01 09:55:35.150088599 +0100
> +++ gcc/recog.c 2019-11-17 23:15:31.204500501 +0000
> @@ -922,6 +922,226 @@ validate_simplify_insn (rtx_insn *insn)
>        }
>    return ((num_changes_pending () > 0) && (apply_change_group () > 0));
>  }
> +
> +/* A subroutine of validate_simplify_replace_rtx.  Apply the replacement
> +   described by R to LOC.  Return true on success; leave the caller
> +   to clean up on failure.  */
> +
> +static bool
> +validate_simplify_replace_rtx_1 (validate_replace_src_data &r, rtx *loc)
> +{
> +  rtx x = *loc;
> +  enum rtx_code code = GET_CODE (x);
> +  machine_mode mode = GET_MODE (x);
> +
> +  if (rtx_equal_p (x, r.from))
> +    {
> +      validate_unshare_change (r.insn, loc, r.to, 1);
> +      return true;
> +    }
> +
> +  /* Recursively apply the substitution and see if we can simplify
> +     the result.  This specifically shouldn't use simplify_gen_*,
> +     since we want to avoid generating new expressions where possible.  */
> +  int old_num_changes = num_validated_changes ();
> +  rtx newx = NULL_RTX;
> +  bool recurse_p = false;
> +  switch (GET_RTX_CLASS (code))
> +    {
> +    case RTX_UNARY:
> +      {
> +       machine_mode op0_mode = GET_MODE (XEXP (x, 0));
> +       if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0)))
> +         return false;
> +
> +       newx = simplify_unary_operation (code, mode, XEXP (x, 0), op0_mode);
> +       break;
> +      }
> +
> +    case RTX_BIN_ARITH:
> +    case RTX_COMM_ARITH:
> +      {
> +       if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0))
> +           || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1)))
> +         return false;
> +
> +       newx = simplify_binary_operation (code, mode,
> +                                         XEXP (x, 0), XEXP (x, 1));
> +       break;
> +      }
> +
> +    case RTX_COMPARE:
> +    case RTX_COMM_COMPARE:
> +      {
> +       machine_mode op_mode = (GET_MODE (XEXP (x, 0)) != VOIDmode
> +                               ? GET_MODE (XEXP (x, 0))
> +                               : GET_MODE (XEXP (x, 1)));
> +       if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0))
> +           || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1)))
> +         return false;
> +
> +       newx = simplify_relational_operation (code, mode, op_mode,
> +                                             XEXP (x, 0), XEXP (x, 1));
> +       break;
> +      }
> +
> +    case RTX_TERNARY:
> +    case RTX_BITFIELD_OPS:
> +      {
> +       machine_mode op0_mode = GET_MODE (XEXP (x, 0));
> +       if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0))
> +           || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1))
> +           || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 2)))
> +         return false;
> +
> +       newx = simplify_ternary_operation (code, mode, op0_mode,
> +                                          XEXP (x, 0), XEXP (x, 1),
> +                                          XEXP (x, 2));
> +       break;
> +      }
> +
> +    case RTX_EXTRA:
> +      if (code == SUBREG)
> +       {
> +         machine_mode inner_mode = GET_MODE (SUBREG_REG (x));
> +         if (!validate_simplify_replace_rtx_1 (r, &SUBREG_REG (x)))
> +           return false;
> +
> +         rtx inner = SUBREG_REG (x);
> +         newx = simplify_subreg (mode, inner, inner_mode, SUBREG_BYTE (x));
> +         /* Reject the same cases that simplify_gen_subreg would.  */
> +         if (!newx
> +             && (GET_CODE (inner) == SUBREG
> +                 || GET_CODE (inner) == CONCAT
> +                 || GET_MODE (inner) == VOIDmode
> +                 || !validate_subreg (mode, inner_mode,
> +                                      inner, SUBREG_BYTE (x))))
> +           return false;
> +         break;
> +       }
> +      else
> +       recurse_p = true;
> +      break;
> +
> +    case RTX_OBJ:
> +      if (code == LO_SUM)
> +       {
> +         if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0))
> +             || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1)))
> +           return false;
> +
> +         /* (lo_sum (high x) y) -> y where x and y have the same base.  */
> +         rtx op0 = XEXP (x, 0);
> +         rtx op1 = XEXP (x, 1);
> +         if (GET_CODE (op0) == HIGH)
> +           {
> +             rtx base0, base1, offset0, offset1;
> +             split_const (XEXP (op0, 0), &base0, &offset0);
> +             split_const (op1, &base1, &offset1);
> +             if (rtx_equal_p (base0, base1))
> +               newx = op1;
> +           }
> +       }
> +      else if (code == REG)
> +       {
> +         if (REG_P (r.from) && reg_overlap_mentioned_p (x, r.from))
> +           return false;
> +       }
> +      else
> +       recurse_p = true;
> +      break;
> +
> +    case RTX_CONST_OBJ:
> +      break;
> +
> +    case RTX_AUTOINC:
> +      if (reg_overlap_mentioned_p (XEXP (x, 0), r.from))
> +       return false;
> +      recurse_p = true;
> +      break;
> +
> +    case RTX_MATCH:
> +    case RTX_INSN:
> +      gcc_unreachable ();
> +    }
> +
> +  if (recurse_p)
> +    {
> +      const char *fmt = GET_RTX_FORMAT (code);
> +      for (int i = 0; fmt[i]; i++)
> +       switch (fmt[i])
> +         {
> +         case 'E':
> +           for (int j = 0; j < XVECLEN (x, i); j++)
> +             if (!validate_simplify_replace_rtx_1 (r, &XVECEXP (x, i, j)))
> +               return false;
> +           break;
> +
> +         case 'e':
> +           if (XEXP (x, i)
> +               && !validate_simplify_replace_rtx_1 (r, &XEXP (x, i)))
> +             return false;
> +           break;
> +         }
> +    }
> +
> +  if (newx && !rtx_equal_p (x, newx))
> +    {
> +      /* There's no longer any point unsharing the substitutions made
> +        for subexpressions, since we'll just copy this one instead.  */
> +      for (int i = old_num_changes; i < num_changes; ++i)
> +       changes[i].unshare = false;
> +      validate_unshare_change (r.insn, loc, newx, 1);
> +    }
> +
> +  return true;
> +}
> +
> +/* A note_uses callback for validate_simplify_replace_rtx.
> +   DATA points to a validate_replace_src_data object.  */
> +
> +static void
> +validate_simplify_replace_rtx_uses (rtx *loc, void *data)
> +{
> +  validate_replace_src_data &r = *(validate_replace_src_data *) data;
> +  if (r.insn && !validate_simplify_replace_rtx_1 (r, loc))
> +    r.insn = NULL;
> +}
> +
> +/* Try to perform the equivalent of:
> +
> +      newx = simplify_replace_rtx (*loc, OLD_RTX, NEW_RTX);
> +      validate_change (INSN, LOC, newx, 1);
> +
> +   but without generating as much garbage rtl when the resulting
> +   pattern doesn't match.
> +
> +   Return true if we were able to replace all uses of OLD_RTX in *LOC
> +   and if the result conforms to general rtx rules (e.g. for whether
> +   subregs are meaningful).
> +
> +   When returning true, add all replacements to the current validation group,
> +   leaving the caller to test it in the normal way.  Leave both *LOC and the
> +   validation group unchanged on failure.  */
> +
> +bool
> +validate_simplify_replace_rtx (rtx_insn *insn, rtx *loc,
> +                              rtx old_rtx, rtx new_rtx)
> +{
> +  validate_replace_src_data r;
> +  r.from = old_rtx;
> +  r.to = new_rtx;
> +  r.insn = insn;
> +
> +  unsigned int num_changes = num_validated_changes ();
> +  note_uses (loc, validate_simplify_replace_rtx_uses, &r);
> +  if (!r.insn)
> +    {
> +      cancel_changes (num_changes);
> +      return false;
> +    }
> +  return true;
> +}
>
>  /* Return 1 if the insn using CC0 set by INSN does not contain
>     any ordered tests applied to the condition codes.
> Index: gcc/combine2.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/combine2.c      2019-11-17 23:15:31.196500559 +0000
> @@ -0,0 +1,1576 @@
> +/* Combine instructions
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "df.h"
> +#include "tree-pass.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "insn-config.h"
> +#include "recog.h"
> +#include "print-rtl.h"
> +#include "rtl-iter.h"
> +#include "predict.h"
> +#include "cfgcleanup.h"
> +#include "cfghooks.h"
> +#include "cfgrtl.h"
> +#include "alias.h"
> +#include "valtrack.h"
> +
> +/* This pass tries to combine instructions in the following ways:
> +
> +   (1) If we have two dependent instructions:
> +
> +        I1: (set DEST1 SRC1)
> +        I2: (...DEST1...)
> +
> +       and I2 is the only user of DEST1, the pass tries to combine them into:
> +
> +        I2: (...SRC1...)
> +
> +   (2) If we have two dependent instructions:
> +
> +        I1: (set DEST1 SRC1)
> +        I2: (...DEST1...)
> +
> +       the pass tries to combine them into:
> +
> +        I2: (parallel [(set DEST1 SRC1) (...SRC1...)])
> +
> +       or:
> +
> +        I2: (parallel [(...SRC1...) (set DEST1 SRC1)])
> +
> +   (3) If we have two independent instructions:
> +
> +        I1: (set DEST1 SRC1)
> +        I2: (set DEST2 SRC2)
> +
> +       that read from memory or from the same register, the pass tries to
> +       combine them into:
> +
> +        I2: (parallel [(set DEST1 SRC1) (set DEST2 SRC2)])
> +
> +       or:
> +
> +        I2: (parallel [(set DEST2 SRC2) (set DEST1 SRC1)])
> +
> +   If the combined form is a valid instruction, the pass tries to find a
> +   place between I1 and I2 inclusive for the new instruction.  If there
> +   are multiple valid locations, it tries to pick the best one by taking
> +   the effect on register pressure into account.
> +
> +   If a combination succeeds and produces a single set, the pass tries to
> +   combine the new form with earlier or later instructions.
> +
> +   The pass currently optimizes each basic block separately.  It walks
> +   the instructions in reverse order, building up live ranges for registers
> +   and memory.  It then uses these live ranges to look for possible
> +   combination opportunities and to decide where the combined instructions
> +   could be placed.
> +
> +   The pass represents positions in the block using point numbers,
> +   with higher numbers indicating earlier instructions.  The numbering
> +   scheme is that:
> +
> +   - the end of the current instruction sequence has an even base point B.
> +
> +   - instructions initially have odd-numbered points B + 1, B + 3, etc.
> +     with B + 1 being the final instruction in the sequence.
> +
> +   - even points after B represent gaps between instructions where combined
> +     instructions could be placed.
> +
> +   Thus even points initially represent no instructions and odd points
> +   initially represent single instructions.  However, when picking a
> +   place for a combined instruction, the pass may choose somewhere
> +   inbetween the original two instructions, so that over time a point
> +   may come to represent several instructions.  When this happens,
> +   the pass maintains the invariant that all instructions with the same
> +   point number are independent of each other and thus can be treated as
> +   acting in parallel (or as acting in any arbitrary sequence).
> +
> +   TODOs:
> +
> +   - Handle 3-instruction combinations, and possibly more.
> +
> +   - Handle existing clobbers more efficiently.  At the moment we can't
> +     move an instruction that clobbers R across another instruction that
> +     clobbers R.
> +
> +   - Allow hard register clobbers to be added, like combine does.
> +
> +   - Perhaps work on EBBs, or SESE regions.  */
> +
> +namespace {
> +
> +/* The number of explicit uses to record in a live range.  */
> +const unsigned int NUM_RANGE_USERS = 4;
> +
> +/* The maximum number of instructions that we can combine at once.  */
> +const unsigned int MAX_COMBINE_INSNS = 2;
> +
> +/* A fake cost for instructions that we haven't costed yet.  */
> +const unsigned int UNKNOWN_COST = ~0U;
> +
> +class combine2
> +{
> +public:
> +  combine2 (function *);
> +  ~combine2 ();
> +
> +  void execute ();
> +
> +private:
> +  struct insn_info_rec;
> +
> +  /* Describes the live range of a register or of memory.  For simplicity,
> +     we treat memory as a single entity.
> +
> +     If we had a fully-accurate live range, updating it to account for a
> +     moved instruction would be a linear-time operation.  Doing this for
> +     each combination would then make the pass quadratic.  We therefore
> +     just maintain a list of NUM_RANGE_USERS use insns and use simple,
> +     conservatively-correct behavior for the rest.  */
> +  struct live_range_rec
> +  {
> +    /* Which instruction provides the dominating definition, or null if
> +       we don't know yet.  */
> +    insn_info_rec *producer;
> +
> +    /* A selection of instructions that use the resource, in program order.  
> */
> +    insn_info_rec *users[NUM_RANGE_USERS];
> +
> +    /* An inclusive range of points that covers instructions not mentioned
> +       in USERS.  Both values are zero if there are no such instructions.
> +
> +       Once we've included a use U at point P in this range, we continue
> +       to assume that some kind of use exists at P whatever happens to U
> +       afterwards.  */
> +    unsigned int first_extra_use;
> +    unsigned int last_extra_use;
> +
> +    /* The register number this range describes, or INVALID_REGNUM
> +       for memory.  */
> +    unsigned int regno;
> +
> +    /* Forms a linked list of ranges for the same resource, in program
> +       order.  */
> +    live_range_rec *prev_range;
> +    live_range_rec *next_range;
> +  };
> +
> +  /* Pass-specific information about an instruction.  */
> +  struct insn_info_rec
> +  {
> +    /* The instruction itself.  */
> +    rtx_insn *insn;
> +
> +    /* A null-terminated list of live ranges for the things that this
> +       instruction defines.  */
> +    live_range_rec **defs;
> +
> +    /* A null-terminated list of live ranges for the things that this
> +       instruction uses.  */
> +    live_range_rec **uses;
> +
> +    /* The point at which the instruction appears.  */
> +    unsigned int point;
> +
> +    /* The cost of the instruction, or UNKNOWN_COST if we haven't
> +       measured it yet.  */
> +    unsigned int cost;
> +  };
> +
> +  /* Describes one attempt to combine instructions.  */
> +  struct combination_attempt_rec
> +  {
> +    /* The instruction that we're currently trying to optimize.
> +       If the combination succeeds, we'll use this insn_info_rec
> +       to describe the new instruction.  */
> +    insn_info_rec *new_home;
> +
> +    /* The instructions we're combining, in program order.  */
> +    insn_info_rec *sequence[MAX_COMBINE_INSNS];
> +
> +    /* If we're substituting SEQUENCE[0] into SEQUENCE[1], this is the
> +       live range that describes the substituted register.  */
> +    live_range_rec *def_use_range;
> +
> +    /* The earliest and latest points at which we could insert the
> +       combined instruction.  */
> +    unsigned int earliest_point;
> +    unsigned int latest_point;
> +
> +    /* The cost of the new instruction, once we have a successful match.  */
> +    unsigned int new_cost;
> +  };
> +
> +  /* Pass-specific information about a register.  */
> +  struct reg_info_rec
> +  {
> +    /* The live range associated with the last reference to the register.  */
> +    live_range_rec *range;
> +
> +    /* The point at which the last reference occurred.  */
> +    unsigned int next_ref;
> +
> +    /* True if the register is currently live.  We record this here rather
> +       than in a separate bitmap because (a) there's a natural hole for
> +       it on LP64 hosts and (b) we only refer to it when updating the
> +       other fields, and so recording it here should give better locality.  
> */
> +    unsigned int live_p : 1;
> +  };
> +
> +  live_range_rec *new_live_range (unsigned int, live_range_rec *);
> +  live_range_rec *reg_live_range (unsigned int);
> +  live_range_rec *mem_live_range ();
> +  bool add_range_use (live_range_rec *, insn_info_rec *);
> +  void remove_range_use (live_range_rec *, insn_info_rec *);
> +  bool has_single_use_p (live_range_rec *);
> +  bool known_last_use_p (live_range_rec *, insn_info_rec *);
> +  unsigned int find_earliest_point (insn_info_rec *, insn_info_rec *);
> +  unsigned int find_latest_point (insn_info_rec *, insn_info_rec *);
> +  bool start_combination (combination_attempt_rec &, insn_info_rec *,
> +                         insn_info_rec *, live_range_rec * = NULL);
> +  bool verify_combination (combination_attempt_rec &);
> +  int estimate_reg_pressure_delta (insn_info_rec *);
> +  void commit_combination (combination_attempt_rec &, bool);
> +  bool try_parallel_sets (combination_attempt_rec &, rtx, rtx);
> +  bool try_parallelize_insns (combination_attempt_rec &);
> +  bool try_combine_def_use_1 (combination_attempt_rec &, rtx, rtx, bool);
> +  bool try_combine_def_use (combination_attempt_rec &, rtx, rtx);
> +  bool try_combine_two_uses (combination_attempt_rec &);
> +  bool try_combine (insn_info_rec *, rtx, unsigned int);
> +  bool optimize_insn (insn_info_rec *);
> +  void record_defs (insn_info_rec *);
> +  void record_reg_use (insn_info_rec *, df_ref);
> +  void record_uses (insn_info_rec *);
> +  void process_insn (insn_info_rec *);
> +  void start_sequence ();
> +
> +  /* The function we're optimizing.  */
> +  function *m_fn;
> +
> +  /* The highest pseudo register number plus one.  */
> +  unsigned int m_num_regs;
> +
> +  /* The current basic block.  */
> +  basic_block m_bb;
> +
> +  /* True if we should optimize the current basic block for speed.  */
> +  bool m_optimize_for_speed_p;
> +
> +  /* The point number to allocate to the next instruction we visit
> +     in the backward traversal.  */
> +  unsigned int m_point;
> +
> +  /* The point number corresponding to the end of the current
> +     instruction sequence, i.e. the lowest point number about which
> +     we still have valid information.  */
> +  unsigned int m_end_of_sequence;
> +
> +  /* The point number corresponding to the end of the current basic block.
> +     This is the same as M_END_OF_SEQUENCE when processing the last
> +     instruction sequence in a basic block.  */
> +  unsigned int m_end_of_bb;
> +
> +  /* The memory live range, or null if we haven't yet found a memory
> +     reference in the current instruction sequence.  */
> +  live_range_rec *m_mem_range;
> +
> +  /* Gives information about each register.  We track both hard and
> +     pseudo registers.  */
> +  auto_vec<reg_info_rec> m_reg_info;
> +
> +  /* A bitmap of registers whose entry in m_reg_info is valid.  */
> +  auto_sbitmap m_valid_regs;
> +
> +  /* If nonnuull, an unused 2-element PARALLEL that we can use to test
> +     instruction combinations.  */
> +  rtx m_spare_parallel;
> +
> +  /* A bitmap of instructions that we've already tried to combine with.  */
> +  auto_bitmap m_tried_insns;
> +
> +  /* A temporary bitmap used to hold register numbers.  */
> +  auto_bitmap m_true_deps;
> +
> +  /* An obstack used for allocating insn_info_recs and for building
> +     up their lists of definitions and uses.  */
> +  obstack m_insn_obstack;
> +
> +  /* An obstack used for allocating live_range_recs.  */
> +  obstack m_range_obstack;
> +
> +  /* Start-of-object pointers for the two obstacks.  */
> +  char *m_insn_obstack_start;
> +  char *m_range_obstack_start;
> +
> +  /* A list of instructions that we've optimized and whose new forms
> +     change the cfg.  */
> +  auto_vec<rtx_insn *> m_cfg_altering_insns;
> +
> +  /* The INSN_UIDs of all instructions in M_CFG_ALTERING_INSNS.  */
> +  auto_bitmap m_cfg_altering_insn_ids;
> +
> +  /* We can insert new instructions at point P * 2 by inserting them
> +     after M_POINTS[P - M_END_OF_SEQUENCE / 2].  We can insert new
> +     instructions at point P * 2 + 1 by inserting them before
> +     M_POINTS[P - M_END_OF_SEQUENCE / 2].  */
> +  auto_vec<rtx_insn *, 256> m_points;
> +};
> +
> +combine2::combine2 (function *fn)
> +  : m_fn (fn),
> +    m_num_regs (max_reg_num ()),
> +    m_bb (NULL),
> +    m_optimize_for_speed_p (false),
> +    m_point (2),
> +    m_end_of_sequence (m_point),
> +    m_end_of_bb (m_point),
> +    m_mem_range (NULL),
> +    m_reg_info (m_num_regs),
> +    m_valid_regs (m_num_regs),
> +    m_spare_parallel (NULL_RTX)
> +{
> +  gcc_obstack_init (&m_insn_obstack);
> +  gcc_obstack_init (&m_range_obstack);
> +  m_reg_info.quick_grow (m_num_regs);
> +  bitmap_clear (m_valid_regs);
> +  m_insn_obstack_start = XOBNEWVAR (&m_insn_obstack, char, 0);
> +  m_range_obstack_start = XOBNEWVAR (&m_range_obstack, char, 0);
> +}
> +
> +combine2::~combine2 ()
> +{
> +  obstack_free (&m_insn_obstack, NULL);
> +  obstack_free (&m_range_obstack, NULL);
> +}
> +
> +/* Return true if it's possible in principle to combine INSN with
> +   other instructions.  ALLOW_ASMS_P is true if the caller can cope
> +   with asm statements.  */
> +
> +static bool
> +combinable_insn_p (rtx_insn *insn, bool allow_asms_p)
> +{
> +  rtx pattern = PATTERN (insn);
> +
> +  if (GET_CODE (pattern) == USE || GET_CODE (pattern) == CLOBBER)
> +    return false;
> +
> +  if (JUMP_P (insn) && find_reg_note (insn, REG_NON_LOCAL_GOTO, NULL_RTX))
> +    return false;
> +
> +  if (!allow_asms_p && asm_noperands (PATTERN (insn)) >= 0)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Return true if it's possible in principle to move INSN somewhere else,
> +   as long as all dependencies are satisfied.  */
> +
> +static bool
> +movable_insn_p (rtx_insn *insn)
> +{
> +  if (JUMP_P (insn))
> +    return false;
> +
> +  if (volatile_refs_p (PATTERN (insn)))
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Create and return a new live range for REGNO.  NEXT is the next range
> +   in program order, or null if this is the first live range in the
> +   sequence.  */
> +
> +combine2::live_range_rec *
> +combine2::new_live_range (unsigned int regno, live_range_rec *next)
> +{
> +  live_range_rec *range = XOBNEW (&m_range_obstack, live_range_rec);
> +  memset (range, 0, sizeof (*range));
> +
> +  range->regno = regno;
> +  range->next_range = next;
> +  if (next)
> +    next->prev_range = range;
> +  return range;
> +}
> +
> +/* Return the current live range for register REGNO, creating a new
> +   one if necessary.  */
> +
> +combine2::live_range_rec *
> +combine2::reg_live_range (unsigned int regno)
> +{
> +  /* Initialize the liveness flag, if it isn't already valid for this BB.  */
> +  bool first_ref_p = !bitmap_bit_p (m_valid_regs, regno);
> +  if (first_ref_p || m_reg_info[regno].next_ref < m_end_of_bb)
> +    m_reg_info[regno].live_p = bitmap_bit_p (df_get_live_out (m_bb), regno);
> +
> +  /* See if we already have a live range associated with the current
> +     instruction sequence.  */
> +  live_range_rec *range = NULL;
> +  if (!first_ref_p && m_reg_info[regno].next_ref >= m_end_of_sequence)
> +    range = m_reg_info[regno].range;
> +
> +  /* Create a new range if this is the first reference to REGNO in the
> +     current instruction sequence or if the current range has been closed
> +     off by a definition.  */
> +  if (!range || range->producer)
> +    {
> +      range = new_live_range (regno, range);
> +
> +      /* If the register is live after the current sequence, treat that
> +        as a fake use at the end of the sequence.  */
> +      if (!range->next_range && m_reg_info[regno].live_p)
> +       range->first_extra_use = range->last_extra_use = m_end_of_sequence;
> +
> +      /* Record that this is now the current range for REGNO.  */
> +      if (first_ref_p)
> +       bitmap_set_bit (m_valid_regs, regno);
> +      m_reg_info[regno].range = range;
> +      m_reg_info[regno].next_ref = m_point;
> +    }
> +  return range;
> +}
> +
> +/* Return the current live range for memory, treating memory as a single
> +   entity.  Create a new live range if necessary.  */
> +
> +combine2::live_range_rec *
> +combine2::mem_live_range ()
> +{
> +  if (!m_mem_range || m_mem_range->producer)
> +    m_mem_range = new_live_range (INVALID_REGNUM, m_mem_range);
> +  return m_mem_range;
> +}
> +
> +/* Record that instruction USER uses the resource described by RANGE.
> +   Return true if this is new information.  */
> +
> +bool
> +combine2::add_range_use (live_range_rec *range, insn_info_rec *user)
> +{
> +  /* See if we've already recorded the instruction, or if there's a
> +     spare use slot we can use.  */
> +  unsigned int i = 0;
> +  for (; i < NUM_RANGE_USERS && range->users[i]; ++i)
> +    if (range->users[i] == user)
> +      return false;
> +
> +  if (i == NUM_RANGE_USERS)
> +    {
> +      /* Since we've processed USER recently, assume that it's more
> +        interesting to record explicitly than the last user in the
> +        current list.  Evict that last user and describe it in the
> +        overflow "extra use" range instead.  */
> +      insn_info_rec *ousted_user = range->users[--i];
> +      if (range->first_extra_use < ousted_user->point)
> +       range->first_extra_use = ousted_user->point;
> +      if (range->last_extra_use > ousted_user->point)
> +       range->last_extra_use = ousted_user->point;
> +    }
> +
> +  /* Insert USER while keeping the list sorted.  */
> +  for (; i > 0 && range->users[i - 1]->point < user->point; --i)
> +    range->users[i] = range->users[i - 1];
> +  range->users[i] = user;
> +  return true;
> +}
> +
> +/* Remove USER from the uses recorded for RANGE, if we can.
> +   There's nothing we can do if USER was described in the
> +   overflow "extra use" range.  */
> +
> +void
> +combine2::remove_range_use (live_range_rec *range, insn_info_rec *user)
> +{
> +  for (unsigned int i = 0; i < NUM_RANGE_USERS; ++i)
> +    if (range->users[i] == user)
> +      {
> +       for (unsigned int j = i; j < NUM_RANGE_USERS - 1; ++j)
> +         range->users[j] = range->users[j + 1];
> +       range->users[NUM_RANGE_USERS - 1] = NULL;
> +       break;
> +      }
> +}
> +
> +/* Return true if RANGE has a single known user.  */
> +
> +bool
> +combine2::has_single_use_p (live_range_rec *range)
> +{
> +  return range->users[0] && !range->users[1] && !range->first_extra_use;
> +}
> +
> +/* Return true if we know that USER is the last user of RANGE.  */
> +
> +bool
> +combine2::known_last_use_p (live_range_rec *range, insn_info_rec *user)
> +{
> +  if (range->last_extra_use <= user->point)
> +    return false;
> +
> +  for (unsigned int i = 0; i < NUM_RANGE_USERS && range->users[i]; ++i)
> +    if (range->users[i] == user)
> +      return i == NUM_RANGE_USERS - 1 || !range->users[i + 1];
> +    else if (range->users[i]->point == user->point)
> +      return false;
> +
> +  gcc_unreachable ();
> +}
> +
> +/* Find the earliest point that we could move I2 up in order to combine
> +   it with I1.  Ignore any dependencies between I1 and I2; leave the
> +   caller to deal with those instead.  */
> +
> +unsigned int
> +combine2::find_earliest_point (insn_info_rec *i2, insn_info_rec *i1)
> +{
> +  if (!movable_insn_p (i2->insn))
> +    return i2->point;
> +
> +  /* Start by optimistically assuming that we can move the instruction
> +     all the way up to I1.  */
> +  unsigned int point = i1->point;
> +
> +  /* Make sure that the new position preserves all necessary true 
> dependencies
> +     on earlier instructions.  */
> +  for (live_range_rec **use = i2->uses; *use; ++use)
> +    {
> +      live_range_rec *range = *use;
> +      if (range->producer
> +         && range->producer != i1
> +         && point >= range->producer->point)
> +       point = range->producer->point - 1;
> +    }
> +
> +  /* Make sure that the new position preserves all necessary output and
> +     anti dependencies on earlier instructions.  */
> +  for (live_range_rec **def = i2->defs; *def; ++def)
> +    if (live_range_rec *range = (*def)->prev_range)
> +      {
> +       if (range->producer
> +           && range->producer != i1
> +           && point >= range->producer->point)
> +         point = range->producer->point - 1;
> +
> +       for (unsigned int i = NUM_RANGE_USERS - 1; i-- > 0;)
> +         if (range->users[i] && range->users[i] != i1)
> +           {
> +             if (point >= range->users[i]->point)
> +               point = range->users[i]->point - 1;
> +             break;
> +           }
> +
> +       if (range->last_extra_use && point >= range->last_extra_use)
> +         point = range->last_extra_use - 1;
> +      }
> +
> +  return point;
> +}
> +
> +/* Find the latest point that we could move I1 down in order to combine
> +   it with I2.  Ignore any dependencies between I1 and I2; leave the
> +   caller to deal with those instead.  */
> +
> +unsigned int
> +combine2::find_latest_point (insn_info_rec *i1, insn_info_rec *i2)
> +{
> +  if (!movable_insn_p (i1->insn))
> +    return i1->point;
> +
> +  /* Start by optimistically assuming that we can move the instruction
> +     all the way down to I2.  */
> +  unsigned int point = i2->point;
> +
> +  /* Make sure that the new position preserves all necessary anti 
> dependencies
> +     on later instructions.  */
> +  for (live_range_rec **use = i1->uses; *use; ++use)
> +    if (live_range_rec *range = (*use)->next_range)
> +      if (range->producer != i2 && point <= range->producer->point)
> +       point = range->producer->point + 1;
> +
> +  /* Make sure that the new position preserves all necessary output and
> +     true dependencies on later instructions.  */
> +  for (live_range_rec **def = i1->defs; *def; ++def)
> +    {
> +      live_range_rec *range = *def;
> +
> +      for (unsigned int i = 0; i < NUM_RANGE_USERS; ++i)
> +       if (range->users[i] != i2)
> +         {
> +           if (range->users[i] && point <= range->users[i]->point)
> +             point = range->users[i]->point + 1;
> +           break;
> +         }
> +
> +      if (range->first_extra_use && point <= range->first_extra_use)
> +       point = range->first_extra_use + 1;
> +
> +      live_range_rec *next_range = range->next_range;
> +      if (next_range
> +         && next_range->producer != i2
> +         && point <= next_range->producer->point)
> +       point = next_range->producer->point + 1;
> +    }
> +
> +  return point;
> +}
> +
> +/* Initialize ATTEMPT for an attempt to combine instructions I1 and I2,
> +   where I1 is the instruction that we're currently trying to optimize.
> +   If DEF_USE_RANGE is nonnull, I1 defines the value described by
> +   DEF_USE_RANGE and I2 uses it.  */
> +
> +bool
> +combine2::start_combination (combination_attempt_rec &attempt,
> +                            insn_info_rec *i1, insn_info_rec *i2,
> +                            live_range_rec *def_use_range)
> +{
> +  attempt.new_home = i1;
> +  attempt.sequence[0] = i1;
> +  attempt.sequence[1] = i2;
> +  if (attempt.sequence[0]->point < attempt.sequence[1]->point)
> +    std::swap (attempt.sequence[0], attempt.sequence[1]);
> +  attempt.def_use_range = def_use_range;
> +
> +  /* Check that the instructions have no true dependencies other than
> +     DEF_USE_RANGE.  */
> +  bitmap_clear (m_true_deps);
> +  for (live_range_rec **def = attempt.sequence[0]->defs; *def; ++def)
> +    if (*def != def_use_range)
> +      bitmap_set_bit (m_true_deps, (*def)->regno);
> +  for (live_range_rec **use = attempt.sequence[1]->uses; *use; ++use)
> +    if (*use != def_use_range && bitmap_bit_p (m_true_deps, (*use)->regno))
> +      return false;
> +
> +  /* Calculate the range of points at which the combined instruction
> +     could live.  */
> +  attempt.earliest_point = find_earliest_point (attempt.sequence[1],
> +                                               attempt.sequence[0]);
> +  attempt.latest_point = find_latest_point (attempt.sequence[0],
> +                                           attempt.sequence[1]);
> +  if (attempt.earliest_point < attempt.latest_point)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "cannot combine %d and %d: no suitable"
> +                " location for combined insn\n",
> +                INSN_UID (attempt.sequence[0]->insn),
> +                INSN_UID (attempt.sequence[1]->insn));
> +      return false;
> +    }
> +
> +  /* Make sure we have valid costs for the original instructions before
> +     we start changing their patterns.  */
> +  for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> +    if (attempt.sequence[i]->cost == UNKNOWN_COST)
> +      attempt.sequence[i]->cost = insn_cost (attempt.sequence[i]->insn,
> +                                            m_optimize_for_speed_p);
> +  return true;
> +}
> +
> +/* Check whether the combination attempt described by ATTEMPT matches
> +   an .md instruction (or matches its constraints, in the case of an
> +   asm statement).  If so, calculate the cost of the new instruction
> +   and check whether it's cheap enough.  */
> +
> +bool
> +combine2::verify_combination (combination_attempt_rec &attempt)
> +{
> +  rtx_insn *insn = attempt.sequence[1]->insn;
> +
> +  bool ok_p = verify_changes (0);
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      if (!ok_p)
> +       fprintf (dump_file, "failed to match this instruction:\n");
> +      else if (const char *name = get_insn_name (INSN_CODE (insn)))
> +       fprintf (dump_file, "successfully matched this instruction to %s:\n",
> +                name);
> +      else
> +       fprintf (dump_file, "successfully matched this instruction:\n");
> +      print_rtl_single (dump_file, PATTERN (insn));
> +    }
> +  if (!ok_p)
> +    return false;
> +
> +  unsigned int cost1 = attempt.sequence[0]->cost;
> +  unsigned int cost2 = attempt.sequence[1]->cost;
> +  attempt.new_cost = insn_cost (insn, m_optimize_for_speed_p);
> +  ok_p = (attempt.new_cost <= cost1 + cost2);
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "original cost = %d + %d, replacement cost = %d; 
> %s\n",
> +            cost1, cost2, attempt.new_cost,
> +            ok_p ? "keeping replacement" : "rejecting replacement");
> +  if (!ok_p)
> +    return false;
> +
> +  confirm_change_group ();
> +  return true;
> +}
> +
> +/* Return true if we should consider register REGNO when calculating
> +   register pressure estimates.  */
> +
> +static bool
> +count_reg_pressure_p (unsigned int regno)
> +{
> +  if (regno == INVALID_REGNUM)
> +    return false;
> +
> +  /* Unallocatable registers aren't interesting.  */
> +  if (HARD_REGISTER_NUM_P (regno) && fixed_regs[regno])
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Try to estimate the effect that the original form of INSN_INFO
> +   had on register pressure, in the form "born - dying".  */
> +
> +int
> +combine2::estimate_reg_pressure_delta (insn_info_rec *insn_info)
> +{
> +  int delta = 0;
> +
> +  for (live_range_rec **def = insn_info->defs; *def; ++def)
> +    if (count_reg_pressure_p ((*def)->regno))
> +      delta += 1;
> +
> +  for (live_range_rec **use = insn_info->uses; *use; ++use)
> +    if (count_reg_pressure_p ((*use)->regno)
> +       && known_last_use_p (*use, insn_info))
> +      delta -= 1;
> +
> +  return delta;
> +}
> +
> +/* We've moved FROM_INSN's pattern to TO_INSN and are about to delete
> +   FROM_INSN.  Copy any useful information to TO_INSN before doing that.  */
> +
> +static void
> +transfer_insn (rtx_insn *to_insn, rtx_insn *from_insn)
> +{
> +  INSN_LOCATION (to_insn) = INSN_LOCATION (from_insn);
> +  INSN_CODE (to_insn) = INSN_CODE (from_insn);
> +  REG_NOTES (to_insn) = REG_NOTES (from_insn);
> +}
> +
> +/* The combination attempt in ATTEMPT has succeeded and is currently
> +   part of an open validate_change group.  Commit to making the change
> +   and decide where the new instruction should go.
> +
> +   KEPT_DEF_P is true if the new instruction continues to perform
> +   the definition described by ATTEMPT.def_use_range.  */
> +
> +void
> +combine2::commit_combination (combination_attempt_rec &attempt,
> +                             bool kept_def_p)
> +{
> +  insn_info_rec *new_home = attempt.new_home;
> +  rtx_insn *old_insn = attempt.sequence[0]->insn;
> +  rtx_insn *new_insn = attempt.sequence[1]->insn;
> +
> +  /* Remove any notes that are no longer relevant.  */
> +  bool single_set_p = single_set (new_insn);
> +  for (rtx *note_ptr = &REG_NOTES (new_insn); *note_ptr; )
> +    {
> +      rtx note = *note_ptr;
> +      bool keep_p = true;
> +      switch (REG_NOTE_KIND (note))
> +       {
> +       case REG_EQUAL:
> +       case REG_EQUIV:
> +       case REG_NOALIAS:
> +         keep_p = single_set_p;
> +         break;
> +
> +       case REG_UNUSED:
> +         keep_p = false;
> +         break;
> +
> +       default:
> +         break;
> +       }
> +      if (keep_p)
> +       note_ptr = &XEXP (*note_ptr, 1);
> +      else
> +       {
> +         *note_ptr = XEXP (*note_ptr, 1);
> +         free_EXPR_LIST_node (note);
> +       }
> +    }
> +
> +  /* Complete the open validate_change group.  */
> +  confirm_change_group ();
> +
> +  /* Decide where the new instruction should go.  */
> +  unsigned int new_point = attempt.latest_point;
> +  if (new_point != attempt.earliest_point
> +      && prev_real_insn (new_insn) != old_insn)
> +    {
> +      /* Prefer the earlier point if the combined instruction reduces
> +        register pressure and the latest point if it increases register
> +        pressure.
> +
> +        The choice isn't obvious in the event of a tie, but picking
> +        the earliest point should reduce the number of times that
> +        we need to invalidate debug insns.  */
> +      int delta1 = estimate_reg_pressure_delta (attempt.sequence[0]);
> +      int delta2 = estimate_reg_pressure_delta (attempt.sequence[1]);
> +      bool move_up_p = (delta1 + delta2 <= 0);
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file,
> +                "register pressure delta = %d + %d; using %s position\n",
> +                delta1, delta2, move_up_p ? "earliest" : "latest");
> +      if (move_up_p)
> +       new_point = attempt.earliest_point;
> +    }
> +
> +  /* Translate inserting at NEW_POINT into inserting before or after
> +     a particular insn.  */
> +  rtx_insn *anchor = NULL;
> +  bool before_p = (new_point & 1);
> +  if (new_point != attempt.sequence[1]->point
> +      && new_point != attempt.sequence[0]->point)
> +    {
> +      anchor = m_points[(new_point - m_end_of_sequence) / 2];
> +      rtx_insn *other_side = (before_p
> +                             ? prev_real_insn (anchor)
> +                             : next_real_insn (anchor));
> +      /* Inserting next to an insn X and then deleting X is just a
> +        roundabout way of using X as the insertion point.  */
> +      if (anchor == new_insn || other_side == new_insn)
> +       new_point = attempt.sequence[1]->point;
> +      else if (anchor == old_insn || other_side == old_insn)
> +       new_point = attempt.sequence[0]->point;
> +    }
> +
> +  /* Actually perform the move.  */
> +  if (new_point == attempt.sequence[1]->point)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "using insn %d to hold the combined pattern\n",
> +                INSN_UID (new_insn));
> +      set_insn_deleted (old_insn);
> +    }
> +  else if (new_point == attempt.sequence[0]->point)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "using insn %d to hold the combined pattern\n",
> +                INSN_UID (old_insn));
> +      PATTERN (old_insn) = PATTERN (new_insn);
> +      transfer_insn (old_insn, new_insn);
> +      std::swap (old_insn, new_insn);
> +      set_insn_deleted (old_insn);
> +    }
> +  else
> +    {
> +      /* We need to insert a new instruction.  We can't simply move
> +        NEW_INSN because it acts as an insertion anchor in m_points.  */
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "inserting combined insn %s insn %d\n",
> +                before_p ? "before" : "after", INSN_UID (anchor));
> +
> +      rtx_insn *added_insn = (before_p
> +                             ? emit_insn_before (PATTERN (new_insn), anchor)
> +                             : emit_insn_after (PATTERN (new_insn), anchor));
> +      transfer_insn (added_insn, new_insn);
> +      set_insn_deleted (old_insn);
> +      set_insn_deleted (new_insn);
> +      new_insn = added_insn;
> +    }
> +  df_insn_rescan (new_insn);
> +
> +  /* Unlink the old uses.  */
> +  for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> +    for (live_range_rec **use = attempt.sequence[i]->uses; *use; ++use)
> +      remove_range_use (*use, attempt.sequence[i]);
> +
> +  /* Work out which registers the new pattern uses.  */
> +  bitmap_clear (m_true_deps);
> +  df_ref use;
> +  FOR_EACH_INSN_USE (use, new_insn)
> +    {
> +      rtx reg = DF_REF_REAL_REG (use);
> +      bitmap_set_range (m_true_deps, REGNO (reg), REG_NREGS (reg));
> +    }
> +  FOR_EACH_INSN_EQ_USE (use, new_insn)
> +    {
> +      rtx reg = DF_REF_REAL_REG (use);
> +      bitmap_set_range (m_true_deps, REGNO (reg), REG_NREGS (reg));
> +    }
> +
> +  /* Describe the combined instruction in NEW_HOME.  */
> +  new_home->insn = new_insn;
> +  new_home->point = new_point;
> +  new_home->cost = attempt.new_cost;
> +
> +  /* Build up a list of definitions for the combined instructions
> +     and update all the ranges accordingly.  It shouldn't matter
> +     which order we do this in.  */
> +  for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> +    for (live_range_rec **def = attempt.sequence[i]->defs; *def; ++def)
> +      if (kept_def_p || *def != attempt.def_use_range)
> +       {
> +         obstack_ptr_grow (&m_insn_obstack, *def);
> +         (*def)->producer = new_home;
> +       }
> +  obstack_ptr_grow (&m_insn_obstack, NULL);
> +  new_home->defs = (live_range_rec **) obstack_finish (&m_insn_obstack);
> +
> +  /* Build up a list of uses for the combined instructions and update
> +     all the ranges accordingly.  Again, it shouldn't matter which
> +     order we do this in.  */
> +  for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> +    for (live_range_rec **use = attempt.sequence[i]->uses; *use; ++use)
> +      if (*use != attempt.def_use_range
> +         && add_range_use (*use, new_home))
> +       obstack_ptr_grow (&m_insn_obstack, *use);
> +  obstack_ptr_grow (&m_insn_obstack, NULL);
> +  new_home->uses = (live_range_rec **) obstack_finish (&m_insn_obstack);
> +
> +  /* There shouldn't be any remaining references to other instructions
> +     in the combination.  Invalidate their contents to make lingering
> +     references a noisy failure.  */
> +  for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> +    if (attempt.sequence[i] != new_home)
> +      {
> +       attempt.sequence[i]->insn = NULL;
> +       attempt.sequence[i]->point = ~0U;
> +      }
> +
> +  /* Unlink the def-use range.  */
> +  if (!kept_def_p && attempt.def_use_range)
> +    {
> +      live_range_rec *range = attempt.def_use_range;
> +      if (range->prev_range)
> +       range->prev_range->next_range = range->next_range;
> +      else
> +       m_reg_info[range->regno].range = range->next_range;
> +      if (range->next_range)
> +       range->next_range->prev_range = range->prev_range;
> +    }
> +
> +  /* Record instructions whose new form alters the cfg.  */
> +  rtx pattern = PATTERN (new_insn);
> +  if ((returnjump_p (new_insn)
> +       || any_uncondjump_p (new_insn)
> +       || (GET_CODE (pattern) == TRAP_IF && XEXP (pattern, 0) == const1_rtx))
> +      && bitmap_set_bit (m_cfg_altering_insn_ids, INSN_UID (new_insn)))
> +    m_cfg_altering_insns.safe_push (new_insn);
> +}
> +
> +/* Return true if X1 and X2 are memories and if X1 does not have
> +   a higher alignment than X2.  */
> +
> +static bool
> +dubious_mem_pair_p (rtx x1, rtx x2)
> +{
> +  return MEM_P (x1) && MEM_P (x2) && MEM_ALIGN (x1) <= MEM_ALIGN (x2);
> +}
> +
> +/* Try implement ATTEMPT using (parallel [SET1 SET2]).  */
> +
> +bool
> +combine2::try_parallel_sets (combination_attempt_rec &attempt,
> +                            rtx set1, rtx set2)
> +{
> +  rtx_insn *insn = attempt.sequence[1]->insn;
> +
> +  /* Combining two loads or two stores can be useful on targets that
> +     allow them to be treated as a single access.  However, we use a
> +     very peephole approach to picking the pairs, so we need to be
> +     relatively confident that we're making a good choice.
> +
> +     For now just aim for cases in which the memory references are
> +     consecutive and the first reference has a higher alignment.
> +     We can leave the target to test the consecutive part; whatever test
> +     we added here might be different from the target's, and in any case
> +     it's fine if the target accepts other well-aligned cases too.  */
> +  if (dubious_mem_pair_p (SET_DEST (set1), SET_DEST (set2))
> +      || dubious_mem_pair_p (SET_SRC (set1), SET_SRC (set2)))
> +    return false;
> +
> +  /* Cache the PARALLEL rtx between attempts so that we don't generate
> +     too much garbage rtl.  */
> +  if (!m_spare_parallel)
> +    {
> +      rtvec vec = gen_rtvec (2, set1, set2);
> +      m_spare_parallel = gen_rtx_PARALLEL (VOIDmode, vec);
> +    }
> +  else
> +    {
> +      XVECEXP (m_spare_parallel, 0, 0) = set1;
> +      XVECEXP (m_spare_parallel, 0, 1) = set2;
> +    }
> +
> +  unsigned int num_changes = num_validated_changes ();
> +  validate_change (insn, &PATTERN (insn), m_spare_parallel, true);
> +  if (verify_combination (attempt))
> +    {
> +      m_spare_parallel = NULL_RTX;
> +      return true;
> +    }
> +  cancel_changes (num_changes);
> +  return false;
> +}
> +
> +/* Try to parallelize the two instructions in ATTEMPT.  */
> +
> +bool
> +combine2::try_parallelize_insns (combination_attempt_rec &attempt)
> +{
> +  rtx_insn *i1_insn = attempt.sequence[0]->insn;
> +  rtx_insn *i2_insn = attempt.sequence[1]->insn;
> +
> +  /* Can't parallelize asm statements.  */
> +  if (asm_noperands (PATTERN (i1_insn)) >= 0
> +      || asm_noperands (PATTERN (i2_insn)) >= 0)
> +    return false;
> +
> +  /* For now, just handle the case in which both instructions are
> +     single sets.  We could handle more than 2 sets as well, but few
> +     targets support that anyway.  */
> +  rtx set1 = single_set (i1_insn);
> +  if (!set1)
> +    return false;
> +  rtx set2 = single_set (i2_insn);
> +  if (!set2)
> +    return false;
> +
> +  /* Make sure that we have structural proof that the destinations
> +     are independent.  Things like alias analysis rely on semantic
> +     information and assume no undefined behavior, which is rarely a
> +     good enough guarantee to allow a useful instruction combination.  */
> +  rtx dest1 = SET_DEST (set1);
> +  rtx dest2 = SET_DEST (set2);
> +  if (MEM_P (dest1)
> +      ? MEM_P (dest2) && nonoverlapping_memrefs_p (dest1, dest2, false)
> +      : !MEM_P (dest2) && reg_overlap_mentioned_p (dest1, dest2))
> +    return false;
> +
> +  /* Try the sets in both orders.  */
> +  if (try_parallel_sets (attempt, set1, set2)
> +      || try_parallel_sets (attempt, set2, set1))
> +    {
> +      commit_combination (attempt, true);
> +      if (MAY_HAVE_DEBUG_BIND_INSNS
> +         && attempt.new_home->insn != i1_insn)
> +       propagate_for_debug (i1_insn, attempt.new_home->insn,
> +

Re: Add a new combine pass

Reply via email to