> -----Original Message----- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jeff Law > Sent: Tuesday, December 02, 2014 6:11 AM > To: Zhenqiang Chen > Cc: Steven Bosscher; gcc-patches@gcc.gnu.org; Jakub Jelinek > Subject: Re: [PATCH] Fix PR 61225 > > On 08/04/14 02:24, Zhenqiang Chen wrote: > > >>> > >>> ChangeLog: > >>> 2014-05-22 Zhenqiang Chen <zhenqiang.c...@linaro.org> > >>> > >>> Part of PR rtl-optimization/61225 > >>> * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): > >>> New proto. > >>> * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function. > >>> * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note > when > >>> propagating to SET. > >> > >> I can't help but wonder why the new 4 insn combination code isn't > >> presenting this as a nice big fat insn to the x86 backend which would > >> eliminate the need for the peep2. > >> > >> But, assuming there's a fundamental reason why that's not kicking in... > > > > Current combine pass can only handle > > > > I0 -> I1 -> I2 -> I3. > > I0, I1 -> I2, I2 -> I3. > > I0 -> I2; I1, I2 -> I3. > > I0 -> I1; I1, I2 -> I3. > > > > For the case, it is > > I1 -> I2 -> I3; I2 -> INSN > > > > I3 and INSN looks like not related. But INSN is a COMPARE to set CC > > and I3 can also set CC. I3 and INSN can be combined together as one > > instruction to set CC. > Presumably there's no dataflow between I3 and INSN because they both set > CC (doesn't that make them anti-dependent? > > Can you show me the RTL corresponding to I1, I2, I3 and INSN, I simply find it > easier to look at RTL rather than guess why we don't have the appropriate > linkage and thus not attempting the combinations we want. > > Pseudo code for the resulting I3 and INSN would help -- as I work through > this there's some inconsistencies in how I'm interpreting a few things and RTL > and pseudo-rtl for the desired output RTL would help a lot.
C code: if (!--*p) rtl code: 6: r91:SI=[r90:SI] 7: {r88:SI=r91:SI-0x1;clobber flags:CC;} 8: [r90:SI]=r88:SI 9: flags:CCZ=cmp(r88:SI,0) expected output: 8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;} in assemble, it is decl (%eax) > > > > ChangeLog > > 2014-08-04 Zhenqiang Chen <zhenqiang.c...@linaro.org> > > > > Part of PR rtl-optimization/61225 > > * combine.c (refer_same_reg_p): New function. > > (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn. > > (try_combine): Add one more parameter TO_COMBINED_INSN, which > is > > used to create a new insn parallel (TO_COMBINED_INSN, I3). > > > > testsuite/ChangeLog: > > 2014-08-04 Zhenqiang Chen <zhenqiang.c...@linaro.org> > > > > * gcc.target/i386/pr61225.c: New test. > > > > diff --git a/gcc/combine.c b/gcc/combine.c index 53ac1d6..42098ab > > 100644 > > --- a/gcc/combine.c > > +++ b/gcc/combine.c > > @@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx); > > static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *); > > static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx *); > > static int contains_muldiv (rtx); > > -static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx); > > +static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx); > > static void undo_all (void); > > static void undo_commit (void); > > static rtx *find_split_point (rtx *, rtx, bool); @@ -1099,6 +1099,46 > > @@ insn_a_feeds_b (rtx a, rtx b) > > #endif > > return false; > > } > > + > > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. > > + It returns TRUE, if reg1 == reg2, and no other refer of reg1 > > + except A and B. */ > > + > > +static bool > > +refer_same_reg_p (rtx a, rtx b) > > +{ > > + rtx seta = single_set (a); > > + rtx setb = single_set (b); > > + > > + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) > > + || !seta || !setb) > > + return false; > Go ahead and use > || !setb > || !setb > > It's a bit more vertical space, but I believe closer in line with our coding > standards. Updated. > > + > > + if (GET_CODE (SET_SRC (seta)) != COMPARE > > + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC > > + || !REG_P (XEXP (SET_SRC (seta), 0)) > > + || !const0_rtx > > + || !REG_P (SET_SRC (setb)) > > + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) > > + return false; > What's the !const0_rtx test here? Don't you want to test some object > from SETA against const0_rtx? Also note that you may need to test > against CONST0_RTX (mode) It's my fault. It should be XEXP (SET_SRC (seta), 1) != const0_rtx The updated patch is attached. Thanks! -Zhenqiang > > @@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs) > > } > > } > > > > + /* Try to combine a compare insn that sets CC > > + with a preceding insn that can set CC, and maybe with its > > + logical predecessor as well. > > + We need this special code because data flow connections > > + do not get entered in LOG_LINKS. */ > So you'd want to be more specific about what dataflow connections are > not in the LOG_LINKS that we want. > > It feels to me like we're missing the anti-dependence links on CC and > that there's a general aspect to combine missing here. But I want to > hold off on final judgement until I know more. > > I also wonder if compare-elim ought to be helping here. Isn't that the > point here, to eliminate the comparison and instead get it for free as > part of the arithmetic? If so, is it the fact that we have memory > references that prevents compare-elim from kicking in? > > jeff >
pr61225.patch
Description: Binary data