On Fri, Sep 16, 2016 at 10:20 AM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
>
> On 16/09/16 10:02, Richard Biener wrote:
>>
>> On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
>> <kyrylo.tkac...@foss.arm.com> wrote:
>>>
>>> Hi all,
>>>
>>> Currently the functions:
>>> int f1(int x, int t)
>>> {
>>>   if (x == -1 || x == -2)
>>>     t = 1;
>>>   return t;
>>> }
>>>
>>> int f2(int x, int t)
>>> {
>>>   if (x == -1 || x == -2)
>>>     return 1;
>>>   return t;
>>> }
>>>
>>> generate different code on AArch64 even though they have identical
>>> functionality:
>>> f1:
>>>         add     w0, w0, 2
>>>         cmp     w0, 1
>>>         csinc   w0, w1, wzr, hi
>>>         ret
>>>
>>> f2:
>>>         cmn     w0, #2
>>>         csinc   w0, w1, wzr, cc
>>>         ret
>>>
>>> The problem is that f2 performs the comparison (LTU w0 -2)
>>> whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to
>>> simplify the f1 form to the f2 form with the simplify-rtx.c rule
>>> added in this patch. With this patch the codegen for both f1 and f2
>>> on aarch64 at -O2 is identical (CMN, CSINC).
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf,
>>> aarch64-none-linux-gnu, x86_64.
>>> What do you think? Is this a correct generalisation of this issue?
>>> If so, ok for trunk?
>>
>> Do you see a difference on the GIMPLE level? If so, this kind of
>> transform looks appropriate there, too.
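A small C sketch (not part of the patch) that checks the claim above: f1 and f2 return the same value, and the two RTL comparisons (GTU (PLUS x 2) 1) and (LTU x -2) agree, assuming a 32-bit int. The names f1_gtu_form, f2_ltu_form, check_point and identity_holds_8bit are hypothetical. The exhaustive 8-bit check of the wider identity (x + c) >u (c - 1) <=> x <u -c is this sketch's own generalisation, not necessarily the exact condition used by the simplify-rtx.c rule in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* f1 and f2 exactly as in the mail.  */
int f1(int x, int t)
{
  if (x == -1 || x == -2)
    t = 1;
  return t;
}

int f2(int x, int t)
{
  if (x == -1 || x == -2)
    return 1;
  return t;
}

/* f1's pre-combine comparison, (GTU (PLUS x 2) 1), written in C.
   Unsigned arithmetic wraps modulo 2^32, like an SImode PLUS.  */
int f1_gtu_form(int x, int t)
{
  return ((uint32_t) x + 2u > 1u) ? t : 1;
}

/* f2's comparison, (LTU x -2), i.e. x <u 0xfffffffe.  */
int f2_ltu_form(int x, int t)
{
  return ((uint32_t) x < (uint32_t) -2) ? t : 1;
}

/* Assert that all four versions agree for one input.  */
void check_point(int x, int t)
{
  assert(f1(x, t) == f2(x, t));
  assert(f1(x, t) == f1_gtu_form(x, t));
  assert(f2(x, t) == f2_ltu_form(x, t));
}

/* Exhaustively test, on 8-bit values, the hypothetical general identity
   (x + c) >u (c - 1)  <=>  x <u -c  for every constant c.  */
int identity_holds_8bit(void)
{
  for (unsigned c = 0; c < 256; c++)
    for (unsigned x = 0; x < 256; x++)
      {
        int lhs = (uint8_t) (x + c) > (uint8_t) (c - 1);
        int rhs = (uint8_t) x < (uint8_t) (0u - c);
        if (lhs != rhs)
          return 0;
      }
  return 1;
}
```

Calling check_point on values around -2 and -1 (for example -3, -2, -1, 0) exercises both sides of the boundary where the unsigned addition wraps.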
>
> The GIMPLE for the two functions looks almost identical:
> f1 (intD.7 xD.3078, intD.7 tD.3079)
> {
>   intD.7 x_4(D) = xD.3078;
>   intD.7 t_5(D) = tD.3079;
>   unsigned int x.0_1;
>   unsigned int _2;
>   x.0_1 = (unsigned int) x_4(D);
>
>   _2 = x.0_1 + 2;
>   if (_2 <= 1)
>     goto <bb 3>;
>   else
>     goto <bb 4>;
> ;; basic block 3, loop depth 0, count 0, freq 3977, maybe hot
> ;; basic block 4, loop depth 0, count 0, freq 10000, maybe hot
>
>   # t_3 = PHI <t_5(D)(2), 1(3)>
>   return t_3;
> }
>
> f2 (intD.7 xD.3082, intD.7 tD.3083)
> {
>   intD.7 x_4(D) = xD.3082;
>   intD.7 t_5(D) = tD.3083;
>   unsigned int x.1_1;
>   unsigned int _2;
>   intD.7 _3;
>
>   x.1_1 = (unsigned int) x_4(D);
>
>   _2 = x.1_1 + 2;
>   if (_2 <= 1)
>     goto <bb 4>;
>   else
>     goto <bb 3>;
>
> ;; basic block 3, loop depth 0, count 0, freq 6761, maybe hot
> ;; basic block 4, loop depth 0, count 0, freq 10000, maybe hot
>   # _3 = PHI <1(2), t_5(D)(3)>
>   return _3;
>
> }
>
> So at GIMPLE level we see a (x + 2 <=u 1) in both cases but with a
> slightly different CFG. RTL-level transformations (ce1) bring it to the
> pre-combine RTL where one does (LTU w0 -2) and the other does
> (GTU (PLUS w0 2) 1).
>
> So the differences start at RTL level, and I think we need this
> transformation there.
> However, for the testcase:
> unsigned int
> foo (unsigned int a, unsigned int b)
> {
>   return (a + 2) > 1;
> }
>
> the differences do appear at GIMPLE level, so I think a match.pd pattern
> would help here.

Hi, may I ask what the function it differs from looks like?
Thanks, bin
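For the foo testcase quoted above, a sketch of what a match.pd fold would have to preserve: (a + 2) >u 1 is equivalent to a <u -2, where -2 converted to unsigned int is UINT_MAX - 1. The thread does not give the pattern itself, so foo_folded and check_foo_fold are hypothetical names that show only the intended end result, not GCC's actual output.

```c
#include <assert.h>
#include <limits.h>

/* The testcase from the mail (b is unused there too).  */
unsigned int
foo (unsigned int a, unsigned int b)
{
  return (a + 2) > 1;
}

/* Hypothetical folded form: (a + 2) >u 1  ==>  a <u -2,
   and -2 as an unsigned int is UINT_MAX - 1.  */
unsigned int
foo_folded (unsigned int a, unsigned int b)
{
  (void) b;
  return a < UINT_MAX - 1;
}

/* Assert that the original and folded forms agree for one input.  */
void
check_foo_fold (unsigned int a)
{
  assert (foo (a, 0u) == foo_folded (a, 0u));
}
```

The interesting inputs are the two values whose addition wraps, UINT_MAX - 1 and UINT_MAX: both give 0, matching the a <u -2 form.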