On 16/09/16 10:50, Bin.Cheng wrote:
On Fri, Sep 16, 2016 at 10:20 AM, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
On 16/09/16 10:02, Richard Biener wrote:
On Fri, Sep 16, 2016 at 10:40 AM, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
Hi all,

Currently the functions:
int f1(int x, int t)
{
    if (x == -1 || x == -2)
      t = 1;
    return t;
}

int f2(int x, int t)
{
    if (x == -1 || x == -2)
      return 1;
    return t;
}

generate different code on AArch64 even though they have identical
functionality:
f1:
          add     w0, w0, 2
          cmp     w0, 1
          csinc   w0, w1, wzr, hi
          ret

f2:
          cmn     w0, #2
          csinc   w0, w1, wzr, cc
          ret

The problem is that f2 performs the comparison (LTU w0 -2)
whereas f1 performs (GTU (PLUS w0 2) 1). I think it is possible to simplify
the f1 form to the f2 form with the simplify-rtx.c rule added in this patch.
With this patch the codegen for both f1 and f2 on aarch64 at -O2 is
identical (CMN, CSINC).
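
To make the claimed equivalence concrete, here is a minimal C sketch of the
two comparison forms on 32-bit unsigned values. It is only an illustration,
not the simplify-rtx.c rule itself; the function names gtu_plus_form,
ltu_form and check_samples are made up for this example, and the sample
values are just boundary cases picked by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Condition in the f1 shape: (GTU (PLUS x 2) 1), with 32-bit wraparound. */
int gtu_plus_form(uint32_t x)
{
    return (uint32_t)(x + 2u) > 1u;
}

/* Condition in the f2 shape: (LTU x -2), i.e. x < 0xfffffffe. */
int ltu_form(uint32_t x)
{
    return x < (uint32_t)-2;
}

/* Asserts that both forms agree on each hand-picked boundary value
   and returns how many values were checked.  */
size_t check_samples(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 0x7ffffffdu, 0x7fffffffu, 0x80000000u,
        0xfffffffcu, 0xfffffffdu, 0xfffffffeu, 0xffffffffu
    };
    const size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++)
        assert(gtu_plus_form(samples[i]) == ltu_form(samples[i]));
    return n;
}
```

The two forms only differ in how they are written: x + 2 wraps around to 0
or 1 exactly when x is 0xfffffffe or 0xffffffff, which are the two values
that are not below -2 when compared as unsigned.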

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
x86_64.
What do you think? Is this a correct generalisation of this issue?
If so, ok for trunk?
Do you see a difference on the GIMPLE level?  If so, this kind of transform
looks appropriate there, too.

The GIMPLE for the two functions looks almost identical:
f1 (intD.7 xD.3078, intD.7 tD.3079)
{
   intD.7 x_4(D) = xD.3078;
   intD.7 t_5(D) = tD.3079;
   unsigned int x.0_1;
   unsigned int _2;
   x.0_1 = (unsigned int) x_4(D);

   _2 = x.0_1 + 2;
   if (_2 <= 1)
     goto <bb 3>;
   else
     goto <bb 4>;
;;   basic block 3, loop depth 0, count 0, freq 3977, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 10000, maybe hot

   # t_3 = PHI <t_5(D)(2), 1(3)>
   return t_3;
}

f2 (intD.7 xD.3082, intD.7 tD.3083)
{
   intD.7 x_4(D) = xD.3082;
   intD.7 t_5(D) = tD.3083;
   unsigned int x.1_1;
   unsigned int _2;
   intD.7 _3;

   x.1_1 = (unsigned int) x_4(D);

   _2 = x.1_1 + 2;
   if (_2 <= 1)
     goto <bb 4>;
   else
     goto <bb 3>;

;;   basic block 3, loop depth 0, count 0, freq 6761, maybe hot
;;   basic block 4, loop depth 0, count 0, freq 10000, maybe hot
   # _3 = PHI <1(2), t_5(D)(3)>
   return _3;

}

So at GIMPLE level we see a (x + 2 <=u 1) in both cases but with slightly
different CFG.  RTL-level transformations (ce1) bring it to the pre-combine
RTL where one does (LTU w0 -2) and the other does (GTU (PLUS w0 2) 1).

The differences therefore start at the RTL level, so I think we need this
transformation there.
However, for the testcase:
unsigned int
foo (unsigned int a, unsigned int b)
{
   return (a + 2) > 1;
}

The differences do appear at GIMPLE level, so I think a match.pd pattern
would help here.
Hi, may I ask which function this one is being compared against, and what
that function looks like?

Hi Bin,
I meant to say that the unsigned greater-than comparison is retained at the
GIMPLE level, so it could be optimised there.
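
As a rough illustration of what folding that comparison would amount to,
here is a C sketch that puts the testcase next to a hand-folded version.
The names foo_folded and check_foo_sweep are hypothetical, and this is not
a claim about what a match.pd pattern would actually emit; it only checks
that the two source-level forms agree on a strided sweep plus the boundary
values.

```c
#include <assert.h>
#include <limits.h>

/* The testcase from above; the second parameter is unused there as well. */
unsigned int foo(unsigned int a, unsigned int b)
{
    (void)b;
    return (a + 2) > 1;
}

/* Hypothetical hand-folded form: a + 2 wraps to 0 or 1 only for the two
   largest values of a, so the test reduces to a < UINT_MAX - 1.  */
unsigned int foo_folded(unsigned int a, unsigned int b)
{
    (void)b;
    return a < UINT_MAX - 1;
}

/* Asserts agreement on a strided sweep of the unsigned range and on the
   four largest values; returns the number of inputs checked.  */
unsigned long check_foo_sweep(void)
{
    const unsigned int stride = 65521u;  /* arbitrary stride to keep it fast */
    unsigned long checked = 0;
    unsigned int a = 0;

    for (;;) {
        assert(foo(a, 0) == foo_folded(a, 0));
        checked++;
        if (a > UINT_MAX - stride)
            break;                       /* next step would wrap around */
        a += stride;
    }
    for (unsigned int k = 0; k < 4; k++) {
        assert(foo(UINT_MAX - k, 0) == foo_folded(UINT_MAX - k, 0));
        checked++;
    }
    return checked;
}
```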

Kyrill

Thanks,
bin
