On Wed, May 05, 2021 at 06:52:27PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Wed, May 05, 2021 at 01:45:29PM +0200, Marc Glisse wrote:
> > On Tue, 4 May 2021, Jakub Jelinek via Gcc-patches wrote:
> > 
> > > 2) the pr94589-2.C testcase should be matching just 12 times each, but 
> > > runs
> > > into operator>=(strong_ordering, unspecified) being defined as
> > > (_M_value&1)==_M_value
> > > rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
> > > unreachable and so (_M_value&1)==_M_value is then equivalent to 
> > > _M_value>=0,
> > > but is not a single use but two uses.  I'll need to pattern match that 
> > > case
> > > specially.
> > 
> > Somewhere in RTL (_M_value&1)==_M_value is turned into (_M_value&-2)==0,
> > that could be worth doing already in GIMPLE.
> 
> Apparently it is
>   /* Simplify eq/ne (and/ior x y) x/y) for targets with a BICS instruction or
>      constant folding if x/y is a constant.  */
>   if ((code == EQ || code == NE)
>       && (op0code == AND || op0code == IOR)
>       && !side_effects_p (op1)
>       && op1 != CONST0_RTX (cmp_mode))
>     {
>       /* Both (eq/ne (and x y) x) and (eq/ne (ior x y) y) simplify to
>          (eq/ne (and (not y) x) 0).  */
> ...
>       /* Both (eq/ne (and x y) y) and (eq/ne (ior x y) x) simplify to
>          (eq/ne (and (not x) y) 0).  */
> Yes, doing that on GIMPLE for the case where the not argument is constant
> would simplify the phiopt follow-up (it would be single imm use then).

Though, (x&1) == x is equivalent both to (x&~1) == 0 and to x < 2U,
and of the latter two it isn't obvious which one is better/more canonical.
On aarch64 I don't see a difference in the number of insns or in their size:
  10:   13001c00        sxtb    w0, w0
  14:   721f781f        tst     w0, #0xfffffffe
  18:   1a9f17e0        cset    w0, eq  // eq = none
  1c:   d65f03c0        ret
vs.
  20:   12001c00        and     w0, w0, #0xff
  24:   7100041f        cmp     w0, #0x1
  28:   1a9f87e0        cset    w0, ls  // ls = plast
  2c:   d65f03c0        ret
On x86_64 it is the same number of insns, but the comparison form is shorter
(note that the spaceship result is a struct with a signed char based enum):
  10:   31 c0                   xor    %eax,%eax
  12:   81 e7 fe 00 00 00       and    $0xfe,%edi
  18:   0f 94 c0                sete   %al
  1b:   c3                      retq   
  1c:   0f 1f 40 00             nopl   0x0(%rax)
vs.
  20:   31 c0                   xor    %eax,%eax
  22:   40 80 ff 01             cmp    $0x1,%dil
  26:   0f 96 c0                setbe  %al
  29:   c3                      retq   
Generally, I'd expect the comparison form to be better, because that is
the form most common in user code, so VRP will be able to do the best
thing for it.

Before we decide how to optimize it, can't we change libstdc++
to use the more efficient implementation?

I.e., either the following, or { return (__v._M_value & ~1) == 0; }?

2021-05-06  Jakub Jelinek  <ja...@redhat.com>

        * compare (operator>=(partial_ordering, __cmp_cat::__unspec),
        operator<=(__cmp_cat::__unspec, partial_ordering)): Simplify to
        (unsigned char) __v._M_value < 2.

--- gcc/compare.jj      2021-03-10 17:36:41.288491012 +0100
+++ gcc/compare 2021-05-06 11:44:37.519553192 +0200
@@ -107,7 +107,7 @@ namespace std
 
     friend constexpr bool
     operator>=(partial_ordering __v, __cmp_cat::__unspec) noexcept
-    { return __cmp_cat::type(__v._M_value & 1) == __v._M_value; }
+    { return static_cast<unsigned char>(__v._M_value) < 2; }
 
     friend constexpr bool
     operator< (__cmp_cat::__unspec, partial_ordering __v) noexcept
@@ -119,7 +119,7 @@ namespace std
 
     friend constexpr bool
     operator<=(__cmp_cat::__unspec, partial_ordering __v) noexcept
-    { return __cmp_cat::type(__v._M_value & 1) == __v._M_value; }
+    { return static_cast<unsigned char>(__v._M_value) < 2; }
 
     friend constexpr bool
     operator>=(__cmp_cat::__unspec, partial_ordering __v) noexcept


        Jakub