https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122420

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64
         Resolution|---                         |WONTFIX
             Status|UNCONFIRMED                 |RESOLVED
          Component|tree-optimization           |rtl-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Actually this is more complex.

On aarch64, the opposite transformation is better:
f:
        sub     w1, w0, #1
        bic     w0, w1, w0
        clz     w0, w0
        ret
g:
        rbit    w0, w0
        mov     w1, 32
        clz     w0, w0
        sub     w0, w1, w0
        ret

This is because aarch64 does not have a ctz instruction, only clz.


So I think this should be done at the RTL level instead:
```
Trying 9, 10 -> 11:
    9: r106:SI=r109:SI-0x1
   10: {r107:SI=~r109:SI&r106:SI;clobber flags:CC;}
      REG_DEAD r109:SI
      REG_DEAD r106:SI
      REG_UNUSED flags:CC
   11: {r102:SI=clz(r107:SI);clobber flags:CC;}
      REG_DEAD r107:SI
      REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:SI 102 [ <retval> ])
            (clz:SI (and:SI (plus:SI (reg:SI 109 [ x ])
                        (const_int -1 [0xffffffffffffffff]))
                    (not:SI (reg:SI 109 [ x ])))))
        (clobber (reg:CC 17 flags))
    ])
```
This can be simplified to:
(set (reg:SI 102 [ <retval> ])
     (minus:SI (const_int 32) (ctz:SI (reg:SI 109))))
(clobber (reg:CC 17 flags))

But that also requires 3 instructions, the same as the current sequence.  So maybe this is not better after all.
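
For reference, the identity behind the simplification above, clz((x - 1) & ~x) == 32 - ctz(x), can be checked with a small portable C sketch. The names `clz32`, `ctz32`, `f_mask`, `g_ctz` and `check_samples` are made up for this illustration and are not the testcase from the report. The loop-based helpers are assumed to return 32 for a zero input (the lzcnt/tzcnt behaviour), which avoids the undefined zero case of `__builtin_clz` and `__builtin_ctz`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count leading zeros; returns 32 for v == 0 (assumed lzcnt-like semantics). */
static unsigned clz32(uint32_t v)
{
    unsigned n = 0;
    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && (v & bit) == 0; bit >>= 1)
        n++;
    return n;
}

/* Count trailing zeros; returns 32 for v == 0 (assumed tzcnt-like semantics). */
static unsigned ctz32(uint32_t v)
{
    unsigned n = 0;
    for (uint32_t bit = 1; bit != 0 && (v & bit) == 0; bit <<= 1)
        n++;
    return n;
}

/* Form matched by combine: (x - 1) & ~x is a mask of the trailing zero bits. */
static unsigned f_mask(uint32_t x)
{
    return clz32((x - 1) & ~x);
}

/* Proposed simplified form. */
static unsigned g_ctz(uint32_t x)
{
    return 32 - ctz32(x);
}

/* Compare both forms on a few edge cases and ordinary values. */
static void check_samples(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 8u, 12u, 0x80000000u, 0xfffffffeu, 0xffffffffu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(f_mask(samples[i]) == g_ctz(samples[i]));
}
```

Calling `check_samples()` from a test driver exercises the zero input, single-bit inputs and the all-ones input; it only demonstrates that the two forms agree, not which one is cheaper on a given target.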

So closing as won't fix.

Compiling with `-O2 -march=skylake`
