https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122420
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64
         Resolution|---                         |WONTFIX
             Status|UNCONFIRMED                 |RESOLVED
          Component|tree-optimization           |rtl-optimization
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Actually this is more complex, because on aarch64 doing the opposite is better:
f:
        sub     w1, w0, #1
        bic     w0, w1, w0
        clz     w0, w0
        ret
g:
        rbit    w0, w0
        mov     w1, 32
        clz     w0, w0
        sub     w0, w1, w0
        ret
That is because aarch64 does not have a ctz instruction, only clz, so ctz has to be done as rbit followed by clz.
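To illustrate the rbit/clz sequence in g above, here is a minimal C sketch of ctz built from a bit reversal plus clz. The names `rbit32` and `ctz_via_rbit` are made up for this example, and `rbit32` is only a portable stand-in for the rbit instruction, not what GCC emits.

```c
#include <stdint.h>

/* Portable stand-in for the aarch64 rbit instruction (hypothetical helper):
   reverse the order of all 32 bits. */
static uint32_t rbit32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* swap adjacent bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* swap bit pairs */
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);  /* swap nibbles */
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);  /* swap bytes */
    return (x >> 16) | (x << 16);                             /* swap halfwords */
}

/* ctz synthesized as clz(rbit(x)). The argument must be nonzero,
   because __builtin_clz(0) is undefined. */
static int ctz_via_rbit(uint32_t x)
{
    return __builtin_clz(rbit32(x));
}
```

For nonzero inputs this should agree with `__builtin_ctz`, since the lowest set bit of x becomes the highest set bit of the reversed value.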
So I think this should be done at the RTL level instead:
```
Trying 9, 10 -> 11:
    9: r106:SI=r109:SI-0x1
   10: {r107:SI=~r109:SI&r106:SI;clobber flags:CC;}
      REG_DEAD r109:SI
      REG_DEAD r106:SI
      REG_UNUSED flags:CC
   11: {r102:SI=clz(r107:SI);clobber flags:CC;}
      REG_DEAD r107:SI
      REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:SI 102 [ <retval> ])
            (clz:SI (and:SI (plus:SI (reg:SI 109 [ x ])
                        (const_int -1 [0xffffffffffffffff]))
                    (not:SI (reg:SI 109 [ x ])))))
        (clobber (reg:CC 17 flags))
    ])
```
This can be simplified to:
(set (reg:SI 102 [ <retval> ])
    (minus:SI (const_int 32) (ctz:SI (reg:SI 109))))
(clobber (reg:CC 17 flags))
But that also requires three instructions, so maybe this is not better after all.
So closing as won't fix.
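For reference, the equivalence behind the simplification above, clz(~x & (x - 1)) versus 32 - ctz(x), can be checked at the C level with a small sketch. The function names here are hypothetical and are not taken from the original test case. The check is restricted to even, nonzero x, because `__builtin_clz(0)` and `__builtin_ctz(0)` are undefined.

```c
#include <stdint.h>

/* clz(~x & (x - 1)): the mask has ones exactly in the trailing-zero
   positions of x. Undefined when the mask is 0, i.e. when x is odd. */
static int clz_of_mask(uint32_t x)
{
    uint32_t mask = ~x & (x - 1);
    return __builtin_clz(mask);
}

/* 32 - ctz(x). Undefined when x is 0. */
static int width_minus_ctz(uint32_t x)
{
    return 32 - __builtin_ctz(x);
}

/* Returns 1 if both forms give 32 - k for a sample of values that have
   exactly k trailing zeros (k = 1..31), 0 otherwise. */
static int forms_agree(void)
{
    static const uint32_t odd[] = { 1u, 3u, 5u, 0xdeadbeefu };
    for (int k = 1; k < 32; k++) {
        for (unsigned i = 0; i < sizeof odd / sizeof odd[0]; i++) {
            uint32_t x = odd[i] << k;   /* odd value shifted left: ctz(x) == k */
            if (clz_of_mask(x) != 32 - k || width_minus_ctz(x) != 32 - k)
                return 0;
        }
    }
    return 1;
}
```

This only samples inputs; it is a sanity check of the identity, not a statement about how the zero and odd cases are handled at the RTL level.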
Compiling with `-O2 -march=skylake`