https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108874
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #7)
> Can we recognize it as bswap32 + rotate 16 in match.pd when the backend
> supports both, and then it should be easy for aarch64/arm to transform bswap
> + rotate into rev16 at the RTL level.
That would definitely be better in the general case (though it might not
belong in match.pd). However, rewriting:
(set (reg:SI 98)
    (ior:SI (and:SI (lshiftrt:SI (reg/v:SI 97 [ x ])
                (const_int 8 [0x8]))
            (const_int 16711935 [0xff00ff]))
        (reg:SI 102)))
as
(set (reg:SI 98)
    (ior:SI
        (lshiftrt:SI
            (and:SI (reg/v:SI 97 [ x ]) (const_int 0xff00ff00))
            (const_int 8 [0x8]))
        (reg:SI 102)))
in the aarch64 backend would produce better code for other examples too,
not just for rev16 generation.
Take:
```
unsigned f(unsigned x, unsigned b)
{
return ((x & 0xff00ff00U) >> 8) | b;
}
```
GCC 5 used to produce:
and w0, w0, -16711936
orr w0, w1, w0, lsr 8
ret
While the trunk does:
lsr w0, w0, 8
and w0, w0, 16711935
orr w0, w0, w1
ret
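The two RTL forms above compute the same value; only the order of the mask and the shift differs, which is what decides whether the shift can fold into the orr. A minimal C sketch of that equivalence follows. The function names and the sample inputs are made up for illustration and do not come from GCC; the check is a spot test, not a proof.

```c
#include <assert.h>
#include <stdint.h>

/* Form currently seen on trunk: shift first, then mask with 0x00ff00ff. */
uint32_t shift_then_mask(uint32_t x, uint32_t b)
{
    return ((x >> 8) & 0x00ff00ffU) | b;
}

/* Form GCC 5 kept: mask with 0xff00ff00 first, then shift. With the shift
   outermost, it can become the shifted operand of the orr. */
uint32_t mask_then_shift(uint32_t x, uint32_t b)
{
    return ((x & 0xff00ff00U) >> 8) | b;
}

/* Spot-check a handful of hypothetical inputs; this is not exhaustive. */
void check_forms_agree(void)
{
    static const uint32_t samples[] = {
        0u, 0xffffffffu, 0x12345678u, 0xa5a55a5au
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(shift_then_mask(samples[i], 0x01000100u)
               == mask_then_shift(samples[i], 0x01000100u));
}
```

Calling check_forms_agree() from a test harness should pass silently if the two forms agree on the sampled inputs.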
Note that xor and addition should be handled in a similar way too.
That is, these have a similar regression:
```
unsigned f(unsigned x, unsigned b)
{
return ((x & 0xff00ff00U) >> 8) ^ b;
}
unsigned f1(unsigned x, unsigned b)
{
return ((x & 0xff00ff00U) >> 8) + b;
}
```
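For reference, the bswap32 + rotate 16 idea from comment #7 can be sketched in C as below. This assumes the source pattern in question is the usual open-coded byte swap within each 16-bit half (which is what rev16 computes on a 32-bit register); the function names are hypothetical, and __builtin_bswap32 is the GCC builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Open-coded byte swap within each halfword (assumed source pattern). */
uint32_t rev16_open_coded(uint32_t x)
{
    return ((x & 0xff00ff00U) >> 8) | ((x & 0x00ff00ffU) << 8);
}

/* Same value expressed as a full 32-bit byte swap followed by a
   rotate by 16 bits. */
uint32_t rev16_via_bswap(uint32_t x)
{
    uint32_t s = __builtin_bswap32(x);
    return (s << 16) | (s >> 16);
}

/* Spot-check on a few hypothetical inputs; not an exhaustive proof. */
void check_rev16_forms_agree(void)
{
    static const uint32_t samples[] = {
        0u, 0xffffffffu, 0xaabbccddu, 0x12345678u
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(rev16_open_coded(samples[i]) == rev16_via_bswap(samples[i]));
}
```

Both functions map 0xaabbccdd to 0xbbaaddcc, so a backend that has bswap and rotate patterns could in principle combine them into a single rev16.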