https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61559
Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-09-03
                 CC|                            |rguenth at gcc dot gnu.org
          Component|rtl-optimization            |tree-optimization
     Ever confirmed|0                           |1

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Eric Botcazou from comment #4)
> I guess the transformations should accept MEMs instead of just REGs but, no,
> I'm not particularly interested in quirks of CISC architectures, I have
> enough to do with those of RISC architectures.

The problem is that with both function arguments in memory, combine simplifies
the sequence of bswaps with a memory argument ( == movbe) in foo7 to:

Failed to match this instruction:
(set (reg:SI 84 [ D.2318 ])
    (xor:SI (mem/c:SI (plus:SI (reg/f:SI 16 argp)
                (const_int 4 [0x4])) [2 b+0 S4 A32])
        (mem/c:SI (reg/f:SI 16 argp) [2 a+0 S4 A32])))

This is invalid RTX, where both input arguments are in memory.

The optimized tree dump for foo7 is:

  <bb 2>:
  _2 = __builtin_bswap32 (a_1(D));
  _4 = __builtin_bswap32 (b_3(D));
  _5 = _4 ^ _2;
  _6 = __builtin_bswap32 (_5); [tail call]
  return _6;

It looks to me that the optimization has to be re-implemented as a tree
optimization (probably by extending fold_builtin_bswap in builtins.c). This
generic optimization will also benefit targets without a bswap RTX pattern,
e.g. plain i386, as observed in Comment #2.

I'm recategorizing the PR as a tree-optimization.
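
For readers following along, here is a minimal C sketch of the identity that a tree-level fold would presumably exploit: byte swapping only permutes bytes and XOR works bitwise, so bswap (bswap (a) ^ bswap (b)) equals a ^ b. The function names foo7_as_dumped, foo7_folded and bswap_xor_fold_holds are hypothetical; the first is reconstructed from the tree dump in this comment, not copied from the PR's testcase, and the second is only an assumption about what the folded result could look like.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction of foo7, statement by statement from the
   optimized tree dump quoted in this comment.  */
static uint32_t foo7_as_dumped (uint32_t a, uint32_t b)
{
  uint32_t t2 = __builtin_bswap32 (a);
  uint32_t t4 = __builtin_bswap32 (b);
  uint32_t t5 = t4 ^ t2;
  return __builtin_bswap32 (t5);
}

/* Assumed result of the fold: bswap is a byte permutation and XOR is
   bitwise, so the three bswaps cancel and only the XOR remains.  */
static uint32_t foo7_folded (uint32_t a, uint32_t b)
{
  return a ^ b;
}

/* Returns 1 if both forms agree on a few sample inputs, 0 otherwise.
   This spot-checks the identity; it is not a proof.  */
static int bswap_xor_fold_holds (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 0x12345678u, 0xdeadbeefu, 0xffffffffu
  };
  const unsigned n = sizeof samples / sizeof samples[0];

  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      if (foo7_as_dumped (samples[i], samples[j])
          != foo7_folded (samples[i], samples[j]))
        return 0;
  return 1;
}
```

Calling bswap_xor_fold_holds () from a small main and asserting a non-zero result exercises the check. The same reasoning should apply to the other bitwise operators (AND, IOR), though this comment only shows the XOR case.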