https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61559

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-09-03
                 CC|                            |rguenth at gcc dot gnu.org
          Component|rtl-optimization            |tree-optimization
     Ever confirmed|0                           |1

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Eric Botcazou from comment #4)
> I guess the transformations should accept MEMs instead of just REGs but, no,
> I'm not particularly interested in quirks of CISC architectures, I have
> enough to do with those of RISC architectures.

The problem is that, with both function arguments in memory, combine simplifies
the sequence of bswaps with memory arguments (== movbe) in foo7 to:

Failed to match this instruction:
(set (reg:SI 84 [ D.2318 ])
    (xor:SI (mem/c:SI (plus:SI (reg/f:SI 16 argp)
                (const_int 4 [0x4])) [2 b+0 S4 A32])
        (mem/c:SI (reg/f:SI 16 argp) [2 a+0 S4 A32])))

This RTX is invalid because both input operands are in memory.

The optimized tree dump for foo7 is:

  <bb 2>:
  _2 = __builtin_bswap32 (a_1(D));
  _4 = __builtin_bswap32 (b_3(D));
  _5 = _4 ^ _2;
  _6 = __builtin_bswap32 (_5); [tail call]
  return _6;

It looks to me that the optimization has to be re-implemented as a tree
optimization (probably by extending fold_builtin_bswap in builtins.c). This
generic optimization will also benefit targets without a bswap RTX pattern, e.g.
plain i386, as observed in Comment #2.
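
For illustration only, here is a minimal sketch of the identity such a tree-level
fold would rely on: bswap (bswap (a) ^ bswap (b)) == a ^ b. The source of foo7
is not quoted in this comment, so foo7_sketch below is a hypothetical
reconstruction from the optimized tree dump above, and foo7_folded and
bswap_xor_identity_holds are names invented for this example. It assumes a
compiler that provides __builtin_bswap32 (GCC or Clang).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reconstruction of foo7 from the tree dump:
   _2 = bswap32 (a); _4 = bswap32 (b); _5 = _4 ^ _2; return bswap32 (_5);  */
uint32_t foo7_sketch(uint32_t a, uint32_t b)
{
    return __builtin_bswap32(__builtin_bswap32(a) ^ __builtin_bswap32(b));
}

/* What the function should reduce to once the bswaps cancel.
   XOR works bit by bit, so permuting the bytes of both inputs and then
   undoing the permutation on the result changes nothing.  */
uint32_t foo7_folded(uint32_t a, uint32_t b)
{
    return a ^ b;
}

/* Compare both forms over a few sample pairs.
   Returns 1 if they agree on every pair, 0 otherwise.  */
int bswap_xor_identity_holds(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x00000001u, 0x12345678u, 0xdeadbeefu, 0xffffffffu
    };
    const size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (foo7_sketch(samples[i], samples[j])
                != foo7_folded(samples[i], samples[j]))
                return 0;
    return 1;
}
```

The sample check is not a proof, but the identity holds for any bitwise
operation, since a byte swap only permutes bit positions. Because the fold
removes every bswap, it needs no bswap support from the target.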

I'm recategorizing the PR as a tree-optimization.
