[Bug target/95632] Redundant zero extension

bina2374 at gmail dot com Mon, 15 Jun 2020 02:50:03 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632


--- Comment #2 from Mel Chen <bina2374 at gmail dot com> ---
(In reply to Jim Wilson from comment #1)
> We sign extend HImode constants as that is the natural thing to do to make
> arithmetic work.  This does mean that unsigned short logical operations need
> a zero extend after the operation which might otherwise be unnecessary. 
> This can't be handled at rtl generation time as we don't know if the
> constant will be used for arithmetic or logicals or signed or unsigned.  But
> maybe an optimization pass could go over the code and convert HImode
> constants to signed or unsigned as appropriate to reduce the number of
> sign/zero extend operations.  We have the ree pass that we might be able to
> extend to handle this.

Extending the ree pass is a good approach, but currently it seems to scan only
XXX_extend operations. Because the zero_extend has already been split into 2
shift instructions before the ree pass runs, do we need to keep the zero_extend
intact until the ree pass? Or is there another way to know that the shift pair
was originally a zero_extend?
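
To make the question concrete, here is a minimal C sketch (not GCC code) of why
the shift pair and the zero_extend are interchangeable. The function names are
hypothetical, and the shift amount of 16 assumes a 32-bit register; on a 64-bit
register it would be 48.

```c
#include <stdint.h>

/* Zero-extend the low 16 bits, as a single zero_extend would. */
uint32_t zext16_mask(uint32_t x)
{
    return x & 0xffffu;
}

/* The same operation written as a shift pair: shift left by 16, then
   logical shift right by 16 (assumes a 32-bit register). */
uint32_t zext16_shift_pair(uint32_t x)
{
    return (x << 16) >> 16;
}

/* Returns 1 if both forms agree on a spread of test values, 0 otherwise. */
int shift_pair_is_zext16(void)
{
    for (uint32_t i = 0; i < 0x20000u; i++) {
        uint32_t x = i * 0x9e3779b1u;   /* arbitrary multiplier to vary all bits */
        if (zext16_mask(x) != zext16_shift_pair(x))
            return 0;
    }
    return 1;
}
```

A pass that sees a left shift immediately followed by a logical right shift of
the same register by the same amount could, under these assumptions, treat the
pair as a zero_extend.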
> 
> Handling this in combine requires a 4->3 splitter which is something combine
> doesn't do.  We could work around that by not splitting constants before
> combine, but that would be a major change and probably not beneficial, as we
> wouldn't be able to easily optimize the high part of the constants anymore.

I agree. This approach is a bit risky.
> 
> Another approach here might be to split the xor along with the constant.  If
> we generated something like
>       srli    a0,a0,1
>       xori    a0,a0,1
>       li      a5,-24576
>       xor     a0,a0,a5
> then we can optimize away the following zero extend with a 3->2 splitter
> which combine already supports via find_split_point.  We can still optimize
> the high part of the constant. Since the immediates are sign extended, if
> the low part of the immediate has the sign bit set, we would have to invert
> the high part of the immediate to get the right result.  At least I think
> that works, I haven't double checked it yet.  This only works for or if the
> low part doesn't have the sign bit set.  And this only works for and if the
> low part does have the sign bit set.
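
As a sanity check of the xor case described above, here is a minimal C sketch
(not GCC code). It assumes 32-bit values and a 12-bit sign-extended immediate,
and all helper names are hypothetical. It covers xor only, not the or/and
restrictions.

```c
#include <stdint.h>

/* Sign-extend a 12-bit immediate, as an xori immediate would be. */
uint32_t sext12(uint32_t imm)
{
    imm &= 0xfffu;
    return (imm & 0x800u) ? (imm | 0xfffff000u) : imm;
}

/* Low part: the value a single xori would apply. */
uint32_t xor_low_part(uint32_t c)
{
    return sext12(c);
}

/* High part: chosen so that (x ^ low) ^ high == x ^ c.  If bit 11 of c is
   set, the sign-extended low part flips bits 12..31, so those bits of the
   high part come out inverted relative to c.  Its low 12 bits are zero. */
uint32_t xor_high_part(uint32_t c)
{
    return c ^ sext12(c);
}

/* Apply the two-step xor. */
uint32_t xor_split(uint32_t x, uint32_t c)
{
    return (x ^ xor_low_part(c)) ^ xor_high_part(c);
}
```

For the constant in this bug, 0xffffa001, the low part is 1 and the high part
is 0xffffa000 (-24576), matching the sequence above. For a constant such as
0xa801, whose low part has the sign bit set, the high part becomes 0xffff5000,
that is, the upper bits are inverted.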

I'm not sure how difficult it is to split 1 xor into 2 xors before the combine
pass, but I have another proposal.

The following is the combine dump:
Trying 8, 9, 10 -> 11:
    8: r79:SI=0xffffffffffffa000
    9: r78:SI=r79:SI+0x1
      REG_DEAD r79:SI
      REG_EQUAL 0xffffffffffffa001
   10: r77:SI=r72:SI^r78:SI
      REG_DEAD r78:SI
      REG_DEAD r72:SI
   11: r80:SI=zero_extend(r77:SI#0)
      REG_DEAD r77:SI
Failed to match this instruction:
(set (reg:SI 80)
    (xor:SI (reg:SI 72 [ _4 ])
        (const_int 40961 [0xa001])))

Is it possible to pretend that we have a pattern that matches
(set (reg:SI 80) (xor:SI (reg:SI 72) (const_int 0xa001))) in the combine pass?
Then, if the constant is too large to fit in the immediate field, it can be
split into 2 xors in the split pass.
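
To illustrate why the combined form in the dump is equivalent, here is a
minimal C sketch (not GCC code; function names are hypothetical). It assumes
32-bit values and that r72 already has its upper 16 bits clear, which the
combined pattern appears to rely on.

```c
#include <stdint.h>

/* Insns 8-11 from the dump: build the sign-extended constant, xor it in,
   then zero-extend the low 16 bits of the result. */
uint32_t xor_then_zext(uint32_t r72)
{
    uint32_t r79 = 0xffffa000u;     /* insn 8 */
    uint32_t r78 = r79 + 1u;        /* insn 9: 0xffffa001 */
    uint32_t r77 = r72 ^ r78;       /* insn 10 */
    return r77 & 0xffffu;           /* insn 11: zero_extend of the low half */
}

/* The single pattern combine tried to match: xor with the unsigned
   16-bit constant and no zero_extend afterwards. */
uint32_t xor_unsigned_const(uint32_t r72)
{
    return r72 ^ 0xa001u;
}

/* Returns 1 if the two forms agree for every input whose upper 16 bits
   are clear, 0 otherwise. */
int forms_agree_for_u16(void)
{
    for (uint32_t x = 0; x <= 0xffffu; x++) {
        if (xor_then_zext(x) != xor_unsigned_const(x))
            return 0;
    }
    return 1;
}
```

If the input may have upper bits set, the two forms differ, so the equivalence
depends on what is known about r72 at that point.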
