https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632
--- Comment #2 from Mel Chen <bina2374 at gmail dot com> ---
(In reply to Jim Wilson from comment #1)
> We sign extend HImode constants as that is the natural thing to do to make
> arithmetic work. This does mean that unsigned short logical operations need
> a zero extend after the operation which might otherwise be unnecessary.
> This can't be handled at rtl generation time as we don't know if the
> constant will be used for arithmetic or logicals or signed or unsigned. But
> maybe an optimization pass could go over the code and convert HImode
> constants to signed or unsigned as appropriate to reduce the number of
> sign/zero extend operations. We have the ree pass that we might be able to
> extend to handle this.

Extending the ree pass is a good approach, but right now it seems to scan only
for XXX_extend. Because the zero_extend has already been split into 2 shift
instructions before the ree pass, do we need to keep the zero_extend until the
ree pass? Or is there any other way to know that the shift pair was a
zero_extend?

>
> Handling this in combine requires a 4->3 splitter which is something combine
> doesn't do. We could work around that by not splitting constants before
> combine, but that would be a major change and probably not beneficial, as we
> wouldn't be able to easily optimize the high part of the constants anymore.

I agree. This way is a bit risky.

>
> Another approach here might be to split the xor along with the constant. If
> we generated something like
>       srli    a0,a0,1
>       xori    a0,a0,1
>       li      a5,-24576
>       xor     a0,a0,a5
> then we can optimize away the following zero extend with a 3->2 splitter
> which combine already supports via find_split_point. We can still optimize
> the high part of the constant. Since the immediates are sign extended, if
> the low part of the immediate has the sign bit set, we would have to invert
> the high part of the immediate to get the right result. At least I think
> that works, I haven't double checked it yet. This only works for or if the
> low part doesn't have the sign bit set. And this only works for and if the
> low part does have the sign bit set.

I'm not sure how difficult it is to split 1 xor into 2 xors before the combine
pass, but I have another proposal. The following is the combine dump:

Trying 8, 9, 10 -> 11:
    8: r79:SI=0xffffffffffffa000
    9: r78:SI=r79:SI+0x1
      REG_DEAD r79:SI
      REG_EQUAL 0xffffffffffffa001
   10: r77:SI=r72:SI^r78:SI
      REG_DEAD r78:SI
      REG_DEAD r72:SI
   11: r80:SI=zero_extend(r77:SI#0)
      REG_DEAD r77:SI
Failed to match this instruction:
(set (reg:SI 80)
    (xor:SI (reg:SI 72 [ _4 ])
        (const_int 40961 [0xa001])))

Is it possible to pretend that we have a pattern that can match
xor (reg:SI 80), (reg:SI 72), 0xa001 in the combine pass? Then, if the constant
is too large to fit in the immediate field, it can be split into 2 xors in the
split pass.
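
To sanity-check the arithmetic behind splitting one xor into two, here is a minimal Python sketch. It is not GCC code and does not model any pass. It assumes 32-bit registers and a 12-bit sign-extended xori immediate, as on RISC-V. The helper names (sign_extend, split_xor_constant, xor_via_split, zero_extend16) are hypothetical and exist only for this illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed register width: 32 bits (SImode)


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def split_xor_constant(c):
    """Split c into (lo, hi) such that x ^ c == (x ^ lo) ^ hi modulo 2**32.

    lo is the low 12 bits of c, sign extended (what an xori immediate holds).
    hi is whatever is left; its low 12 bits are always zero. When lo is
    negative, the upper 20 bits of hi come out inverted relative to c.
    """
    lo = sign_extend(c, 12)
    hi = (c ^ lo) & MASK32
    return lo, hi


def xor_via_split(x, c):
    """Apply the two xors in sequence, truncating to 32 bits each time."""
    lo, hi = split_xor_constant(c)
    t = (x ^ lo) & MASK32      # models: xori t, x, lo
    return (t ^ hi) & MASK32   # models: xor  r, t, <register holding hi>


def zero_extend16(x):
    """Model a HImode -> SImode zero extend."""
    return x & 0xFFFF


# The constant from the combine dump: the low part has no sign bit set.
print(split_xor_constant(0xA001) == (1, 0xA000))

# A constant whose low 12 bits have the sign bit set: hi is inverted on top.
lo, hi = split_xor_constant(0xA801)
print(lo, hex(hi))
```

For the sequence quoted from comment #1 (xori with 1, then xor with the sign-extended value -24576, i.e. 0xffffa000), the same helpers show that a trailing 16-bit zero extend gives x ^ 0xa001 whenever x already fits in 16 bits: zero_extend16((x ^ 1) ^ 0xFFFFA000) == x ^ 0xA001.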