https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80770
--- Comment #12 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <[email protected]>:

https://gcc.gnu.org/g:684d385720cd5d25df8dc69c5281fc0fb9c3bebe

commit r17-434-g684d385720cd5d25df8dc69c5281fc0fb9c3bebe
Author: Shreya Munnangi <[email protected]>
Date:   Sun May 10 21:37:29 2026 -0600

    [RISC-V][PR rtl-optimization/80770] Simplify bit flipping operations down to xor

    So this is the target independent work to finish resolving pr80770.  It's a
    combination of Shreya's efforts and my own.

    To recap, the basic idea is we want to simplify RTL blobs which ultimately
    are just flipping a bit.  Consider:

    > (set (reg:DI 153)
    >     (ior:DI (and:DI (reg:DI 140 [ *s_4(D) ])
    >             (const_int 254 [0xfe]))
    >         (and:DI (not:DI (reg:DI 140 [ *s_4(D) ]))
    >             (const_int 1 [0x1]))))

    The first operand of the IOR clears the low bit of the source register
    leaving everything else unchanged.  The second operand of the IOR clears
    everything but the low bit and flips the low bit.  When we IOR those
    together we get the original value with the lowest bit flipped.

    The key is to realize we have the same pseudo in both arms and there are no
    bits in common for the constants.  So this works for an arbitrary bit(s) as
    long as the constants have the right form.

    That gets us good code on riscv and almost certainly helps other targets.

    There is another form which shows up on the H8 and possibly other targets
    with sub-word arithmetic.  op0 and op1 are respectively:

    > (gdb) p debug_rtx (op0)
    > (and:QI (reg:QI 24 [ *s_4(D) ])
    >     (const_int 127 [0x7f]))
    > $1 = void
    > (gdb) p debug_rtx (op1)
    > (plus:QI (and:QI (reg:QI 24 [ *s_4(D) ])
    >         (const_int -128 [0xffffffffffffff80]))
    >     (const_int -128 [0xffffffffffffff80]))
    > $2 = void

    Note we're in QImode.  op1 just flips the highest QImode bit.  If there are
    carry-outs, we don't really care about them.  The net is we can capture
    that case on the H8 by verifying this form flips the highest bit for the
    given mode.
    Otherwise the carry-outs are relevant and our transformation is incorrect.

    Plan is to commit Friday.  While it has been tested with the usual
    bootstraps as well as testing on various cross platforms, I'm more
    comfortable giving folks time to take a looksie to see if Shreya or I
    missed anything critical.

    For the testcase in question before/afters look like this:

    x86:

            movzbl  (%rdi), %eax
            movl    %eax, %edx
            andl    $-2, %eax
            andl    $1, %edx
            xorl    $1, %edx
            orl     %edx, %eax
            movb    %al, (%rdi)

    Turns into:

            xorb    $1, (%rdi)

    RISC-V:

            lbu     a5,0(a0)
            andi    a4,a5,1
            xori    a4,a4,1
            andi    a5,a5,-2
            or      a5,a5,a4
            sb      a5,0(a0)

    Turns into:

            lbu     a5,0(a0)
            xori    a5,a5,1
            sb      a5,0(a0)

            PR rtl-optimization/80770

    gcc/
            * rtl.h (simplify_context::simplify_ior_with_common_term): Add new
            method.
            (simplify_context::simplify_binary_operation_1): Use new method.
            * simplify-rtx.cc (simplify_context::simplify_ior_with_common_term):
            New method.

    gcc/testsuite/
            * gcc.target/riscv/pr80770.c: New test.
            * gcc.target/riscv/pr80770-2.c: New test.
            * gcc.target/h8300/pr80770.c: New test.
            * gcc.target/h8300/pr80770-2.c: New test.

    Co-authored-by: Jeff Law <[email protected]>
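
For anyone who wants to convince themselves of the two identities described in the commit message, here is a small standalone C sketch. It is not the code added to simplify-rtx.cc. The function names (flip_via_ior, flip_top_via_plus, plus_form_bit6, self_check) are made up for illustration, and uint8_t is assumed to be a fair stand-in for QImode. It checks the IOR form against a plain xor for every byte value and mask, checks the PLUS form for the top bit, and shows one input where the PLUS form on a lower bit goes wrong because the carry lands in a live bit.

```c
#include <assert.h>
#include <stdint.h>

/* IOR form: (x & ~mask) | (~x & mask).  Both arms use the same x and
   the two constants share no bits, so this should equal x ^ mask.  */
uint8_t flip_via_ior(uint8_t x, uint8_t mask)
{
    return (uint8_t)((x & (uint8_t)~mask) | ((uint8_t)~x & mask));
}

/* PLUS form for the top bit of an 8-bit value:
   (x & 0x7f) | ((x & 0x80) + 0x80).  The carry out of bit 7 is
   dropped by the truncation to 8 bits.  */
uint8_t flip_top_via_plus(uint8_t x)
{
    uint8_t op0 = x & 0x7f;
    uint8_t op1 = (uint8_t)((x & 0x80) + 0x80);
    return (uint8_t)(op0 | op1);
}

/* Same PLUS shape applied to bit 6 instead of the top bit.  Here the
   carry propagates into bit 7, so this is NOT a plain bit flip.  */
uint8_t plus_form_bit6(uint8_t x)
{
    uint8_t op0 = x & (uint8_t)~0x40;
    uint8_t op1 = (uint8_t)((x & 0x40) + 0x40);
    return (uint8_t)(op0 | op1);
}

/* Exhaustive check over all 8-bit inputs (and all masks for the IOR form).  */
void self_check(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t x = (uint8_t)v;

        for (unsigned m = 0; m < 256; m++)
            assert(flip_via_ior(x, (uint8_t)m) == (uint8_t)(x ^ m));

        assert(flip_top_via_plus(x) == (uint8_t)(x ^ 0x80));
    }

    /* Counterexample: bit 6 set, bit 7 clear.  A real flip would give 0.  */
    assert(plus_form_bit6(0x40) != (uint8_t)(0x40 ^ 0x40));
}
```

Calling self_check() from a main() and building with assertions enabled exercises all three cases. The bit 6 counterexample is the reason the PLUS form is only treated as a flip when the constant is the highest bit of the mode.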
