Jeffrey Law <[email protected]> writes:
> On 5/6/2026 2:59 AM, Richard Sandiford wrote:
>> I think this is different enough that it's worth giving the expression
>> in the comment, i.e.
>>
>>     (ior (and A C1) (plus (and A C1) C2))
>>
>> It took me a while to work that out from the condition :)
> So I fixed the two paren typos, adjusted the comment for the second pair 
> of cases, then collapsed from 4 cases down to 2 using a loop and 
> swapping of local variables.
>
>
>>
>> I wondered whether this one could/should be generalised to use
>> nonzero_bits, rather than checking specifically for (and A C1).
>> That is,
>>
>>     (ior X (plus X C))
>>
>> (or the reverse, for simple X) is equivalent to (ior X C)
>> if the low bit of C is above the highest nonzero bit of X.
>> In the above example, that would give us:
>>
>>     (ior (and A C1) C2)
>>
>> which admittedly is different from the expansion in the patch,
>> but is equally simple.  This is one of those unfortunate cases where
>> there are many ways of writing the same thing...
>>
>> The (ior X (plus X C)) with nonzero_bits thing works for xor or ior in
>> place of plus.
> Yea, this can be shown with a trivial test:
>>
>> int
>> foo(int x)
>> {
>>   x &= 0xf;
>>   x |= (x + 0x80);
>>   return x;
>> }
>>
>> int
>> foo2(int x)
>> {
>>   x &= 0xf;
>>   x |= (x | 0x80);
>>   return x;
>> }
>
> Which generates this on rv64:
>
>> foo:
>>         andi    a0,a0,15
>>         addi    a5,a0,128
>>         or      a0,a5,a0
>>         ret
>>         .size   foo, .-foo
>>         .align  2
>>         .globl  foo2
>>         .type   foo2, @function
>> foo2:
>>         andi    a0,a0,15
>>         ori     a0,a0,128
>>         ret
>
> I think the transform you're looking for really just requires that there 
> be no bits in common between the nonzero bits in X and the constant.  
>   If there's no common bits, then the PLUS or XOR can turn into an IOR.  
>   That's trivial to include.

Ah, yeah.

> Here's the net.  I also added a test for the H8 for giggles.  H8 isn't 
> optimal as it's not using "bnot", but that's an independent target issue.
>
> I've spot checked on x86 which looks good as well.  For the main 
> testcase we're typically squashing 7 instructions down to a single xor 
> or addb RMW operation.

Nice!

> Further suggestions?
>
> Jeff
>
>
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index ee3d9ec32082..08ae224847b1 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3900,6 +3900,83 @@ simplify_context::simplify_binary_operation_1 
> (rtx_code code,
>         && negated_ops_p (XEXP (op0, 0), op1))
>       return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
>  
> +      {
> +     rtx top0 = op0;
> +     rtx top1 = op1;
> +     for (int i = 0; i < 2; i++)
> +       {
> +         /* The outer IOR is commutative so try op0/op1 in their
> +            original position and reversed.  */
> +         if (i == 1)
> +           std::swap (top0, top1);

FWIW, it would also work to keep op0 and op1 and do the std::swap
at the end of the loop body.  Maybe that would be less error prone
wrt top vs op.

> +
> +         /* (ior X (plus/xor X C)) can be simplified into (ior X C) when
> +            X and C have no bits in common.  */

Maybe s/C/Y/ now that we're not requiring a constant.

> +         if ((GET_CODE (top1) == PLUS || GET_CODE (top1) == XOR)
> +             && rtx_equal_p (top0, XEXP (top1, 0))
> +             && ((nonzero_bits (top0, GET_MODE (top0))
> +                 & nonzero_bits (XEXP (top1, 1), GET_MODE (top1))) == 0)
> +             && !side_effects_p (top1))
> +           return simplify_gen_binary (IOR, mode, top0, XEXP (top1, 1));
> +
> +         /* (ior (and A C1) (and (not A) C2)) can be converted
> +            into (and (xor A C2) (C1 + C2)) when there are no bits
> +            in common between C1 and C2.  */
> +         if (GET_CODE (top0) == AND
> +             && GET_CODE (top1) == AND
> +             && GET_CODE (XEXP (top1, 0)) == NOT
> +             && rtx_equal_p (XEXP (top0, 0), XEXP (XEXP (top1, 0), 0))
> +             && CONST_INT_P (XEXP (top0, 1))
> +             && CONST_INT_P (XEXP (top1, 1))
> +             && (INTVAL (XEXP (top0, 1)) & INTVAL (XEXP (top1, 1))) == 0)
> +           {
> +             rtx c = GEN_INT (INTVAL (XEXP (top0, 1)) + INTVAL (XEXP (top1, 
> 1)));
> +
> +             tem = simplify_gen_binary (XOR, mode, XEXP (top0, 0), XEXP 
> (top1, 1));
> +             if (tem)
> +               {
> +                 tem = simplify_gen_binary (AND, mode, tem, c);
> +
> +                 if (tem)
> +                   return tem;
> +               }
> +           }
> +
> +         /* Another variant seen on some targets particularly those with
> +            sub-word operations.
> +
> +            (ior (and A C1) (plus (and A C2) C2)) can be simplified into
> +            (and (xor A C2) (C1 + C2)).
> +
> +            Where C2 is the sign bit for A's mode.  So 0x80 for QI,
> +            0x8000 for HI, etc.  In this case we know there is no carry
> +            from the PLUS into relevant bits of the output.  */
> +         if (GET_CODE (top0) == AND
> +             && GET_CODE (top1) == PLUS
> +             && GET_CODE (XEXP (top1, 0)) == AND
> +             && rtx_equal_p (XEXP (top0, 0), XEXP (XEXP (top1, 0), 0))
> +             && CONST_INT_P (XEXP (top0, 1))
> +             && CONST_INT_P (XEXP (top1, 1))
> +             && CONST_INT_P (XEXP (XEXP (top1, 0), 1))
> +             && INTVAL (XEXP (top1, 1)) == INTVAL (XEXP (XEXP (top1, 0), 1))
> +             && GET_MODE_BITSIZE (GET_MODE (top1)).is_constant ()
> +             && ((INTVAL (XEXP (top1, 1)) & GET_MODE_MASK (GET_MODE (top1)))
> +                 == HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (GET_MODE 
> (top1)).to_constant () - 1))
> +             && (INTVAL (XEXP (top0, 1)) & INTVAL (XEXP (top1, 1))) == 0)
> +           {
> +             rtx c = GEN_INT (INTVAL (XEXP (top0, 1)) + INTVAL (XEXP (top1, 
> 1)));
> +
> +             tem = simplify_gen_binary (XOR, mode, XEXP (top0, 0), XEXP 
> (top1, 1));
> +             if (tem)
> +               {
> +                 tem = simplify_gen_binary (AND, mode, tem, c);
> +                 if (tem)
> +                   return tem;
> +               }
> +           }

Do you still need the last transformation?  I was hoping that
nonzero_bits would return C1 for (and A C1), and so the first test
above would handle it.

Otherwise LGTM FWIW.

Thanks,
Richard

> +       }
> +      }
> +
>        tem = simplify_with_subreg_not (code, mode, op0, op1);
>        if (tem)
>       return tem;
> diff --git a/gcc/testsuite/gcc.target/h8300/pr80770-2.c 
> b/gcc/testsuite/gcc.target/h8300/pr80770-2.c
> new file mode 100644
> index 000000000000..d2b491ed24e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/h8300/pr80770-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=gnu99 -mint32" } */
> +
> +
> +int foop(int x) { x &= 0xf; x |= (x + 0x80); return x; }
> +int foox(int x) { x &= 0xf; x |= (x ^ 0x80); return x; }
> +
> +/* { dg-final { scan-assembler-not "add" } } */
> +/* { dg-final { scan-assembler-not "xor" } } */
> +/* { dg-final { scan-assembler-times "and" 2 } } */
> +/* { dg-final { scan-assembler-times "or" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/h8300/pr80770.c 
> b/gcc/testsuite/gcc.target/h8300/pr80770.c
> new file mode 100644
> index 000000000000..bccf2ff66c66
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/h8300/pr80770.c
> @@ -0,0 +1,75 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=gnu99 -mint32" } */
> +
> +
> +struct S {
> +  _Bool b0: 1;
> +  _Bool b1: 1;
> +  _Bool b2: 1;
> +  _Bool b3: 1;
> +  _Bool b4: 1;
> +  _Bool b5: 1;
> +  _Bool b6: 1;
> +  _Bool b7: 1;
> +  _Bool b8: 1;
> +  _Bool b9: 1;
> +  _Bool b10: 1;
> +  _Bool b11: 1;
> +  _Bool b12: 1;
> +  _Bool b13: 1;
> +  _Bool b14: 1;
> +  _Bool b15: 1;
> +  _Bool b16: 1;
> +  _Bool b17: 1;
> +  _Bool b18: 1;
> +  _Bool b19: 1;
> +  _Bool b20: 1;
> +  _Bool b21: 1;
> +  _Bool b22: 1;
> +  _Bool b23: 1;
> +  _Bool b24: 1;
> +  _Bool b25: 1;
> +  _Bool b26: 1;
> +  _Bool b27: 1;
> +  _Bool b28: 1;
> +  _Bool b29: 1;
> +  _Bool b30: 1;
> +  _Bool b31: 1;
> +};
> +
> +#define T(N) void fb##N (struct S *s) { s->b##N = !s->b##N; }
> +
> +T(0)
> +T(1)
> +T(2)
> +T(3)
> +T(4)
> +T(5)
> +T(6)
> +T(7)
> +T(8)
> +T(9)
> +T(10)
> +T(11)
> +T(12)
> +T(13)
> +T(14)
> +T(15)
> +T(16)
> +T(17)
> +T(18)
> +T(19)
> +T(20)
> +T(21)
> +T(22)
> +T(23)
> +T(24)
> +T(25)
> +T(26)
> +T(27)
> +T(28)
> +T(29)
> +T(30)
> +T(31)
> +
> +/* { dg-final { scan-assembler-times "xor\t|add.b\t|bnot\t" 32 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr80770-2.c 
> b/gcc/testsuite/gcc.target/riscv/pr80770-2.c
> new file mode 100644
> index 000000000000..1514a49c5605
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr80770-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=gnu99" } */
> +
> +
> +int foop(int x) { x &= 0xf; x |= (x + 0x80); return x; }
> +int foox(int x) { x &= 0xf; x |= (x ^ 0x80); return x; }
> +
> +/* { dg-final { scan-assembler-not "add" } } */
> +/* { dg-final { scan-assembler-not "xor" } } */
> +/* { dg-final { scan-assembler-times "andi\t" 2 } } */
> +/* { dg-final { scan-assembler-times "ori\t" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr80770.c 
> b/gcc/testsuite/gcc.target/riscv/pr80770.c
> new file mode 100644
> index 000000000000..4dafe3955f05
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr80770.c
> @@ -0,0 +1,150 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-std=gnu99" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" } } */
> +
> +
> +struct S {
> +  _Bool b0: 1;
> +  _Bool b1: 1;
> +  _Bool b2: 1;
> +  _Bool b3: 1;
> +  _Bool b4: 1;
> +  _Bool b5: 1;
> +  _Bool b6: 1;
> +  _Bool b7: 1;
> +  _Bool b8: 1;
> +  _Bool b9: 1;
> +  _Bool b10: 1;
> +  _Bool b11: 1;
> +  _Bool b12: 1;
> +  _Bool b13: 1;
> +  _Bool b14: 1;
> +  _Bool b15: 1;
> +  _Bool b16: 1;
> +  _Bool b17: 1;
> +  _Bool b18: 1;
> +  _Bool b19: 1;
> +  _Bool b20: 1;
> +  _Bool b21: 1;
> +  _Bool b22: 1;
> +  _Bool b23: 1;
> +  _Bool b24: 1;
> +  _Bool b25: 1;
> +  _Bool b26: 1;
> +  _Bool b27: 1;
> +  _Bool b28: 1;
> +  _Bool b29: 1;
> +  _Bool b30: 1;
> +  _Bool b31: 1;
> +  _Bool b32: 1;
> +  _Bool b33: 1;
> +  _Bool b34: 1;
> +  _Bool b35: 1;
> +  _Bool b36: 1;
> +  _Bool b37: 1;
> +  _Bool b38: 1;
> +  _Bool b39: 1;
> +  _Bool b40: 1;
> +  _Bool b41: 1;
> +  _Bool b42: 1;
> +  _Bool b43: 1;
> +  _Bool b44: 1;
> +  _Bool b45: 1;
> +  _Bool b46: 1;
> +  _Bool b47: 1;
> +  _Bool b48: 1;
> +  _Bool b49: 1;
> +  _Bool b50: 1;
> +  _Bool b51: 1;
> +  _Bool b52: 1;
> +  _Bool b53: 1;
> +  _Bool b54: 1;
> +  _Bool b55: 1;
> +  _Bool b56: 1;
> +  _Bool b57: 1;
> +  _Bool b58: 1;
> +  _Bool b59: 1;
> +  _Bool b60: 1;
> +  _Bool b61: 1;
> +  _Bool b62: 1;
> +  _Bool b63: 1;
> +};
> +
> +#define T(N) void fb##N (struct S *s) { s->b##N = !s->b##N; }
> +
> +T(0)
> +T(1)
> +T(2)
> +T(3)
> +T(4)
> +T(5)
> +T(6)
> +T(7)
> +T(8)
> +T(9)
> +T(10)
> +T(11)
> +T(12)
> +T(13)
> +T(14)
> +T(15)
> +T(16)
> +T(17)
> +T(18)
> +T(19)
> +T(20)
> +T(21)
> +T(22)
> +T(23)
> +T(24)
> +T(25)
> +T(26)
> +T(27)
> +T(28)
> +T(29)
> +T(30)
> +T(31)
> +#if __riscv_xlen == 64
> +T(32)
> +T(33)
> +T(34)
> +T(35)
> +T(36)
> +T(37)
> +T(38)
> +T(39)
> +T(40)
> +T(41)
> +T(42)
> +T(43)
> +T(44)
> +T(45)
> +T(46)
> +T(47)
> +T(48)
> +T(49)
> +T(50)
> +T(51)
> +T(52)
> +T(53)
> +T(54)
> +T(55)
> +T(56)
> +T(57)
> +T(58)
> +T(59)
> +T(60)
> +T(61)
> +T(62)
> +T(63)
> +#endif
> +
> +/* { dg-final { scan-assembler-times "lbu\t" 64 { target rv64 } } } */
> +/* { dg-final { scan-assembler-times "lbu\t" 32 { target rv32 } } } */
> +
> +/* { dg-final { scan-assembler-times "xori\t" 64 { target rv64 } } } */
> +/* { dg-final { scan-assembler-times "xori\t" 32 { target rv32 } } } */
> +
> +
> +/* { dg-final { scan-assembler-times "sb\t" 64 { target rv64 } } } */
> +/* { dg-final { scan-assembler-times "sb\t" 32 { target rv32 } } } */

Reply via email to