Hello,
The expression (A ^ B) & C ^ B, i.e. ((A ^ B) & C) ^ B, is the canonical
GIMPLE form because it minimizes instruction count on generic targets.
However, on targets that support ANDN (like PowerPC andc), the equivalent
(A & C) | (B & ~C) form is preferable because it shortens the register
dependency chain.
Currently, GCC generates the XOR form, in which each instruction depends
on the result of the previous one, forming a serial chain:
xor 4,3,4
and 4,4,5
xor 3,4,3
With this patch, using IFN_BIT_ANDN, we generate the IOR form instead.
The two bitwise operations can then execute independently, leaving only
the final OR dependent on both results:
andc 3,3,5
and 2,4,5
or 3,2,3
This patch fixes PR90323 and PR122431. Tested on powerpc64le-linux-gnu
with no regressions.
Thanks,
Kishan
2025-12-10 Kishan Parmar <[email protected]>
gcc/ChangeLog:
	PR tree-optimization/122431
	PR target/90323
	* config/rs6000/rs6000.md (andn<mode>3): New define_expand for
	scalar modes.
	* match.pd: Add a late simplification converting ((A ^ B) & C) ^ B
	to (A & C) | (B & ~C) when the target supports the ANDN optab.
---
gcc/config/rs6000/rs6000.md | 8 ++++++++
gcc/match.pd | 10 ++++++++++
2 files changed, 18 insertions(+)
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ff085bf9bb1..cea4e765630 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3894,6 +3894,14 @@
(set_attr "dot" "yes")
(set_attr "length" "8,12")])
+;; Standard andn to enable IFN_BIT_ANDN support.
+(define_expand "andn<mode>3"
+ [(set (match_operand:GPR 0 "gpc_reg_operand")
+ (and:GPR (not:GPR (match_operand:GPR 2 "gpc_reg_operand"))
+ (match_operand:GPR 1 "gpc_reg_operand")))]
+ ""
+ "")
+
(define_insn_and_split "*branch_anddi3_dot"
[(set (pc)
(if_then_else (eq (and:DI (match_operand:DI 1 "gpc_reg_operand" "%r,r")
diff --git a/gcc/match.pd b/gcc/match.pd
index bf410a75f5f..300a6a32154 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -12074,6 +12074,16 @@ and,
(IFN_AVG_CEIL @0 @2)))
#endif
+#if GIMPLE
+/* Simplify (A ^ B) & C ^ B -> (A & C) | (B & ~C) if target supports ANDN. */
+
+(simplify
+ (bit_xor:c (bit_and:c (bit_xor:c @0 @1) @2) @1)
+ (if (fold_before_rtl_expansion_p () && TREE_CODE (@2) != INTEGER_CST
+ && direct_internal_fn_supported_p (IFN_BIT_ANDN, type, OPTIMIZE_FOR_BOTH))
+ (bit_ior (bit_and @0 @2) (IFN_BIT_ANDN @1 @2))))
+#endif
+
/* vec shift left insert (dup (A), A) -> dup(A) */
(simplify
(IFN_VEC_SHL_INSERT (vec_duplicate@1 @0) @0)
--
2.47.3