> On 7 Nov 2025, at 10:42, Kyrylo Tkachov <[email protected]> wrote:
>
>> On 7 Nov 2025, at 10:23, Andre Vieira <[email protected]> wrote:
>>
>> Expands the use of eor3 where we'd otherwise use two vector eor's.
>>
>> Bootstrapped and regression tested on aarch64-none-linux-gnu.
>>
>> OK for trunk?
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-simd.md (*eor3q<mode>4): New insn to be used by
>> combine after reload to optimize any grouping of eor's that are using
>> FP registers for scalar modes.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/aarch64/eor3-opt.c: New test.
>>
>> <eor3.patch>
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 0d5b02a739fa74724d6dc8b658638d55b8db6890..3bf668e25b58a463f1d35387b1c6af7cc04e3a16 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -9201,15 +9201,28 @@
>
> ;; sha3
>
> -(define_insn "eor3q<mode>4"
> -  [(set (match_operand:VDQ_I 0 "register_operand" "=w")
> -        (xor:VDQ_I
> -         (xor:VDQ_I
> -          (match_operand:VDQ_I 2 "register_operand" "w")
> -          (match_operand:VDQ_I 3 "register_operand" "w"))
> -         (match_operand:VDQ_I 1 "register_operand" "w")))]
> +(define_insn_and_split "eor3q<mode>4"
> +  [(set (match_operand:VSDQ_I 0 "register_operand")
> +        (xor:VSDQ_I
> +         (xor:VSDQ_I
> +          (match_operand:VSDQ_I 2 "register_operand")
> +          (match_operand:VSDQ_I 3 "register_operand"))
> +         (match_operand:VSDQ_I 1 "register_operand")))]
>   "TARGET_SHA3"
> -  "eor3\\t%0.16b, %1.16b, %2.16b, %3.16b"
> +  {@ [ cons: =0 , %1 , 2 , 3 ]
> +     [ w        , w  , w , w ] eor3\t%0.16b, %1.16b, %2.16b, %3.16b
> +     [ r        , r  , r , r ] #
> +  }
> +  "&& reload_completed && GP_REGNUM_P (REGNO (operands[0]))"
>
> The “=r,r,r,r” alternative should only be allowed for 64-bit modes?
> Maybe it’s cleaner to split this pattern to allow the
> define_insn_and_split just for VD_I modes?
>
> +  [(const_int 0)]
> +  {
> +    machine_mode xor_mode = <MODE>mode == DImode ? DImode : SImode;
>
> As a consequence of the above, this path can also be executed for modes
> that are neither 32-bit nor 64-bit.
>
> +    emit_move_insn (operands[0],
> +                    gen_rtx_XOR (xor_mode, operands[1], operands[2]));
>
> That doesn’t seem like it would generate valid RTL. I think the XOR needs to
> have the same mode as its operands.
> So operands[1], operands[2] need to be wrapped in a subregion of an
> appropriate mode to keep things consistent.

Subregion -> subreg, of course.

> Thanks,
> Kyrill
>
> +    emit_move_insn (operands[0],
> +                    gen_rtx_XOR (xor_mode, operands[0], operands[3]));
> +    DONE;
> +  }
>   [(set_attr "type" "crypto_sha3")]
> )
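
For illustration only (this is not part of the patch or of eor3-opt.c), here
is a minimal C sketch of what the general-register split is meant to compute:
one three-way XOR rewritten as two chained two-way XORs, each done in the same
64-bit type as its operands. The function names xor3 and xor3_split are
hypothetical, and the sketch models only the 64-bit scalar case.

```c
#include <stdint.h>

/* The three-way XOR that a single EOR3 computes bitwise.  */
uint64_t xor3 (uint64_t a, uint64_t b, uint64_t c)
{
  return a ^ b ^ c;
}

/* Hypothetical model of the split for general registers: two chained
   XORs, each in the same 64-bit type as its operands.  */
uint64_t xor3_split (uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t dst = a ^ b;   /* first XOR: operands 1 and 2 */
  dst = dst ^ c;          /* second XOR: intermediate result and operand 3 */
  return dst;
}
```

Both functions return the same value for every input. In the real splitter
the equivalent of "same type as its operands" is that the mode of the XOR rtx
matches the mode of the (subreg-wrapped) operands.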
