> -----Original Message----- > From: Richard Sandiford <richard.sandif...@arm.com> > Sent: Monday, November 16, 2020 11:56 AM > To: Tamar Christina <tamar.christ...@arm.com> > Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw > <richard.earns...@arm.com>; nd <n...@arm.com>; Marcus Shawcroft > <marcus.shawcr...@arm.com> > Subject: Re: [PATCH v2 10/16]AArch64: Add NEON RTL patterns for Complex > Addition, Multiply and FMA. > > > +;; A conjucate is a rotation of 180* around the argand plane, or * I. > > Hmm, but a complex conjugate is a reflection around the real axis rather than > a rotation. Also, 180 degrees around the Argand plane is * -1 rather than * > I. > So…
Yes indeed, Sorry this is a comment from waaaaay in the beginning that I forgot to update.. The operation itself expects an actual reflection :) > > > +(define_int_attr rot_op [(UNSPEC_FCMLS "") > > + (UNSPEC_FCMLS180 "_conj") > > + (UNSPEC_FCMLA "") > > + (UNSPEC_FCMLA180 "_conj") > > + (UNSPEC_FCMUL "") > > + (UNSPEC_FCMUL180 "_conj") > > + (UNSPEC_CMLS "") > > + (UNSPEC_CMLA "") > > + (UNSPEC_CMLA180 "_conj") > > + (UNSPEC_CMUL "") > > + (UNSPEC_CMUL180 "_conj")]) > > + > > +(define_int_attr rotsplit1 [(UNSPEC_FCMLA "0") > > + (UNSPEC_FCMLA180 "0") > > + (UNSPEC_FCMUL "0") > > + (UNSPEC_FCMUL180 "0") > > + (UNSPEC_FCMLS "270") > > + (UNSPEC_FCMLS180 "90")]) > > + > > +(define_int_attr rotsplit2 [(UNSPEC_FCMLA "90") > > + (UNSPEC_FCMLA180 "270") > > + (UNSPEC_FCMUL "90") > > + (UNSPEC_FCMUL180 "270") > > + (UNSPEC_FCMLS "180") > > + (UNSPEC_FCMLS180 "180")]) > > I think it would be worth clarifying what the patterns do. AIUI, the effect > of > the MUL180 and MLA180 “rot_op” cases as written in the patch is to multiply > the conjugate of the first operand by the second operand, whereas the > documentation in the earlier patches implied that they should multiply the > conjugate of the second operand by the first. > But maybe I've got this wrong. > > operands to the pattern: a + bi, c + di > > conjugate of first operand * second operand [A]: > > (a - bi) * (c + di) = (ac + bd) + (ad - bc)i > > first operand * conjugate of second operand [B]: > > (a + bi) * (c - di) = (ac + bd) + (bc - ad)i > > the FCMUL180 sequence chosen by the patch: > > 0: a * (c + di) = ac + adi > 270: b * (d - ci) = bd - bci > > which gives A rather than B. Correct, it expects the conjucate In the second operand and corrects the argument order to account for it during vectorization if it's on the first operand. i.e. conjucate first returns mov v0.16b, v3.16b ldr q1, [x0, x3] ldr q2, [x1, x3] fcmla v0.4s, v1.4s, v2.4s, #0 fcmla v0.4s, v1.4s, v2.4s, #270 str q0, [x2, x3] and conjucate second returns mov v0.16b, v3.16b ldr q1, [x1, x3] ldr q2, [x0, x3] fcmla v0.4s, v1.4s, v2.4s, #0 fcmla v0.4s, v1.4s, v2.4s, #270 str q0, [x2, x3] I think at some point the documentation in got out of sync with the implementation because it turned out to be easier to conj first and flip conj second. I'll update the optab docs. > > > +;; The complex mla/mls operations always need to expand to two > instructions. > > +;; The first operation does half the computation and the second does > > +the ;; remainder. Because of this, expand early. > > +(define_expand "cml<fcmac1><rot_op><mode>4" > > + [(set (match_operand:VHSDF 0 "register_operand") > > + (plus:VHSDF (match_operand:VHSDF 1 "register_operand") > > + (unspec:VHSDF [(match_operand:VHSDF 2 > "register_operand") > > + (match_operand:VHSDF 3 > "register_operand")] > > + FCMLA_OP)))] > > + "TARGET_COMPLEX" > > +{ > > + emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (operands[0], > operands[1], > > + operands[2], operands[3])); > > + emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], > operands[0], > > + operands[2], operands[3])); > > + DONE; > > +}) > > The interface doesn't guarantee that operands[0] is distinct from the inputs, > so I think the result of the first instruction should be a temporary register. Hmm sorry I'm not sure I follow.. operands[0] doesn't have to be distinct from the inputs, and in most cases it wouldn't be. In c = c + (a * b) operands[0] should be the same as operands[1]. I'm just trying to figure out why It needs to be different :) since it's an accumulation instruction.. Thanks, Tamar > > Thanks, > Richard