On Fri, Jun 06, 2025 at 10:02:19AM +0100, Richard Sandiford wrote: > Spencer Abson <spencer.ab...@arm.com> writes: > > @@ -27292,10 +27291,16 @@ aarch64_emit_sve_invert_fp_cond (rtx target, > > rtx_code code, rtx pred, > > void > > aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx > > op1) > > { > > - machine_mode pred_mode = GET_MODE (target); > > machine_mode data_mode = GET_MODE (op0); > > + rtx pred = aarch64_sve_fp_pred (data_mode, nullptr); > > > > - rtx ptrue = aarch64_ptrue_reg (pred_mode); > > + /* The governing and destination modes. */ > > + machine_mode pred_mode = GET_MODE (pred); > > + machine_mode target_mode = GET_MODE (target); > > + > > + /* Also determines SVE_KNOWN_PTRUE, since an unnatural GP from > > + sve_fp_pred would disable part of the operation. */ > > + bool natural_p = pred_mode == target_mode; > > I'm not sure about calling this "unnatural". The SVE predicate system > was designed with this kind of flexibility in mind. How about: > > /* For partial vector modes, the choice of predicate mode depends > on whether we need to suppress exceptions for inactive elements. > If we do need to suppress exceptions, the predicate mode matches > the element size rather than the container size and the predicate > marks the upper bits in each container as inactive. The predicate > is then a ptrue wrt target_mode but not wrt pred_mode. It is the > ptrueness wrt pred_mode that matters here. > > If we don't need to suppress exceptions, the predicate mode matches > the container size, pred_mode == target_mode, and the predicate is > thus a ptrue wrt both target_mode and pred_mode. */ > bool known_ptrue_p = pred_mode == target_mode;
OK, I think referring to containers and elements is a good call. Maybe we ought to add a comment above the definition of VPRED, along the lines of: ;; For partial vector modes, this is the predicate mode associated ;; with the container size. Then your suggestion for patch 06 sounds good too. > > There again, maybe my comment makes no sense to anyone other than me, > so please do push back if you have a better suggestion! Only that perhaps the last part of the first section could be: /* The predicate is then a ptrue wrt target_mode but not wrt pred_mode, it is the latter which matters here. */ I'll be adding 'ptrueness' to my dictionary either way! :) > > > switch (code) > > { > > case UNORDERED: > > [...] > > @@ -27333,11 +27338,21 @@ aarch64_expand_sve_vec_cmp_float (rtx target, > > rtx_code code, rtx op0, rtx op1) > > case UNGE: > > if (flag_trapping_math) > > { > > - /* Work out which elements are ordered. */ > > - rtx ordered = gen_reg_rtx (pred_mode); > > op1 = force_reg (data_mode, op1); > > - aarch64_emit_sve_invert_fp_cond (ordered, UNORDERED, > > - ptrue, true, op0, op1); > > + > > + /* Work out which elements are unordered. */ > > + rtx uo_tmp = gen_reg_rtx (target_mode); > > + aarch64_emit_sve_fp_cond (uo_tmp, UNORDERED, pred, natural_p, > > + op0, op1); > > + > > + /* Invert the result. Use PRED again to maintain the intended > > + trapping behavior. */ > > + if (!natural_p) > > + uo_tmp = gen_lowpart (pred_mode, uo_tmp); > > The !natural_p test isn't necessary here, and IMO it's slightly easier > to follow the code without it. The lowpart is a natural component of > the following XOR/NOT operation and is necessary to make the operation > well-typed. > > > + > > + rtx ordered = gen_reg_rtx (pred_mode); > > + emit_insn (gen_aarch64_pred_z (XOR, pred_mode, > > + ordered, pred, pred, uo_tmp)); > > Although the underlying instruction is an EOR, (and (xor a b) a) isn't > canonical rtl: it should be folded to (and (not b) a) instead. > > So I think we should rename: > > ;; Predicated predicate inverse. > (define_insn "*one_cmpl<mode>3" > [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") > (and:PRED_ALL > (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa")) > (match_operand:PRED_ALL 1 "register_operand" "Upa")))] > "TARGET_SVE" > "not\t%0.b, %1/z, %2.b" > ) > > to "@aarch64_pred_one_cmpl<mode>_z" and use gen_aarch64_pred_one_compl_z > here. (Not the most elegant instruction name, but I suppose we should > be consistent...) > > This will need updates to the testcase to match NOT rather than EOR. Thanks for the catch & fix, sounds good. > > OK with those changes, thanks. > > Richard