On Fri, Jun 06, 2025 at 10:02:19AM +0100, Richard Sandiford wrote:
> Spencer Abson <spencer.ab...@arm.com> writes:
> > @@ -27292,10 +27291,16 @@ aarch64_emit_sve_invert_fp_cond (rtx target, rtx_code code, rtx pred,
> >  void
> >  aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1)
> >  {
> > -  machine_mode pred_mode = GET_MODE (target);
> >    machine_mode data_mode = GET_MODE (op0);
> > +  rtx pred = aarch64_sve_fp_pred (data_mode, nullptr);
> >  
> > -  rtx ptrue = aarch64_ptrue_reg (pred_mode);
> > +  /* The governing and destination modes.  */
> > +  machine_mode pred_mode = GET_MODE (pred);
> > +  machine_mode target_mode = GET_MODE (target);
> > +
> > +  /* Also determines SVE_KNOWN_PTRUE, since an unnatural GP from
> > +     sve_fp_pred would disable part of the operation.   */
> > +  bool natural_p = pred_mode == target_mode;
> 
> I'm not sure about calling this "unnatural".  The SVE predicate system
> was designed with this kind of flexibility in mind.  How about:
> 
>   /* For partial vector modes, the choice of predicate mode depends
>      on whether we need to suppress exceptions for inactive elements.
>      If we do need to suppress exceptions, the predicate mode matches
>      the element size rather than the container size and the predicate
>      marks the upper bits in each container as inactive.  The predicate
>      is then a ptrue wrt target_mode but not wrt pred_mode.  It is the
>      ptrueness wrt pred_mode that matters here.
> 
>      If we don't need to suppress exceptions, the predicate mode matches
>      the container size, pred_mode == target_mode, and the predicate is
>      thus a ptrue wrt both target_mode and pred_mode.  */
>   bool known_ptrue_p = pred_mode == target_mode;

OK, I think referring to containers and elements is a good call.  Maybe
we ought to add a comment above the definition of VPRED, along the
lines of:

;; For partial vector modes, this is the predicate mode associated
;; with the container size.
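
For context, I was picturing that sitting directly above the VPRED
mapping in iterators.md, along these lines (just a few representative
entries from memory; VNx4HF maps to VNx4BI because its 16-bit elements
sit in 32-bit containers):

;; For partial vector modes, this is the predicate mode associated
;; with the container size.
(define_mode_attr VPRED [(VNx16QI "VNx16BI")
                         (VNx8HF "VNx8BI")
                         (VNx4HF "VNx4BI")
                         ...])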

Then your suggestion for patch 06 sounds good too.

> 
> There again, maybe my comment makes no sense to anyone other than me,
> so please do push back if you have a better suggestion!

Only that perhaps the last part of the first section could be:

/* The predicate is then a ptrue wrt target_mode but not wrt pred_mode;
   it is the latter that matters here.  */

I'll be adding 'ptrueness' to my dictionary either way! :)
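
To double-check my own understanding, here's the picture I have in mind
for VNx4HF at a 128-bit VL, as a standalone illustration (my own sketch,
not part of the patch; one predicate bit per byte):

#include <stdio.h>

int
main (void)
{
  /* 16 predicate bits, one per byte of a 128-bit vector.  */
  unsigned ptrue_s = 0, ptrue_h = 0;

  /* ptrue wrt VNx4BI: the lowest bit of each 32-bit container.  */
  for (int byte = 0; byte < 16; byte += 4)
    ptrue_s |= 1u << byte;

  /* ptrue wrt VNx8BI: the lowest bit of each 16-bit element.  */
  for (int byte = 0; byte < 16; byte += 2)
    ptrue_h |= 1u << byte;

  /* The exception-suppressing predicate for VNx4HF activates only the
     low .h element of each .s container, i.e. bits 0,4,8,12: the same
     pattern as ptrue_s (0x1111), not ptrue_h (0x5555).  So it is a
     ptrue wrt target_mode (VNx4BI) but not wrt pred_mode (VNx8BI).  */
  printf ("ptrue.s = %#x, ptrue.h = %#x\n", ptrue_s, ptrue_h);
  return 0;
}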

> 
> >    switch (code)
> >      {
> >      case UNORDERED:
> > [...]
> > @@ -27333,11 +27338,21 @@ aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1)
> >      case UNGE:
> >        if (flag_trapping_math)
> >     {
> > -     /* Work out which elements are ordered.  */
> > -     rtx ordered = gen_reg_rtx (pred_mode);
> >       op1 = force_reg (data_mode, op1);
> > -     aarch64_emit_sve_invert_fp_cond (ordered, UNORDERED,
> > -                                      ptrue, true, op0, op1);
> > +
> > +     /* Work out which elements are unordered.  */
> > +     rtx uo_tmp = gen_reg_rtx (target_mode);
> > +     aarch64_emit_sve_fp_cond (uo_tmp, UNORDERED, pred, natural_p,
> > +                               op0, op1);
> > +
> > +     /* Invert the result.  Use PRED again to maintain the intended
> > +        trapping behavior.  */
> > +     if (!natural_p)
> > +       uo_tmp = gen_lowpart (pred_mode, uo_tmp);
> 
> The !natural_p test isn't necessary here, and IMO it's slightly easier
> to follow the code without it.  The lowpart is a natural component of
> the following XOR/NOT operation and is necessary to make the operation
> well-typed.
> 
> > +
> > +     rtx ordered = gen_reg_rtx (pred_mode);
> > +     emit_insn (gen_aarch64_pred_z (XOR, pred_mode,
> > +                                    ordered, pred, pred, uo_tmp));
> 
> Although the underlying instruction is an EOR, (and (xor a b) a) isn't
> canonical rtl: it should be folded to (and (not b) a) instead.
> 
> So I think we should rename:
> 
> ;; Predicated predicate inverse.
> (define_insn "*one_cmpl<mode>3"
>   [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
>       (and:PRED_ALL
>         (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
>         (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
>   "TARGET_SVE"
>   "not\t%0.b, %1/z, %2.b"
> )
> 
> to "@aarch64_pred_one_cmpl<mode>_z" and use gen_aarch64_pred_one_compl_z
> here.  (Not the most elegant instruction name, but I suppose we should
> be consistent...)
> 
> This will need updates to the testcase to match NOT rather than EOR.

Thanks for the catch & fix, sounds good.
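
For the record, the fold checks out bitwise: when the governing bit a
is set, a & (a ^ b) and a & ~b are both ~b, and when it is clear both
are 0, so the active lanes are unchanged.  With the unconditional
lowpart and the renamed pattern (assuming I've followed the '@'
convention correctly), the tail of this case would then read something
like:

          /* Invert the result.  Use PRED again to maintain the intended
             trapping behavior.  */
          uo_tmp = gen_lowpart (pred_mode, uo_tmp);

          rtx ordered = gen_reg_rtx (pred_mode);
          emit_insn (gen_aarch64_pred_one_cmpl_z (pred_mode, ordered,
                                                  pred, uo_tmp));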

> 
> OK with those changes, thanks.
> 
> Richard
