Re: [PATCH 11/14] aarch64: Add support for unpacked SVE FP conditional binary arithmetic

Spencer Abson Mon, 09 Jun 2025 03:06:27 -0700

On Fri, Jun 06, 2025 at 03:52:12PM +0100, Richard Sandiford wrote:
> Spencer Abson <spencer.ab...@arm.com> writes:
> > @@ -8165,20 +8169,25 @@
> >  ;;
> >  ;; For unpacked vectors, it doesn't really matter whether SEL uses the
> >  ;; the container size or the element size.  If SEL used the container size,
> > -;; it would ignore undefined bits of the predicate but would copy the
> > -;; upper (undefined) bits of each container along with the defined bits.
> > -;; If SEL used the element size, it would use undefined bits of the 
> > predicate
> > -;; to select between undefined elements in each input vector.  Thus the 
> > only
> > -;; difference is whether the undefined bits in a container always come from
> > -;; the same input as the defined bits, or whether the choice can vary
> > +;; it would ignore bits of the predicate that can be undefined, but would 
> > copy
> > +;; the upper (undefined) bits of each container along with the defined 
> > bits.
> > +;; If SEL used the element size, it would use bits from the predicate that 
> > can
> > +;; be undefined to select between undefined elements in each input vector.
> > +;; Thus the only difference is whether the undefined bits in a container 
> > always
> > +;; come from the same input as the defined bits, or whether the choice can 
> > vary
> 
> It looks like the main change here is to replace "undefined bits of the
> predicate" with "bits of the predicate that can be undefined".  Could
> you go into more detail about the distinction?  It seems to be saying
> that the upper bits in each predicate are sometimes defined and
> sometimes not.
> 
> As I see it, the "level of undefinedness" is really the same for the
> predicates and data vectors.  Within each container element, the bits
> that hold the element value are defined/significant and the other bits
> are undefined/insignificant/have arbitrary values.  The same thing
> goes for the upper bits in each predicate element: the low bit is
> defined/significant and the other bits are undefined/insignificant/have
> arbitrary values.  They might by chance be zeros when zeros are
> convenient or ones when ones are convenient, but the semantics don't
> guarantee anything either way.


Yes, I agree.  Sorry, my change was not very clear.

What I was trying to reflect is that, for example, selecting between
a pair of VNx4HF using VNx8BI is now a recognised insn.  However, any
bits of a VNx8BI that are not significant to a VNx4BI are don't-care
wrt the result.

I meant that they 'can be undefined' in that they are as good as
undefined, for the purpose of SEL.  Maybe a better change would
have been to reword this paragraph in favour of 'don't-care' rather
than 'undefined' when describing the upper bits of each predicate
element?

> 
> >  ;; independently of the defined bits.
> >  ;;
> >  ;; For the other instructions, using the element size is more natural,
> >  ;; so we do that for SEL as well.
> > +;;
> > +;; The use of 'aarch64_predicate_operand' here is only to support the FP 
> > arithmetic/
> > +;; UNSPEC_SEL combiner patterns.  As with those operations, any predicate 
> > bits that
> > +;; are insignificant to the data mode will have no effect on the 
> > operation's result.
> 
> Sorry for the formatting nit, but: long lines.  The comment itself looks good
> though.
> 
> > +;;
> >  (define_insn "*vcond_mask_<mode><vpred>"
> >    [(set (match_operand:SVE_ALL 0 "register_operand")
> >     (unspec:SVE_ALL
> > -     [(match_operand:<VPRED> 3 "register_operand")
> > +     [(match_operand:<VPRED> 3 "aarch64_predicate_operand")
> >        (match_operand:SVE_ALL 1 "aarch64_sve_reg_or_dup_imm")
> >        (match_operand:SVE_ALL 2 "aarch64_simd_reg_or_zero")]
> >       UNSPEC_SEL))]
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 287de0f5ae4..d38b108c5f4 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -3893,6 +3893,33 @@ aarch64_sve_fp_pred (machine_mode data_mode, rtx 
> > *strictness)
> >     return aarch64_ptrue_reg (aarch64_sve_pred_mode (data_mode));
> >  }
> >  
> > +/* If DATA_MODE is a partial vector mode, emit a sequence of insns to
> > +   zero-out the predicate bits of an existing natural GP, PRED, associated
> > +   with the undefined elements in each container
> 
> This makes it sound unconditional, whereas it's really conditional on
> flag_trapping_math.
> 
> Also, I'd see this more as converting a container predicate to an
> element predicate.  So maybe:
> 
> /* PRED is a predicate that governs an operation on DATA_MODE.  If DATA_MODE
>    is a partial vector mode, and if exceptions must be suppressed for its
>    undefined elements, convert PRED from a container-level predicate to
>    an element-level predicate and ensure that the undefined elements
>    are inactive.
> 
> But again, please suggest something else if you don't think that's
> very clear.

Yeah, happy to stick to the container/element size language.  Thanks.

> 
> > +
> > +   Return NULL_RTX if no insns were emitted.  That is, if DATA_MODE is not
> > +   a partial vector mode, or if we don't need to prevent the operation from
> > +   interpreting undefined elements.  Otherwise, return the new predicate.  
> > */
> > +rtx
> > +aarch64_sve_emit_masked_fp_pred (machine_mode data_mode, rtx pred)
> > +{
> > +  unsigned int vec_flags = aarch64_classify_vector_mode (data_mode);
> > +  if (flag_trapping_math && (vec_flags & VEC_PARTIAL))
> > +    {
> > +      /* Control the data as if the vector was fully packed.  */
> > +      rtx mask = aarch64_sve_packed_pred (data_mode);
> > +      machine_mode pmode = GET_MODE (mask);
> > +
> > +      /* Mask the existing predicate.  */
> > +      rtx dst = gen_reg_rtx (pmode);
> > +      emit_insn (gen_and3 (pmode, dst, mask,
> > +                      gen_lowpart (pmode, pred)));
> > +      return dst;
> > +    }
> > +
> > +  return NULL_RTX;
> > +}
> 
> How about instead returning PRED in the fallthrough case, rather than
> null?  It seems like that would be more convenient for the callers
> in this patch and in patch 14, and would also be more consistent
> with aarch64_sve_fp_pred.

Sure, that makes sense.  I likely wanted to return something that
easily indicates whether the function had side-effects or not.

I'll give it some more thought, but happy with your suggestion too.

> 
> As with the earlier patches, it would be good to have a token test
> that SELs aren't dropped when they shouldn't be.
> 
> BTW, since the series started with a lot of patches that were OK as-is,
> or with minor changes, would you be ok with committing those and only
> reposting the part of the series that might need a v2?

Yeah, I'll leave some time for discussion and commit the ones with
minor changes if that's OK.

Thanks,
Spencer
> 
> Thanks,
> Richard

Re: [PATCH 11/14] aarch64: Add support for unpacked SVE FP conditional binary arithmetic

Reply via email to