> -----Original Message-----
> From: Artemiy Volkov <[email protected]>
> Sent: 30 April 2026 18:10
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; Wilco Dijkstra <[email protected]>;
> [email protected]; Richard Earnshaw
> <[email protected]>; [email protected]; Alice Carlotti
> <[email protected]>; Alex Coplan <[email protected]>
> Subject: Re: [PATCH 1/4] aarch64: introduce partial AdvSIMD vector modes
> 
> On Tue, Apr 28, 2026 at 08:26:07AM +0100, Tamar Christina wrote:
> > Hi Artemiy,
> >
> > > -----Original Message-----
> > > From: Artemiy Volkov <[email protected]>
> > > Sent: 27 April 2026 09:06
> > > To: [email protected]
> > > Cc: Tamar Christina <[email protected]>; Wilco Dijkstra
> > > <[email protected]>; [email protected]; Richard
> > > Earnshaw <[email protected]>; [email protected]; Alice
> > > Carlotti <[email protected]>; Alex Coplan <[email protected]>;
> > > Artemiy Volkov <[email protected]>
> > > Subject: [PATCH 1/4] aarch64: introduce partial AdvSIMD vector modes
> > >
> > > In addition to V2HF that already exists, this patch adds 4 more partial
> > > (16- and 32-bit) AdvSIMD vector modes: V4QI, V2QI, V2HI, and V2BF.  For
> > > now, these are intended only for duplication into full-sized (32-, 64-,
> > > and 128-bit) registers.  As a minimal closure required to bootstrap the
> > > compiler, this also implements the "mov" expand and the "aarch64_simd_mov"
> > > insn_and_split for the new modes (gathered under the VSUB64 iterator).
> > >
> > > These modes are also added to aarch64_classify_vector_mode (), and are
> > > classified as VEC_ADVSIMD | VEC_PARTIAL, a yet-untaken value that seems to
> > > fit the bill.  This is then used in
> >
> > I haven't reviewed the whole thing yet; however, I don't think we want to use
> > VEC_PARTIAL here, as the context in which it's used for SVE is quite
> > different, different enough that I don't think we should mix them.
> >
> > In SVE, VEC_PARTIAL means the vector is also using an unpacked bit
> > representation, whereas your use here is a packed one.  SVE partial modes
> > differentiate between a container and a data type, whereas here the
> > container and data type must be the same, i.e. V2QI must use .b for both
> > data and container.
> >
> > And lastly, there's a mismatch where VNx2SI is considered a partial vector
> > but here V2SI isn't.
> >
> > So instead, how about just using
> >   VECTOR_MODE_P (mode) && known_lt (GET_MODE_BITSIZE (mode), 64)
> > in the helper, as you do in the constraint, and renaming it to something
> > like aarch64_advsimd_sub_dword_mode_p?  I don't think you actually need the
> > flag, and it's best not to share one flag between two separate concepts in
> > SVE and Adv. SIMD.
> 
> Hi Tamar,

Hi Artemiy,

> 
> This sounds fair; I'll create the helper and use it everywhere.  (I think I
> still need to call aarch64_classify_vector_mode () to filter out VLS SVE
> modes, though.)

Yeah that's fine.
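For reference, a minimal standalone model of the predicate discussed above, as a sketch only.  GCC's `machine_mode`, `GET_MODE_BITSIZE` and `aarch64_classify_vector_mode` can't be used outside the compiler, so the mode table, the flag values and the `VLS_SVE_EXAMPLE` entry are hypothetical stand-ins.  It shows the combined check: the classifier is what rules out a fixed-length SVE mode, and the size test then picks out the sub-64-bit Advanced SIMD vectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the vec_flags bits returned by
   aarch64_classify_vector_mode; the real values are not reproduced.  */
enum { VEC_ADVSIMD = 1, VEC_SVE_DATA = 2 };

/* Hypothetical, trimmed-down description of a machine mode.  */
struct mode_info
{
  const char *name;
  unsigned bitsize;    /* Models GET_MODE_BITSIZE (a constant here).  */
  bool vector_p;       /* Models VECTOR_MODE_P.  */
  unsigned vec_flags;  /* Models aarch64_classify_vector_mode.  */
};

static const struct mode_info modes[] = {
  { "SI", 32, false, 0 },
  { "V2QI", 16, true, VEC_ADVSIMD },
  { "V4QI", 32, true, VEC_ADVSIMD },
  { "V2HI", 32, true, VEC_ADVSIMD },
  { "V2HF", 32, true, VEC_ADVSIMD },
  { "V2BF", 32, true, VEC_ADVSIMD },
  { "V8QI", 64, true, VEC_ADVSIMD },
  { "V16QI", 128, true, VEC_ADVSIMD },
  /* Hypothetical fixed-length SVE mode whose size happens to be below
     64 bits: only the classifier check rejects it.  */
  { "VLS_SVE_EXAMPLE", 32, true, VEC_SVE_DATA },
};

/* Look up a mode by name; asserts that the name is in the table.  */
static const struct mode_info *
find_mode (const char *name)
{
  for (size_t i = 0; i < sizeof modes / sizeof modes[0]; i++)
    if (strcmp (modes[i].name, name) == 0)
      return &modes[i];
  assert (!"unknown mode name");
  return NULL;
}

/* Model of the suggested aarch64_advsimd_sub_dword_mode_p: an Advanced
   SIMD vector mode that is narrower than 64 bits.  */
static bool
advsimd_sub_dword_mode_p (const struct mode_info *m)
{
  return (m->vec_flags & VEC_ADVSIMD) != 0
         && m->vector_p
         && m->bitsize < 64;
}
```

With this shape the same predicate could back the constraint, the `mov<mode>` expander check and the allocno-class hook, which is the point of dropping the shared flag.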

> 
> >
> > > aarch64_ira_change_pseudo_allocno_class () to instruct regalloc to prefer
> > > GENERAL_REGS to FP_REGS for the integer modes, i.e. V4QI, V2QI, and V2HI.
> >
> > Why GENERAL_REGS over FP_REGS?
> > It seems more useful to prefer FP_REGS.  I see later on in the patches you want
> > construction using BFM?  But BFM typically has the same latency but lower
> > throughput than INS.
> 
> So what I tried here is to make 32-bit and smaller modes (V2{Q,H}I, V4QI)
> behave like integers, as far as regalloc is concerned.  IOW, I want to
> avoid unnecessary regfile transfers when combining 8-bit and 16-bit
> quantities residing in GPRs.  Without this change to the hook, some tuning
> models would move those into FPRs before doing the combination, and I'm
> not sure we want that, but this is a rare scenario anyway, as it requires an
> exact cost tie between the GP and FP register classes...

Yeah, I think it makes sense to treat them as FPRs and just look at the costing
if it's required.  I have no strong feelings here, but it seems like a more
natural fit.

Do you have an example I can look at for when this happens?

Thanks,
Tamar

> 
> >
> > >
> > > Some existing testcases were adjusted where needed.  (The _Float16
> > > testcase in sve/slp_1.c temporarily expects GPRs to be used for V2HF,
> > > which is corrected to FPRs by the succeeding patch; and the half-float
> > > complex tests now recognize some of the patterns, but check that V2BF
> > > still can't be used for vectorization.)
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-modes.def (VECTOR_MODE): Remove V2HF.
> > >   (VECTOR_MODES): Define V2QI, V4QI, V2HI, V2HF, V2BF.
> > >   * config/aarch64/aarch64-simd.md (*aarch64_simd_mov<mode>): New
> > >   define_insn_and_split pattern.
> > >   (mov<mode>): Add sub-64-bit vector modes to the VALL_F16 expander.
> > >   Forgo const vector expansion for those modes.
> > >   * config/aarch64/aarch64.cc (aarch64_ira_change_pseudo_allocno_class):
> > >   Prefer GPRs for 16- and 32-bit integral vector modes.
> > >   (aarch64_classify_vector_mode): Handle 16- and 32-bit vector modes.
> > >   (aarch64_advsimd_partial_mode_p): New predicate.
> > >   (aarch64_vectorize_vec_perm_const): Refuse for partial vector modes.
> > >   * config/aarch64/constraints.md (Da): New constraint.
> > >   * config/aarch64/iterators.md (VSUB64): New iterator.
> > >   (VALL_F16_SUB64): Likewise.
> > >   (size): Define attribute for sub-64-bit vector modes.
> > >   (VSC): New mode attribute.
> > >   (vstype): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/vect/complex/bb-slp-complex-add-half-float.c: Adjust
> > > testcase.
> > >   * gcc.dg/vect/complex/bb-slp-complex-mla-half-float.c: Likewise.
> > >   * gcc.dg/vect/complex/bb-slp-complex-mul-half-float.c: Likewise.
> > >   * gcc.target/aarch64/sve/slp_1.c: Likewise.
> > > ---
> > >  gcc/config/aarch64/aarch64-modes.def          |  4 +-
> > >  gcc/config/aarch64/aarch64-simd.md            | 64 ++++++++++++-
> > >  gcc/config/aarch64/aarch64.cc                 | 89 ++++++++++++-------
> > >  gcc/config/aarch64/constraints.md             |  5 ++
> > >  gcc/config/aarch64/iterators.md               | 19 +++-
> > >  .../complex/bb-slp-complex-add-half-float.c   |  2 +
> > >  .../complex/bb-slp-complex-mla-half-float.c   |  4 +-
> > >  .../complex/bb-slp-complex-mul-half-float.c   |  6 +-
> > >  gcc/testsuite/gcc.target/aarch64/sve/slp_1.c  | 11 +--
> > >  9 files changed, 157 insertions(+), 47 deletions(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
> > > index d9bff61adec..d5a54689f7a 100644
> > > --- a/gcc/config/aarch64/aarch64-modes.def
> > > +++ b/gcc/config/aarch64/aarch64-modes.def
> > > @@ -79,8 +79,10 @@ VECTOR_MODES (FLOAT, 8);      /*                 V2SF.  */
> > >  VECTOR_MODES (FLOAT, 16);     /*            V4SF V2DF.  */
> > >  VECTOR_MODE (INT, DI, 1);     /*                 V1DI.  */
> > >  VECTOR_MODE (FLOAT, DF, 1);   /*                 V1DF.  */
> > > -VECTOR_MODE (FLOAT, HF, 2);   /*                 V2HF.  */
> > >
> > > +VECTOR_MODES (INT, 2);        /*                 V2QI.  */
> > > +VECTOR_MODES (INT, 4);        /*            V4QI V2HI.  */
> > > +VECTOR_MODES (FLOAT, 4);      /*            V2BF V2HF.  */
> > >
> > >  /* Integer vector modes used to represent intermediate widened values in some
> > >     instructions.  Not intended to be moved to and from registers or memory.  */
> > > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > > index c314e85927d..855b1ba353c 100644
> > > --- a/gcc/config/aarch64/aarch64-simd.md
> > > +++ b/gcc/config/aarch64/aarch64-simd.md
> > > @@ -49,8 +49,8 @@
> > >  (define_subst_attr "vczbe" "add_vec_concat_subst_be" "" "_vec_concatz_be")
> > >
> > >  (define_expand "mov<mode>"
> > > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> > > - (match_operand:VALL_F16 1 "general_operand"))]
> > > +  [(set (match_operand:VALL_F16_SUB64 0 "nonimmediate_operand")
> > > + (match_operand:VALL_F16_SUB64 1 "general_operand"))]
> > >    "TARGET_FLOAT"
> > >    "
> > >    /* Force the operand into a register if it is not an
> > > @@ -77,7 +77,8 @@
> > >     aarch64_expand_vector_init (operands[0], operands[1]);
> > >     DONE;
> > >   }
> > > -      else if (!aarch64_simd_imm_zero (operands[1], <MODE>mode)
> > > +      else if (known_ge (GET_MODE_SIZE (<MODE>mode), 8)
> >
> > Use the helper?
> >
> > > +        && !aarch64_simd_imm_zero (operands[1], <MODE>mode)
> > >          && !aarch64_simd_special_constant_p (operands[1], <MODE>mode)
> > >          && !aarch64_simd_valid_mov_imm (operands[1]))
> > >   {
> > > @@ -241,6 +242,63 @@
> > >    }
> > >  )
> > >
> > > +(define_insn_and_split "*aarch64_simd_mov<mode>"
> > > +  [(set (match_operand:VSUB64 0 "nonimmediate_operand")
> > > + (match_operand:VSUB64 1 "general_operand"))]
> > > +  "TARGET_FLOAT
> > > +   && (register_operand (operands[0], <MODE>mode)
> > > +       || aarch64_simd_reg_or_zero (operands[1], <MODE>mode)
> > > +       || CONST_VECTOR_P (operands[1]))"
> > > +   {@ [cons: =0, 1; attrs: type, arch]
> > > +     [r , Dz ; mov_imm          , *    ] mov\t%w0, 0
> > > +     [r , rZ ; mov_reg          , *    ] mov\t%w0, %w1
> > > +     [r , Da ; mov_imm          , *    ] #
> > > +     [r , w  ; mov_reg          , simd ] #
> > > +     [r , m  ; load_4           , *    ] ldr<size>\t%w0, %1
> > > +     [w , w  ; neon_logic       , simd ] mov\t%0.8b, %1.8b
> > > +     [w , m  ; neon_load1_1reg  , simd ] ldr\t%<vstype>0, %1
> > > +     [w , Dz ; f_mcr            , *    ] fmov\t%<vstype>0, xzr
> > > +     [m , rZ ; store_4          , *    ] str<size>\t%w1, %0
> > > +     [m , w  ; neon_store1_1reg , simd ] str\t%<vstype>1, %0
> > > +  }
> > > +  "&& reload_completed
> > > +   && REG_P (operands[0])"
> > > +  [(const_int 0)]
> > > +  {
> > > +    if (CONST_VECTOR_P (operands[1]))
> > > +      {
> > > +        int elt_bitsize
> > > +          = GET_MODE_BITSIZE (GET_MODE_INNER (GET_MODE (operands[1])));
> > > +        int n_elts = CONST_VECTOR_NUNITS (operands[1]).to_constant ();
> > > +        int val = 0;
> > > +        bool int_vector_p = CONST_INT_P (CONST_VECTOR_ELT (operands[1], 0));
> > > +        unsigned HOST_WIDE_INT eltval;
> > > +        rtx elt;
> > > +        for (int i = 0; i < n_elts; i++)
> > > +          {
> > > +            elt = CONST_VECTOR_ELT (operands[1], BYTES_BIG_ENDIAN
> > > +                                                 ? i
> > > +                                                 : n_elts - 1 - i);
> > > +            if (int_vector_p)
> > > +              eltval = INTVAL (elt);
> > > +            else
> > > +              {
> > > +                bool res = aarch64_reinterpret_float_as_int (elt, &eltval);
> > > +                gcc_assert (res);
> > > +              }
> > > +
> > > +            val = (val << elt_bitsize) + (eltval & ((1 << elt_bitsize) - 1));
> > > +          }
> > > +        emit_move_insn (gen_rtx_REG (SImode, REGNO (operands[0])),
> > > +                        GEN_INT (val));
> > > +      }
> > > +    else if (REG_P (operands[1]))
> > > +      aarch64_simd_emit_reg_reg_move (operands, <VSC>mode, 1);
> > > +    DONE;
> > > +  }
> > > +  [(set_attr "type" "mov_reg")]
> > > +)
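To make the endianness handling in the split above easier to review, here is a standalone sketch of the same packing loop.  It takes element bit patterns directly (standing in for `INTVAL` and `aarch64_reinterpret_float_as_int`), and `pack_const_vector` is a hypothetical name.  One deliberate difference: the accumulator is `uint32_t` rather than `int`, so shifting a value whose top bit ends up set (for example a little-endian V4QI whose element 3 is 0x80 or above) stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the split's loop: pack N_ELTS element
   bit patterns (given in element order) of ELT_BITSIZE bits each into
   the value that a single W-register move would load.  */
static uint32_t
pack_const_vector (const uint32_t *elts, int n_elts, int elt_bitsize,
                   bool big_endian)
{
  assert (elt_bitsize > 0 && elt_bitsize < 32);
  assert (n_elts > 0 && n_elts * elt_bitsize <= 32);

  uint32_t mask = (UINT32_C (1) << elt_bitsize) - 1;
  uint32_t val = 0;
  for (int i = 0; i < n_elts; i++)
    {
      /* Same visiting order as the split: little-endian starts from the
         last element, so element 0 ends up in the low bits; big-endian
         starts from element 0, which ends up in the high bits.  */
      uint32_t elt = elts[big_endian ? i : n_elts - 1 - i];
      /* Unsigned arithmetic keeps the shift well-defined even when the
         top bit of VAL becomes set.  */
      val = (val << elt_bitsize) + (elt & mask);
    }
  return val;
}
```

For example, the V2HF constant { 1.0, 2.0 } (half-precision bit patterns 0x3c00 and 0x4000) packs to 0x40003c00 on little-endian, with element 0 in the low half.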
> > > +
> > >  ;; When storing lane zero we can use the normal STR and its more permissive
> > >  ;; addressing modes.
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index 37c28c8f2f8..257c193fa64 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -1479,40 +1479,6 @@ pr_or_ffr_regnum_p (unsigned int regno)
> > >    return PR_REGNUM_P (regno) || regno == FFR_REGNUM || regno == FFRT_REGNUM;
> > >  }
> > >
> > > -/* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.
> > > -   The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and
> > > -   GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much
> > > -   higher cost.  POINTER_AND_FP_REGS is also used if the cost of both FP_REGS
> > > -   and GENERAL_REGS is lower than the memory cost (in this case the best class
> > > -   is the lowest cost one).  Using POINTER_AND_FP_REGS irrespectively of its
> > > -   cost results in bad allocations with many redundant int<->FP moves which
> > > -   are expensive on various cores.
> > > -   To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but
> > > -   force a decision between FP_REGS and GENERAL_REGS.  We use the allocno class
> > > -   if it isn't POINTER_AND_FP_REGS.  Similarly, use the best class if it isn't
> > > -   POINTER_AND_FP_REGS.  Otherwise set the allocno class depending on the mode.
> > > -   The result of this is that it is no longer inefficient to have a higher
> > > -   memory move cost than the register move cost.
> > > -*/
> > > -
> > > -static reg_class_t
> > > -aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class,
> > > -                                  reg_class_t best_class)
> > > -{
> > > -  machine_mode mode;
> > > -
> > > -  if (!reg_class_subset_p (GENERAL_REGS, allocno_class)
> > > -      || !reg_class_subset_p (FP_REGS, allocno_class))
> > > -    return allocno_class;
> > > -
> > > -  if (!reg_class_subset_p (GENERAL_REGS, best_class)
> > > -      || !reg_class_subset_p (FP_REGS, best_class))
> > > -    return best_class;
> > > -
> > > -  mode = PSEUDO_REGNO_MODE (regno);
> > > -  return FLOAT_MODE_P (mode) || VECTOR_MODE_P (mode) ? FP_REGS : GENERAL_REGS;
> > > -}
> > > -
> > >  static unsigned int
> > >  aarch64_min_divisions_for_recip_mul (machine_mode mode)
> > >  {
> > > @@ -1777,6 +1743,14 @@ aarch64_classify_vector_mode (machine_mode mode, bool any_target_p = false)
> > >      case E_V4x2DFmode:
> > >        return (TARGET_FLOAT || any_target_p) ? VEC_ADVSIMD | VEC_STRUCT : 0;
> > >
> > > +    /* 16-bit Advanced SIMD vectors.  */
> > > +    case E_V2QImode:
> > > +    /* 32-bit Advanced SIMD vectors.  */
> > > +    case E_V2HFmode:
> > > +    case E_V2BFmode:
> > > +    case E_V2HImode:
> > > +    case E_V4QImode:
> > > +      return (TARGET_FLOAT || any_target_p) ? VEC_ADVSIMD | VEC_PARTIAL : 0;
> > >      /* 64-bit Advanced SIMD vectors.  */
> > >      case E_V8QImode:
> > >      case E_V4HImode:
> > > @@ -1855,6 +1829,13 @@ aarch64_advsimd_full_struct_mode_p (machine_mode mode)
> > >    return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
> > >  }
> > >
> > > +/* Return true if MODE is a partial (sub-64-bit) Advanced SIMD mode.  */
> > > +static bool
> > > +aarch64_advsimd_partial_mode_p (machine_mode mode)
> > > +{
> > > +  return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_PARTIAL));
> > > +}
> > > +
> > >  /* Return true if MODE is any of the data vector modes, including
> > >     structure modes.  */
> > >  static bool
> > > @@ -2126,6 +2107,43 @@ aarch64_coalesce_units (machine_mode vec_mode, unsigned int factor)
> > >    return {};
> > >  }
> > >
> > > +/* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.
> > > +   The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and
> > > +   GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much
> > > +   higher cost.  POINTER_AND_FP_REGS is also used if the cost of both FP_REGS
> > > +   and GENERAL_REGS is lower than the memory cost (in this case the best class
> > > +   is the lowest cost one).  Using POINTER_AND_FP_REGS irrespectively of its
> > > +   cost results in bad allocations with many redundant int<->FP moves which
> > > +   are expensive on various cores.
> > > +   To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but
> > > +   force a decision between FP_REGS and GENERAL_REGS.  We use the allocno class
> > > +   if it isn't POINTER_AND_FP_REGS.  Similarly, use the best class if it isn't
> > > +   POINTER_AND_FP_REGS.  Otherwise set the allocno class depending on the mode.
> > > +   The result of this is that it is no longer inefficient to have a higher
> > > +   memory move cost than the register move cost.
> > > +*/
> > > +
> > > +static reg_class_t
> > > +aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class,
> > > +                                  reg_class_t best_class)
> > > +{
> > > +  machine_mode mode;
> > > +
> > > +  if (!reg_class_subset_p (GENERAL_REGS, allocno_class)
> > > +      || !reg_class_subset_p (FP_REGS, allocno_class))
> > > +    return allocno_class;
> > > +
> > > +  if (!reg_class_subset_p (GENERAL_REGS, best_class)
> > > +      || !reg_class_subset_p (FP_REGS, best_class))
> > > +    return best_class;
> > > +
> > > +  mode = PSEUDO_REGNO_MODE (regno);
> > > +  return FLOAT_MODE_P (mode) || (VECTOR_MODE_P (mode)
> > > +                          && (!INTEGRAL_MODE_P (mode)
> > > +                              || !aarch64_advsimd_partial_mode_p (mode)))
> > > +                         ? FP_REGS : GENERAL_REGS;
> > > +}
> > > +
> >
> > The condition seems a bit messy; aren't you effectively adding
> >
> >   if (INTEGRAL_MODE_P (mode) && aarch64_advsimd_partial_mode_p (mode))
> >     return GENERAL_REGS;
> 
> ... so let me know if you think we can keep this hunk, with the fix you're
> suggesting above.
> 
> Thanks for your review so far and looking forward to the rest of it.
> 
> Kind regards,
> Artemiy
> 
> >
> > Presumably so V2HF and V2BF are still FP_REGS.
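A quick standalone check of that reading, as a sketch.  The four booleans model `FLOAT_MODE_P`, `VECTOR_MODE_P`, `INTEGRAL_MODE_P` and the partial-mode helper, and the enum is a stand-in for the two register classes.  It compares the posted condition with the early-return form over every combination a real mode can have, assuming (as holds for GCC's mode classes) that no mode is both float and integral, and that a partial mode is always a vector mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the two classes the hook chooses between.  */
enum rclass { GENERAL_REGS_M, FP_REGS_M };

/* Models FLOAT_MODE_P, VECTOR_MODE_P, INTEGRAL_MODE_P and the
   partial/sub-64-bit helper for one machine mode.  */
struct mode_props
{
  bool float_p, vector_p, integral_p, partial_p;
};

/* The condition as posted in the patch.  */
static enum rclass
pick_as_posted (struct mode_props m)
{
  return m.float_p || (m.vector_p && (!m.integral_p || !m.partial_p))
         ? FP_REGS_M : GENERAL_REGS_M;
}

/* Suggested form: early return for integral partial vectors, then the
   pre-patch expression unchanged.  */
static enum rclass
pick_suggested (struct mode_props m)
{
  if (m.integral_p && m.partial_p)
    return GENERAL_REGS_M;
  return m.float_p || m.vector_p ? FP_REGS_M : GENERAL_REGS_M;
}

/* Compare both forms over all consistent combinations and return how
   many were checked.  */
static int
check_equivalence (void)
{
  int checked = 0;
  for (int bits = 0; bits < 16; bits++)
    {
      struct mode_props m = { (bits & 1) != 0, (bits & 2) != 0,
                              (bits & 4) != 0, (bits & 8) != 0 };
      /* Skip impossible modes: float and integral at once, or a partial
         mode that is not a vector.  */
      if ((m.float_p && m.integral_p) || (m.partial_p && !m.vector_p))
        continue;
      assert (pick_as_posted (m) == pick_suggested (m));
      checked++;
    }
  return checked;
}
```

Under those two assumptions the forms agree on all nine remaining combinations, and a V2HF-like mode (float, vector, partial) still gets FP_REGS.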
> >
> > >  /* Implement TARGET_VECTORIZE_RELATED_MODE.  */
> > >
> > >  static opt_machine_mode
> > > @@ -28202,6 +28220,9 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
> > >  {
> > >    struct expand_vec_perm_d d;
> > >
> > > +  if (aarch64_advsimd_partial_mode_p (op_mode))
> > > +    return false;
> > > +
> > >    /* Check whether the mask can be applied to a single vector.  */
> > >    if (sel.ninputs () == 1
> > >        || (op0 && rtx_equal_p (op0, op1)))
> > > diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
> > > index 3d166fe3a17..77eadc89819 100644
> > > --- a/gcc/config/aarch64/constraints.md
> > > +++ b/gcc/config/aarch64/constraints.md
> > > @@ -524,6 +524,11 @@
> > >   (and (match_code "const_int")
> > >        (match_test "aarch64_simd_scalar_immediate_valid_for_move (op,
> > >                                            QImode)")))
> > > +(define_constraint "Da"
> > > +  "@internal
> > > +  A constraint that matches all sub-64-bit vectors."
> > > +  (and (match_code "const_vector")
> > > +       (match_test "known_lt (GET_MODE_BITSIZE (mode), 64)")))
> > >
> >
> > Use the helper.
> >
> > Thanks,
> > Tamar
> >
> > >  (define_constraint "Dt"
> > >    "@internal
> > > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> > > index 39b1e84edcc..dfca3327f1f 100644
> > > --- a/gcc/config/aarch64/iterators.md
> > > +++ b/gcc/config/aarch64/iterators.md
> > > @@ -227,10 +227,17 @@
> > >  ;; All Advanced SIMD integer modes
> > >  (define_mode_iterator VALLI [VDQ_BHSI V2DI])
> > >
> > > +;; All sub-64-bit vector modes.
> > > +(define_mode_iterator VSUB64 [V2QI V4QI V2HI V2HF V2BF])
> > > +
> > >  ;; All Advanced SIMD modes suitable for moving, loading, and storing.
> > >  (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
> > >                           V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
> > >
> > > +;; All Advanced SIMD modes suitable for moving, loading, and storing,
> > > +;; plus all sub-64-bit vector modes.
> > > +(define_mode_iterator VALL_F16_SUB64 [VALL_F16 VSUB64])
> > > +
> > >  ;; The VALL_F16 modes except the 128-bit 2-element ones.
> > >  (define_mode_iterator VALL_F16_NO_V2Q [V8QI V16QI V4HI V8HI V2SI V4SI
> > >                           V4HF V8HF V2SF V4SF])
> > > @@ -1466,7 +1473,9 @@
> > >  (define_mode_attr s [(HF "h") (SF "s") (DF "d") (SI "s") (DI "d")])
> > >
> > >  ;; Give the length suffix letter for a sign- or zero-extension.
> > > -(define_mode_attr size [(QI "b") (HI "h") (SI "w")])
> > > +(define_mode_attr size [(QI "b") (HI "h") (SI "w") (HF "") (BF "") (SF "")
> > > +                 (V2QI "h") (V4QI "") (V2HI "")
> > > +                 (V2HF "") (V2BF "")])
> > >
> > >  ;; Give the number of bits in the mode
> > >  (define_mode_attr sizen [(QI "8") (HI "16") (SI "32") (DI "64")])
> > > @@ -1883,6 +1892,10 @@
> > >                   (VNx4SI  "v2si") (VNx4SF "v2sf")
> > >                   (VNx2DI  "di") (VNx2DF "df")])
> > >
> > > +;; Sub-64-bit vector mode to equivalent scalar mode.
> > > +(define_mode_attr VSC [(V4QI "SI") (V2QI "HI")
> > > +                (V2HI "SI") (V2HF "SF") (V2BF "SF")])
> > > +
> > >  (define_mode_attr vnx [(V4SI "vnx4si") (V2DI "vnx2di")])
> > >
> > >  ;; 64-bit container modes the inner or scalar source mode.
> > > @@ -2169,6 +2182,10 @@
> > >                           (V2SI "q") (V2SF "q")
> > >                           (DI   "q") (DF   "q")])
> > >
> > > +;; Scalar size of a sub-64-bit vector mode.
> > > +(define_mode_attr vstype [(V4QI "s") (V2QI "h")
> > > +                   (V2HI "s") (V2BF "s") (V2HF "s")])
> > > +
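A small sketch of how the new `size` and `vstype` attributes feed the GPR and FP load alternatives of `*aarch64_simd_mov<mode>`.  The attribute values are copied from the hunks above; `find_attrs`, `expand_loads` and the fixed `[x1]` address are hypothetical, and the output strings assume the `%w` and `%h`/`%s` operand modifiers print `wN`, `hN` and `sN` register names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Per-mode values of the <size> and <vstype> attributes for VSUB64.  */
struct sub64_attrs
{
  const char *mode;
  const char *size;    /* Suffix for the GPR load: "ldr<size>".  */
  const char *vstype;  /* FP register prefix: "%<vstype>0".  */
};

static const struct sub64_attrs attrs[] = {
  { "V2QI", "h", "h" },
  { "V4QI", "", "s" },
  { "V2HI", "", "s" },
  { "V2HF", "", "s" },
  { "V2BF", "", "s" },
};

/* Return the attribute row for MODE; asserts that it exists.  */
static const struct sub64_attrs *
find_attrs (const char *mode)
{
  for (size_t i = 0; i < sizeof attrs / sizeof attrs[0]; i++)
    if (strcmp (attrs[i].mode, mode) == 0)
      return &attrs[i];
  assert (!"not a VSUB64 mode");
  return NULL;
}

/* Expand "ldr<size>\t%w0, %1" into GPR_OUT and "ldr\t%<vstype>0, %1"
   into FPR_OUT.  The two strings belong to different alternatives
   (r vs w); one REGNO is used for both purely for brevity.  */
static void
expand_loads (const struct sub64_attrs *a, int regno, const char *addr,
              char *gpr_out, char *fpr_out, size_t n)
{
  snprintf (gpr_out, n, "ldr%s\tw%d, %s", a->size, regno, addr);
  snprintf (fpr_out, n, "ldr\t%s%d, %s", a->vstype, regno, addr);
}
```

So a V2QI goes into a GPR via LDRH (a zero-extending 16-bit load) and into an FPR via `ldr hN`, while the 32-bit modes use a plain LDR with a W or S register.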
> > >  ;; Define corresponding core/FP element mode for each vector mode.
> > >  (define_mode_attr vw [(V8QI "w") (V16QI "w")
> > >                 (V4HI "w") (V8HI "w")
> > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-half-float.c
> > > index 3f1cce56955..6234f8646fe 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-half-float.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-half-float.c
> > > @@ -12,3 +12,5 @@
> > >
> > >  /* { dg-final { scan-tree-dump "add new stmt: \[^\n\r]*COMPLEX_ADD_ROT270" "slp1" { xfail *-*-* } } } */
> > >  /* { dg-final { scan-tree-dump "add new stmt: \[^\n\r]*COMPLEX_ADD_ROT90" "slp1" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-tree-dump "Found COMPLEX_ADD_ROT90" "slp1" } } */
> > > +/* { dg-final { scan-tree-dump "Found COMPLEX_ADD_ROT270" "slp1" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-mla-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-mla-half-float.c
> > > index 33e500f3f4c..831f84bc1c8 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-mla-half-float.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-mla-half-float.c
> > > @@ -9,4 +9,6 @@
> > >  #include "complex-mla-template.c"
> > >
> > >  /* { dg-final { scan-tree-dump "Found COMPLEX_FMA_CONJ" "slp1" { xfail *-*-* } } } */
> > > -/* { dg-final { scan-tree-dump "Found COMPLEX_FMA" "slp1"  { xfail *-*-* } } } */
> > > +
> > > +/* { dg-final { scan-tree-dump-times "add new stmt:\[^\n\r]*COMPLEX_FMA" 1 "slp1" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-tree-dump "Found COMPLEX_FMA" "slp1" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-mul-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-mul-half-float.c
> > > index 259dd6b2e06..f74274ad034 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-mul-half-float.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-mul-half-float.c
> > > @@ -8,5 +8,7 @@
> > >  #define N 16
> > >  #include "complex-mul-template.c"
> > >
> > > -/* { dg-final { scan-tree-dump "Found COMPLEX_MUL_CONJ" "slp1"  { xfail *-*-* } } } */
> > > -/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "slp1"  { xfail *-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "add new stmt:\[^\n\r]*COMPLEX_MUL_CONJ" 1 "slp1" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL_CONJ" "slp1" } } */
> > > +/* { dg-final { scan-tree-dump-times "add new stmt:\[^\n\r]*COMPLEX_MUL" 1 "slp1" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "slp1" } } */
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_1.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_1.c
> > > index 07d71a63414..98e8ac3c785 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/sve/slp_1.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_1.c
> > > @@ -30,12 +30,14 @@ vec_slp_##TYPE (TYPE *restrict a, TYPE b, TYPE c, int n)        \
> > >  TEST_ALL (VEC_PERM)
> > >
> > >  /* We should use one DUP for each of the 8-, 16- and 32-bit types,
> > > -   although we currently use LD1RW for _Float16.  We should use two
> > > +   (for now, insert both elements with umov + ins for _Float16).  We should use two
> > >     DUPs for each of the three 64-bit types.  */
> > >  /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, [hw]} 2 } } */
> > > -/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, [sw]} 2 } } */
> > > -/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, [sw]} 3 } } */
> > >  /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, [dx]} 9 } } */
> > > +/* { dg-final { scan-assembler-times {\tumov\tw[0-9]+, v[0-9]+\.h} 2 } } */
> > > +/* { dg-final { scan-assembler-times {\tins\tv[0-9]+\.h\[0\], w[0-9]+} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tins\tv[0-9]+\.h\[1\], w[0-9]+} 1 } } */
> > >  /* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
> > >  /* { dg-final { scan-assembler-not {\tzip2\t} } } */
> > >
> > > @@ -53,7 +55,6 @@ TEST_ALL (VEC_PERM)
> > >  /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
> > >  /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
> > >  /* { dg-final { scan-assembler-not {\tldr} } } */
> > > -/* { dg-final { scan-assembler-times {\tstr} 2 } } */
> > > -/* { dg-final { scan-assembler-times {\tstr\th[0-9]+} 2 } } */
> > > +/* { dg-final { scan-assembler-not {\tstr} } } */
> > >
> > >  /* { dg-final { scan-assembler-not {\tuqdec} } } */
> > > --
> > > 2.43.0
> >
