On Fri, Oct 17, 2025 at 02:15:08PM +0100, Tamar Christina wrote:
> I think this is better fixed in the middle-end. We have similar patterns for
> this
> in match.pd. e.g.
>
> /* Simplify vector inserts of other vector extracts to a permute.  */
> (simplify
>  (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
>  (if (VECTOR_TYPE_P (type)
>       && (VECTOR_MODE_P (TYPE_MODE (type))
>           || optimize_vectors_before_lowering_p ())
>       && operand_equal_p (TYPE_SIZE (TREE_TYPE (@0)),
>                           TYPE_SIZE (TREE_TYPE (@1)), 0)
>       && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
>       && TYPE_VECTOR_SUBPARTS (type).is_constant ()
>       && multiple_p (wi::to_poly_offset (@rpos),
>                      wi::to_poly_offset (TYPE_SIZE (TREE_TYPE (type)))))
>
> I'd start by seeing if this pattern can't be extended.
>
> We would recognize the EXT if the middle-end rewrote this to a
> VEC_PERM_EXPR.
Hi Tamar,
Thanks for your feedback. While I completely agree that fixing this in the
middle-end is much better, I couldn't easily make it work with a match.pd
pattern, since vcombine produces a CONSTRUCTOR of a vector type, which
AFAICT cannot appear on the left-hand side of a simplify expression.
Instead, I think the best place to address this is
tree-ssa-forwprop.cc::simplify_vector_constructor (), which just needs to
be taught to handle subvector arguments to a constructor. I will abandon
this patch and create a new one targeting that function.
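To make the intent concrete, here is a rough sketch of what forwprop sees
for the little-endian int32x4_t case and what I would want
simplify_vector_constructor () to produce (SSA names and exact dump syntax
are illustrative, not taken from an actual dump):

  _1 = BIT_FIELD_REF <x_2(D), 64, 64>;   /* vget_high_s32: upper 64 bits.  */
  _3 = BIT_FIELD_REF <x_2(D), 64, 0>;    /* vget_low_s32: lower 64 bits.  */
  r_4 = {_1, _3};                        /* vcombine_s32: CONSTRUCTOR whose
                                            elements are int32x2_t subvectors.  */

which would ideally become

  r_4 = VEC_PERM_EXPR <x_2(D), x_2(D), { 2, 3, 0, 1 }>;

so that the backend can then match the permute as a single EXT.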
Thanks,
Artemiy
>
> Thanks,
> Tamar
>
> >
> > This patch also includes a new test file to cover this transformation.
> >
> > Bootstrapped and regtested on aarch64-linux-gnu, and additionally
> > regtested on aarch64_be-linux-gnu, no issues.
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md
> > (*aarch64_combine_high_low_internal<mode>): New insn.
> > (*aarch64_combine_high_low_internal_be<mode>): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/simd/combine_ext.c: New test.
> > ---
> > gcc/config/aarch64/aarch64-simd.md | 27 +++++++++++
> > .../gcc.target/aarch64/simd/combine_ext.c | 47 +++++++++++++++++++
> > 2 files changed, 74 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/combine_ext.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index 0d5b02a739f..309c5ad3e3d 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -4423,6 +4423,33 @@
> > }
> > )
> >
> > +;; Combine high half of operand 1 (extracted with vec_select) with
> > +;; low half of operand 2.
> > +
> > +(define_insn "*aarch64_combine_high_low_internal<mode>"
> > +  [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w")
> > +        (vec_concat:<VDBL>
> > +          (vec_select:VDC
> > +            (match_operand:<VDBL> 1 "register_operand" "w")
> > +            (match_operand:<VDBL> 3 "vect_par_cnst_hi_half"))
> > +          (match_operand:VDC 2 "register_operand" "w")))]
> > +  "TARGET_FLOAT && !BYTES_BIG_ENDIAN"
> > +  "ext\\t%0.16b, %1.16b, %2.16b, #8"
> > +  [(set_attr "type" "neon_ext<q>")]
> > +)
> > +
> > +(define_insn "*aarch64_combine_high_low_internal_be<mode>"
> > +  [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w")
> > +        (vec_concat:<VDBL>
> > +          (match_operand:VDC 1 "register_operand" "w")
> > +          (vec_select:VDC
> > +            (match_operand:<VDBL> 2 "register_operand" "w")
> > +            (match_operand:<VDBL> 3 "vect_par_cnst_hi_half"))))]
> > +  "TARGET_FLOAT && BYTES_BIG_ENDIAN"
> > +  "ext\\t%0.16b, %1.16b, %2.16b, #8"
> > +  [(set_attr "type" "neon_ext<q>")]
> > +)
> > +
> > ;; In this insn, operand 1 should be low, and operand 2 the high part of the
> > ;; dest vector.
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/combine_ext.c b/gcc/testsuite/gcc.target/aarch64/simd/combine_ext.c
> > new file mode 100644
> > index 00000000000..27bcf310e19
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/combine_ext.c
> > @@ -0,0 +1,47 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1" } */
> > +
> > +#include <arm_neon.h>
> > +
> > +#ifndef TEST_COMBINE_HIGH_LOW_1
> > +#define TEST_COMBINE_HIGH_LOW_1(TYPE, SUFF)                             \
> > +  TYPE rev_##TYPE##_1 (TYPE x)                                          \
> > +  {                                                                     \
> > +    return vcombine_##SUFF (vget_high_##SUFF (x), vget_low_##SUFF (x)); \
> > +  }
> > +#endif
> > +
> > +#ifndef TEST_COMBINE_HIGH_LOW_2
> > +#define TEST_COMBINE_HIGH_LOW_2(TYPE, SUFF)                             \
> > +  TYPE rev_##TYPE##_2 (TYPE x, TYPE y)                                  \
> > +  {                                                                     \
> > +    return vcombine_##SUFF (vget_high_##SUFF (x), vget_low_##SUFF (y)); \
> > +  }
> > +#endif
> > +
> > +
> > +TEST_COMBINE_HIGH_LOW_1 (int8x16_t, s8)
> > +TEST_COMBINE_HIGH_LOW_1 (int16x8_t, s16)
> > +TEST_COMBINE_HIGH_LOW_1 (int32x4_t, s32)
> > +TEST_COMBINE_HIGH_LOW_1 (int64x2_t, s64)
> > +TEST_COMBINE_HIGH_LOW_1 (uint8x16_t, u8)
> > +TEST_COMBINE_HIGH_LOW_1 (uint16x8_t, u16)
> > +TEST_COMBINE_HIGH_LOW_1 (uint32x4_t, u32)
> > +TEST_COMBINE_HIGH_LOW_1 (uint64x2_t, u64)
> > +TEST_COMBINE_HIGH_LOW_1 (float16x8_t, f16)
> > +TEST_COMBINE_HIGH_LOW_1 (float32x4_t, f32)
> > +
> > +TEST_COMBINE_HIGH_LOW_2 (int8x16_t, s8)
> > +TEST_COMBINE_HIGH_LOW_2 (int16x8_t, s16)
> > +TEST_COMBINE_HIGH_LOW_2 (int32x4_t, s32)
> > +TEST_COMBINE_HIGH_LOW_2 (int64x2_t, s64)
> > +TEST_COMBINE_HIGH_LOW_2 (uint8x16_t, u8)
> > +TEST_COMBINE_HIGH_LOW_2 (uint16x8_t, u16)
> > +TEST_COMBINE_HIGH_LOW_2 (uint32x4_t, u32)
> > +TEST_COMBINE_HIGH_LOW_2 (uint64x2_t, u64)
> > +TEST_COMBINE_HIGH_LOW_2 (float16x8_t, f16)
> > +TEST_COMBINE_HIGH_LOW_2 (float32x4_t, f32)
> > +
> > +/* { dg-final { scan-assembler-times {ext\tv0.16b, v0.16b, v0.16b} 10 } } */
> > +/* { dg-final { scan-assembler-times {ext\tv0.16b, v0.16b, v1.16b} 10 { target aarch64_little_endian } } } */
> > +/* { dg-final { scan-assembler-times {ext\tv0.16b, v1.16b, v0.16b} 10 { target aarch64_big_endian } } } */
> > --
> > 2.43.0
>