On Fri, Oct 17, 2025 at 02:15:08PM +0100, Tamar Christina wrote:
> I think this is better fixed in the middle-end. We have similar patterns for
> this
> in match.pd. e.g.
>
> /* Simplify vector inserts of other vector extracts to a permute.  */
> (simplify
>  (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
>  (if (VECTOR_TYPE_P (type)
>       && (VECTOR_MODE_P (TYPE_MODE (type))
>           || optimize_vectors_before_lowering_p ())
>       && operand_equal_p (TYPE_SIZE (TREE_TYPE (@0)),
>                           TYPE_SIZE (TREE_TYPE (@1)), 0)
>       && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
>       && TYPE_VECTOR_SUBPARTS (type).is_constant ()
>       && multiple_p (wi::to_poly_offset (@rpos),
>                      wi::to_poly_offset (TYPE_SIZE (TREE_TYPE (type)))))
>
> I'd start by seeing if this pattern can't be extended.
>
> We would recognize the EXT if the middle-end rewrote this to a
> VEC_PERM_EXPR.
Hi Tamar,
Thanks for your feedback. While I completely agree that fixing this in the
middle-end is much better, I couldn't easily make it work with a match.pd
pattern, since vcombine produces a CONSTRUCTOR of a vector type, which
AFAICT cannot appear on the left-hand side of a simplify expression.
Instead, I think the best place to address this is
tree-ssa-forwprop.cc::simplify_vector_constructor (), which just needs to
be taught to handle subvector arguments to a constructor. I will abandon
this patch and create a new one targeting that function.
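To make the intent concrete, here is a rough sketch of what forwprop sees
for the little-endian int32x4_t case and what I would want
simplify_vector_constructor () to produce (SSA names and exact dump syntax
are illustrative, not taken from an actual dump):

  _1 = BIT_FIELD_REF <x_2(D), 64, 64>;   /* vget_high_s32: upper 64 bits.  */
  _3 = BIT_FIELD_REF <x_2(D), 64, 0>;    /* vget_low_s32: lower 64 bits.  */
  r_4 = {_1, _3};                        /* vcombine_s32: CONSTRUCTOR whose
                                            elements are int32x2_t subvectors.  */

which would ideally become

  r_4 = VEC_PERM_EXPR <x_2(D), x_2(D), { 2, 3, 0, 1 }>;

so that the backend can then match the permute as a single EXT.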
Thanks,
Artemiy
>
> Thanks,
> Tamar
>
> >
> > This patch also includes a new test file to cover this transformation.
> >
> > Bootstrapped and regtested on aarch64-linux-gnu, and additionally
> > regtested on aarch64_be-linux-gnu, no issues.
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md
> > (*aarch64_combine_high_low_internal<mode>): New insn.
> > (*aarch64_combine_high_low_internal_be<mode>): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/simd/combine_ext.c: New test.
> > ---
> > gcc/config/aarch64/aarch64-simd.md | 27 +++++++++++
> > .../gcc.target/aarch64/simd/combine_ext.c | 47 +++++++++++++++++++
> > 2 files changed, 74 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/combine_ext.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index 0d5b02a739f..309c5ad3e3d 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -4423,6 +4423,33 @@
> > }
> > )
> >
> > +;; Combine high half of operand 1 (extracted with vec_select) with
> > +;; low half of operand 2.
> > +
> > +(define_insn "*aarch64_combine_high_low_internal<mode>"
> > +  [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w")
> > +        (vec_concat:<VDBL>
> > +          (vec_select:VDC
> > +            (match_operand:<VDBL> 1 "register_operand" "w")
> > +            (match_operand:<VDBL> 3 "vect_par_cnst_hi_half"))
> > +          (match_operand:VDC 2 "register_operand" "w")))]
> > +  "TARGET_FLOAT && !BYTES_BIG_ENDIAN"
> > +  "ext\\t%0.16b, %1.16b, %2.16b, #8"
> > +  [(set_attr "type" "neon_ext<q>")]
> > +)
> > +
> > +(define_insn "*aarch64_combine_high_low_internal_be<mode>"
> > +  [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w")
> > +        (vec_concat:<VDBL>
> > +          (match_operand:VDC 1 "register_operand" "w")
> > +          (vec_select:VDC
> > +            (match_operand:<VDBL> 2 "register_operand" "w")
> > +            (match_operand:<VDBL> 3 "vect_par_cnst_hi_half"))))]
> > +  "TARGET_FLOAT && BYTES_BIG_ENDIAN"
> > +  "ext\\t%0.16b, %1.16b, %2.16b, #8"
> > +  [(set_attr "type" "neon_ext<q>")]
> > +)
> > +
> > ;; In this insn, operand 1 should be low, and operand 2 the high part of the
> > ;; dest vector.
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/combine_ext.c b/gcc/testsuite/gcc.target/aarch64/simd/combine_ext.c
> > new file mode 100644
> > index 00000000000..27bcf310e19
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/combine_ext.c
> > @@ -0,0 +1,47 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1" } */
> > +
> > +#include <arm_neon.h>
> > +
> > +#ifndef TEST_COMBINE_HIGH_LOW_1
> > +#define TEST_COMBINE_HIGH_LOW_1(TYPE, SUFF)                             \
> > +  TYPE rev_##TYPE##_1 (TYPE x)                                          \
> > +  {                                                                     \
> > +    return vcombine_##SUFF (vget_high_##SUFF (x), vget_low_##SUFF (x)); \
> > +  }
> > +#endif
> > +
> > +#ifndef TEST_COMBINE_HIGH_LOW_2
> > +#define TEST_COMBINE_HIGH_LOW_2(TYPE, SUFF)                             \
> > +  TYPE rev_##TYPE##_2 (TYPE x, TYPE y)                                  \
> > +  {                                                                     \
> > +    return vcombine_##SUFF (vget_high_##SUFF (x), vget_low_##SUFF (y)); \
> > +  }
> > +#endif
> > +
> > +
> > +TEST_COMBINE_HIGH_LOW_1 (int8x16_t, s8)
> > +TEST_COMBINE_HIGH_LOW_1 (int16x8_t, s16)
> > +TEST_COMBINE_HIGH_LOW_1 (int32x4_t, s32)
> > +TEST_COMBINE_HIGH_LOW_1 (int64x2_t, s64)
> > +TEST_COMBINE_HIGH_LOW_1 (uint8x16_t, u8)
> > +TEST_COMBINE_HIGH_LOW_1 (uint16x8_t, u16)
> > +TEST_COMBINE_HIGH_LOW_1 (uint32x4_t, u32)
> > +TEST_COMBINE_HIGH_LOW_1 (uint64x2_t, u64)
> > +TEST_COMBINE_HIGH_LOW_1 (float16x8_t, f16)
> > +TEST_COMBINE_HIGH_LOW_1 (float32x4_t, f32)
> > +
> > +TEST_COMBINE_HIGH_LOW_2 (int8x16_t, s8)
> > +TEST_COMBINE_HIGH_LOW_2 (int16x8_t, s16)
> > +TEST_COMBINE_HIGH_LOW_2 (int32x4_t, s32)
> > +TEST_COMBINE_HIGH_LOW_2 (int64x2_t, s64)
> > +TEST_COMBINE_HIGH_LOW_2 (uint8x16_t, u8)
> > +TEST_COMBINE_HIGH_LOW_2 (uint16x8_t, u16)
> > +TEST_COMBINE_HIGH_LOW_2 (uint32x4_t, u32)
> > +TEST_COMBINE_HIGH_LOW_2 (uint64x2_t, u64)
> > +TEST_COMBINE_HIGH_LOW_2 (float16x8_t, f16)
> > +TEST_COMBINE_HIGH_LOW_2 (float32x4_t, f32)
> > +
> > +/* { dg-final { scan-assembler-times {ext\tv0.16b, v0.16b, v0.16b} 10 } } */
> > +/* { dg-final { scan-assembler-times {ext\tv0.16b, v0.16b, v1.16b} 10 { target aarch64_little_endian } } } */
> > +/* { dg-final { scan-assembler-times {ext\tv0.16b, v1.16b, v0.16b} 10 { target aarch64_big_endian } } } */
> > --
> > 2.43.0
>