[PATCH v2 3/7] aarch64: Port NEON add intrinsics to pragma-based framework

Karl Meakin via Sourceware Forge Thu, 14 May 2026 05:45:15 -0700

From: Karl Meakin <[email protected]>

Add all the necessary infrastructure for defining NEON intrinsics using the 
pragma-based framework,
and port the `vadd` family of functions to demonstrate that it works.


gcc/ChangeLog:

        * config/aarch64/aarch64-neon-builtins-base.cc: New file.
        * config/aarch64/aarch64-neon-builtins-base.def: New file.
        * config/aarch64/aarch64-neon-builtins-base.h: New file.
        * config/aarch64/aarch64-neon-builtins-functions.h: New file.
        * config/aarch64/aarch64-neon-builtins-shapes.cc: New file.
        * config/aarch64/aarch64-neon-builtins-shapes.h: New file.
        * config/aarch64/aarch64-neon-builtins.cc: New file.
        * config/aarch64/aarch64-neon-builtins.def: New file.
        * config/aarch64/aarch64-neon-builtins.h: New file.
        * config.gcc (extra_headers, extra_objs): Add new files and reformat 
for readability.
        * config/aarch64/t-aarch64: Add recipes for new files.
        * config/aarch64/aarch64-protos.h (handle_arm_neon_h): Rename to 
`init_arm_neon_builtins`.
        * config/aarch64/aarch64-builtins.cc (handle_arm_neon_h): Likewise.
        * config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64): Call
        `aarch64_acle::handle_arm_neon_h`.
        * config/aarch64/aarch64-sve-builtins.cc (TYPES_*): Move to 
aarch64-acle-builtins.h
        (NONSTREAMING_SVE, SVE_AND_SME, SSVE): Likewise.
        * config/aarch64/aarch64-sve-builtins-shapes.cc (build_all): Remove 
`static` qualifier.
        (gimple_folder::fold): Allow folding when `TARGET_SVE` is false but 
`TARGET_SIMD` is true.
        * config/aarch64/aarch64-sve-builtins.def (p8, p16, p64, 128): New type 
suffixes.
        * config/aarch64/aarch64-acle-builtins.h (enum handle_pragma_index): 
New enum member
        `arm_neon_handle`.
        (enum type_class_index): New enum member `TYPE_poly`.
        (build_all): New declaration, so it can be used from 
`aarch64-neon-builtins.cc`.
        (TYPES_*): Moved from `aarch64-sve-builtins.cc`.
        (NONSTREAMING_SVE, SVE_AND_SME, SSVE): Likewise.
        * config/aarch64/arm_neon.h (vadd_s8, vadd_s16, vadd_s32, vadd_f32, 
vadd_f64, vadd_u8,
        vadd_u16, vadd_u32, vadd_s64, vadd_u64, vaddq_s8, vaddq_s16, vaddq_s32, 
vaddq_s64,
        vaddq_f32, vaddq_f64, vaddq_u8, vaddq_u16, vaddq_u32, vaddq_u64, 
vadd_f16, vaddq_f16,
        vadd_p8, vadd_p16, vadd_p64, vaddq_p8, vaddq_p16, vaddq_p64, 
vaddq_p128, vaddd_u64,
        vaddd_s64): Delete function definitions.

gcc/testsuite/ChangeLog:

        * g++.target/aarch64/pr103147-6.C: Fix tests.
        * g++.target/aarch64/pr117048.C: Fix tests.
        * gcc.target/aarch64/pr103147-6.c: Fix tests.
        * gcc.target/aarch64/neon/aarch64-neon.exp: New test.
        * gcc.target/aarch64/neon/arm_neon_test.h: New test.
        * gcc.target/aarch64/neon/vadd.c: New test.
---
 gcc/config.gcc                                |  20 +-
 gcc/config/aarch64/aarch64-acle-builtins.h    | 826 +++++++++++++++++
 gcc/config/aarch64/aarch64-builtins.cc        |  12 +-
 gcc/config/aarch64/aarch64-c.cc               |   3 +-
 .../aarch64/aarch64-neon-builtins-base.cc     | 113 +++
 .../aarch64/aarch64-neon-builtins-base.def    |  33 +
 .../aarch64/aarch64-neon-builtins-base.h      |  29 +
 .../aarch64/aarch64-neon-builtins-functions.h |  29 +
 .../aarch64/aarch64-neon-builtins-shapes.cc   |  69 ++
 .../aarch64/aarch64-neon-builtins-shapes.h    |  29 +
 gcc/config/aarch64/aarch64-neon-builtins.cc   |  86 ++
 gcc/config/aarch64/aarch64-neon-builtins.def  |  40 +
 gcc/config/aarch64/aarch64-neon-builtins.h    |  28 +
 gcc/config/aarch64/aarch64-protos.h           |   2 +-
 .../aarch64/aarch64-sve-builtins-shapes.cc    |  16 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 855 +-----------------
 gcc/config/aarch64/aarch64-sve-builtins.def   |  11 +
 gcc/config/aarch64/arm_neon.h                 | 204 -----
 gcc/config/aarch64/t-aarch64                  |  46 +
 gcc/testsuite/g++.target/aarch64/pr103147-6.C |   1 +
 gcc/testsuite/g++.target/aarch64/pr117048.C   |   2 +-
 .../gcc.target/aarch64/neon/aarch64-neon.exp  |  39 +
 .../gcc.target/aarch64/neon/arm_neon_test.h   |  22 +
 gcc/testsuite/gcc.target/aarch64/neon/vadd.c  | 203 +++++
 gcc/testsuite/gcc.target/aarch64/pr103147-6.c |   1 +
 25 files changed, 1688 insertions(+), 1031 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-neon-builtins-base.cc
 create mode 100644 gcc/config/aarch64/aarch64-neon-builtins-base.def
 create mode 100644 gcc/config/aarch64/aarch64-neon-builtins-base.h
 create mode 100644 gcc/config/aarch64/aarch64-neon-builtins-functions.h
 create mode 100644 gcc/config/aarch64/aarch64-neon-builtins-shapes.cc
 create mode 100644 gcc/config/aarch64/aarch64-neon-builtins-shapes.h
 create mode 100644 gcc/config/aarch64/aarch64-neon-builtins.cc
 create mode 100644 gcc/config/aarch64/aarch64-neon-builtins.def
 create mode 100644 gcc/config/aarch64/aarch64-neon-builtins.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/neon/aarch64-neon.exp
 create mode 100644 gcc/testsuite/gcc.target/aarch64/neon/arm_neon_test.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/neon/vadd.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 580a7fdee6b5..c125096a65cd 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -192,7 +192,7 @@
 #                      the --with-sysroot configure option or the
 #                      --sysroot command line option is used this
 #                      will be relative to the sysroot.
-# target_type_format_char 
+# target_type_format_char
 #                      The default character to be used for formatting
 #                      the attribute in a
 #                      .type symbol_name, ${t_t_f_c}<property>
@@ -361,7 +361,18 @@ cpu_is_64bit=
 case ${target} in
 aarch64*-*-*)
        cpu_type=aarch64
-       extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h 
arm_sme.h arm_neon_sve_bridge.h arm_private_fp8.h arm_private_neon_types.h"
+       extra_headers=(
+               'arm_fp16.h'
+               'arm_neon.h'
+               'arm_bf16.h'
+               'arm_acle.h'
+               'arm_sve.h'
+               'arm_sme.h'
+               'arm_neon_sve_bridge.h'
+               'arm_private_fp8.h'
+               'arm_private_neon_types.h'
+       )
+       extra_headers="${extra_headers[@]}"
        c_target_objs="aarch64-c.o"
        cxx_target_objs="aarch64-c.o"
        d_target_objs="aarch64-d.o"
@@ -382,6 +393,9 @@ aarch64*-*-*)
                'aarch64-json-tunings-printer.o'
                'aarch64-json-tunings-parser.o'
                'aarch64-narrow-gp-writes.o'
+               'aarch64-neon-builtins.o'
+               'aarch64-neon-builtins-base.o'
+               'aarch64-neon-builtins-shapes.o'
        )
        extra_objs="${extra_objs[@]}"
        target_gtfiles=(
@@ -390,6 +404,8 @@ aarch64*-*-*)
                '$(srcdir)/config/aarch64/aarch64-builtins.cc'
                '$(srcdir)/config/aarch64/aarch64-acle-builtins.h'
                '$(srcdir)/config/aarch64/aarch64-sve-builtins.cc'
+               '$(srcdir)/config/aarch64/aarch64-neon-builtins.cc'
+               '$(srcdir)/config/aarch64/aarch64-neon-builtins.h'
        )
        target_gtfiles="${target_gtfiles[@]}"
        target_has_targetm_common=yes
diff --git a/gcc/config/aarch64/aarch64-acle-builtins.h 
b/gcc/config/aarch64/aarch64-acle-builtins.h
index 20152aaea6a2..f0511456313e 100644
--- a/gcc/config/aarch64/aarch64-acle-builtins.h
+++ b/gcc/config/aarch64/aarch64-acle-builtins.h
@@ -129,6 +129,7 @@ enum units_index
 /* Enumerates the pragma handlers.  */
 enum handle_pragma_index
 {
+  arm_neon_handle,
   arm_sve_handle,
   arm_sme_handle,
   arm_neon_sve_handle,
@@ -187,6 +188,7 @@ enum type_class_index
   TYPE_mfloat,
   TYPE_signed,
   TYPE_unsigned,
+  TYPE_poly,
   NUM_TYPE_CLASSES
 };
 
@@ -1172,6 +1174,830 @@ function_expander::result_mode () const
   return TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl)));
 }
 
+void build_all (function_builder &b, const char *signature,
+               const function_group_info &group,
+               mode_suffix_index mode_suffix_id,
+               bool force_direct_overloads = false);
+
+/* Define a TYPES_<combination> macro for each combination of type
+   suffixes that an ACLE function can have, where <combination> is the
+   name used in DEF_SVE_FUNCTION entries.
+
+   Use S (T) for single type suffix T and D (T1, T2) for a pair of type
+   suffixes T1 and T2.  Use commas to separate the suffixes.
+
+   Although the order shouldn't matter, the convention is to sort the
+   suffixes lexicographically after dividing suffixes into a type
+   class ("b", "f", etc.) and a numerical bit count.  */
+
+/* _b8 _b16 _b32 _b64.  */
+#define TYPES_all_pred(S, D, T) \
+  S (b8), S (b16), S (b32), S (b64)
+
+/* _c8 _c16 _c32 _c64.  */
+#define TYPES_all_count(S, D, T) \
+  S (c8), S (c16), S (c32), S (c64)
+
+/* _b8 _b16 _b32 _b64
+   _c8 _c16 _c32 _c64.  */
+#define TYPES_all_pred_count(S, D, T) \
+  TYPES_all_pred (S, D, T), \
+  TYPES_all_count (S, D, T)
+
+/* _f16 _f32 _f64.  */
+#define TYPES_all_float(S, D, T) \
+  S (f16), S (f32), S (f64)
+
+/* _s8 _s16 _s32 _s64.  */
+#define TYPES_all_signed(S, D, T) \
+  S (s8), S (s16), S (s32), S (s64)
+
+/*     _f16 _f32 _f64
+   _s8 _s16 _s32 _s64.  */
+#define TYPES_all_float_and_signed(S, D, T) \
+  TYPES_all_float (S, D, T), TYPES_all_signed (S, D, T)
+
+/* _u8 _u16 _u32 _u64.  */
+#define TYPES_all_unsigned(S, D, T) \
+  S (u8), S (u16), S (u32), S (u64)
+
+/* _s8 _s16 _s32 _s64
+   _u8 _u16 _u32 _u64.  */
+#define TYPES_all_integer(S, D, T) \
+  TYPES_all_signed (S, D, T), TYPES_all_unsigned (S, D, T)
+
+/*     _f16 _f32 _f64
+   _s8 _s16 _s32 _s64
+   _u8 _u16 _u32 _u64.  */
+#define TYPES_all_arith(S, D, T) \
+  TYPES_all_float (S, D, T), TYPES_all_integer (S, D, T)
+
+/*         _f32 _f64
+   _s8 _s16 _s32 _s64
+   _u8 _u16 _u32 _u64.  */
+#define TYPES_all_arith_no_fp16(S, D, T) \
+  S (f32), S (f64), \
+  TYPES_all_integer (S, D, T)
+
+#define TYPES_all_data(S, D, T) \
+  TYPES_b_data (S, D, T), \
+  TYPES_h_data (S, D, T), \
+  TYPES_s_data (S, D, T), \
+  TYPES_d_data (S, D, T)
+
+/* _b only.  */
+#define TYPES_b(S, D, T) \
+  S (b)
+
+/* _c only.  */
+#define TYPES_c(S, D, T) \
+  S (c)
+
+/* _u8.  */
+#define TYPES_b_unsigned(S, D, T) \
+  S (u8)
+
+/* _s8
+   _u8.  */
+#define TYPES_b_integer(S, D, T) \
+  S (s8), TYPES_b_unsigned (S, D, T)
+
+/* _mf8
+   _s8
+   _u8.  */
+#define TYPES_b_data(S, D, T) \
+  S (mf8), TYPES_b_integer (S, D, T)
+
+/* _s8 _s16
+   _u8 _u16.  */
+#define TYPES_bh_integer(S, D, T) \
+  S (s8), S (s16), S (u8), S (u16)
+
+/* _u8 _u32.  */
+#define TYPES_bs_unsigned(S, D, T) \
+  S (u8), S (u32)
+
+/* _s8 _s16 _s32.  */
+#define TYPES_bhs_signed(S, D, T) \
+  S (s8), S (s16), S (s32)
+
+/* _u8 _u16 _u32.  */
+#define TYPES_bhs_unsigned(S, D, T) \
+  S (u8), S (u16), S (u32)
+
+/* _s8 _s16 _s32
+   _u8 _u16 _u32.  */
+#define TYPES_bhs_integer(S, D, T) \
+  TYPES_bhs_signed (S, D, T), TYPES_bhs_unsigned (S, D, T)
+
+#define TYPES_bh_data(S, D, T)                 \
+  TYPES_b_data (S, D, T), \
+  TYPES_h_data (S, D, T)
+
+#define TYPES_bhs_data(S, D, T)                        \
+  TYPES_b_data (S, D, T), \
+  TYPES_h_data (S, D, T), \
+  TYPES_s_data (S, D, T)
+
+/* _s16_s8  _s32_s16  _s64_s32
+   _u16_u8  _u32_u16  _u64_u32.  */
+#define TYPES_bhs_widen(S, D, T) \
+  D (s16, s8), D (s32, s16), D (s64, s32), \
+  D (u16, u8), D (u32, u16), D (u64, u32)
+
+/* _bf16.  */
+#define TYPES_h_bfloat(S, D, T) \
+  S (bf16)
+
+/* _f16.  */
+#define TYPES_h_float(S, D, T) \
+  S (f16)
+
+/* _s16
+   _u16.  */
+#define TYPES_h_integer(S, D, T) \
+  S (s16), S (u16)
+
+/* _bf16
+   _f16
+   _s16
+   _u16.  */
+#define TYPES_h_data(S, D, T) \
+  S (bf16), S (f16), TYPES_h_integer (S, D, T)
+
+/* _s16 _s32.  */
+#define TYPES_hs_signed(S, D, T) \
+  S (s16), S (s32)
+
+/* _s16 _s32
+   _u16 _u32.  */
+#define TYPES_hs_integer(S, D, T) \
+  TYPES_hs_signed (S, D, T), S (u16), S (u32)
+
+/* _f16 _f32.  */
+#define TYPES_hs_float(S, D, T) \
+  S (f16), S (f32)
+
+#define TYPES_hs_data(S, D, T) \
+  TYPES_h_data (S, D, T), \
+  TYPES_s_data (S, D, T)
+
+/* _u16 _u64.  */
+#define TYPES_hd_unsigned(S, D, T) \
+  S (u16), S (u64)
+
+/* _s16 _s32 _s64.  */
+#define TYPES_hsd_signed(S, D, T) \
+  S (s16), S (s32), S (s64)
+
+/* _s16 _s32 _s64
+   _u16 _u32 _u64.  */
+#define TYPES_hsd_integer(S, D, T) \
+  TYPES_hsd_signed (S, D, T), S (u16), S (u32), S (u64)
+
+#define TYPES_hsd_data(S, D, T) \
+  TYPES_h_data (S, D, T), \
+  TYPES_s_data (S, D, T), \
+  TYPES_d_data (S, D, T)
+
+/* _f16_mf8.  */
+#define TYPES_h_float_mf8(S, D, T) \
+  D (f16, mf8)
+
+/* _f32.  */
+#define TYPES_s_float(S, D, T) \
+  S (f32)
+
+/* _f32_mf8.  */
+#define TYPES_s_float_mf8(S, D, T) \
+  D (f32, mf8)
+
+/*      _f32
+   _s16 _s32 _s64
+   _u16 _u32 _u64.  */
+#define TYPES_s_float_hsd_integer(S, D, T) \
+  TYPES_s_float (S, D, T), TYPES_hsd_integer (S, D, T)
+
+/* _f32
+   _s32 _s64
+   _u32 _u64.  */
+#define TYPES_s_float_sd_integer(S, D, T) \
+  TYPES_s_float (S, D, T), TYPES_sd_integer (S, D, T)
+
+/* _s32.  */
+#define TYPES_s_signed(S, D, T) \
+  S (s32)
+
+/* _u32.  */
+#define TYPES_s_unsigned(S, D, T) \
+  S (u32)
+
+/* _s32
+   _u32.  */
+#define TYPES_s_integer(S, D, T) \
+  TYPES_s_signed (S, D, T), TYPES_s_unsigned (S, D, T)
+
+/* _f32
+   _s32
+   _u32.  */
+#define TYPES_s_data(S, D, T) \
+  TYPES_s_float (S, D, T), TYPES_s_integer (S, D, T)
+
+/* _s32 _s64.  */
+#define TYPES_sd_signed(S, D, T) \
+  S (s32), S (s64)
+
+/* _u32 _u64.  */
+#define TYPES_sd_unsigned(S, D, T) \
+  S (u32), S (u64)
+
+/* _s32 _s64
+   _u32 _u64.  */
+#define TYPES_sd_integer(S, D, T) \
+  TYPES_sd_signed (S, D, T), TYPES_sd_unsigned (S, D, T)
+
+#define TYPES_sd_data(S, D, T) \
+  TYPES_s_data (S, D, T), \
+  TYPES_d_data (S, D, T)
+
+/* _f16 _f32 _f64
+       _s32 _s64
+       _u32 _u64.  */
+#define TYPES_all_float_and_sd_integer(S, D, T) \
+  TYPES_all_float (S, D, T), TYPES_sd_integer (S, D, T)
+
+/* _f64.  */
+#define TYPES_d_float(S, D, T) \
+  S (f64)
+
+/* _u64.  */
+#define TYPES_d_unsigned(S, D, T) \
+  S (u64)
+
+/* _s64
+   _u64.  */
+#define TYPES_d_integer(S, D, T) \
+  S (s64), TYPES_d_unsigned (S, D, T)
+
+/* _f64
+   _s64
+   _u64.  */
+#define TYPES_d_data(S, D, T) \
+  TYPES_d_float (S, D, T), TYPES_d_integer (S, D, T)
+
+/* All the type combinations allowed by svcvt.  */
+#define TYPES_cvt(S, D, T) \
+  D (f16, f32), D (f16, f64), \
+  D (f16, s16), D (f16, s32), D (f16, s64), \
+  D (f16, u16), D (f16, u32), D (f16, u64), \
+  \
+  D (f32, f16), D (f32, f64), \
+  D (f32, s32), D (f32, s64), \
+  D (f32, u32), D (f32, u64), \
+  \
+  D (f64, f16), D (f64, f32), \
+  D (f64, s32), D (f64, s64), \
+  D (f64, u32), D (f64, u64), \
+  \
+  D (s16, f16), \
+  D (s32, f16), D (s32, f32), D (s32, f64), \
+  D (s64, f16), D (s64, f32), D (s64, f64), \
+  \
+  D (u16, f16), \
+  D (u32, f16), D (u32, f32), D (u32, f64), \
+  D (u64, f16), D (u64, f32), D (u64, f64)
+
+/* _bf16_f32.  */
+#define TYPES_cvt_bfloat(S, D, T) \
+  D (bf16, f32)
+
+/* { _bf16 _f16 } x _f32.  */
+#define TYPES_cvt_h_s_float(S, D, T) \
+  D (bf16, f32), D (f16, f32)
+
+/* _f32_f16.  */
+#define TYPES_cvt_f32_f16(S, D, T) \
+  D (f32, f16)
+
+/* _f32_f16
+   _f64_f32.  */
+#define TYPES_cvt_long(S, D, T) \
+  D (f32, f16), D (f64, f32)
+
+/* _f32_f64.  */
+#define TYPES_cvt_narrow_s(S, D, T) \
+  D (f32, f64)
+
+/* _f16_f32
+   _f32_f64.  */
+#define TYPES_cvt_narrow(S, D, T) \
+  D (f16, f32), TYPES_cvt_narrow_s (S, D, T)
+
+/* { _s32 _u32 } x _f32
+
+   _f32 x { _s32 _u32 }.  */
+#define TYPES_cvt_s_s(S, D, T) \
+  D (s32, f32), \
+  D (u32, f32), \
+  D (f32, s32), \
+  D (f32, u32)
+
+/* _f16_mf8
+   _bf16_mf8.  */
+#define TYPES_cvt_mf8(S, D, T) \
+  D (f16, mf8), D (bf16, mf8)
+
+/* _mf8_f16
+   _mf8_bf16.  */
+#define TYPES_cvtn_mf8(S, D, T) \
+  D (mf8, f16), D (mf8, bf16)
+
+/* _mf8_f32.  */
+#define TYPES_cvtnx_mf8(S, D, T) \
+  D (mf8, f32)
+
+/* { _s32 _s64 } x { _b8 _b16 _b32 _b64 }
+   { _u32 _u64 }.  */
+#define TYPES_inc_dec_n1(D, A) \
+  D (A, b8), D (A, b16), D (A, b32), D (A, b64)
+#define TYPES_inc_dec_n(S, D, T) \
+  TYPES_inc_dec_n1 (D, s32), \
+  TYPES_inc_dec_n1 (D, s64), \
+  TYPES_inc_dec_n1 (D, u32), \
+  TYPES_inc_dec_n1 (D, u64)
+
+/* { _s16 _u16 } x _s32
+
+   {      _u16 } x _u32.  */
+#define TYPES_qcvt_x2(S, D, T) \
+  D (s16, s32), \
+  D (u16, u32), \
+  D (u16, s32)
+
+/* { _s8  _u8  } x _s32
+
+   {      _u8  } x _u32
+
+   { _s16 _u16 } x _s64
+
+   {      _u16 } x _u64.  */
+#define TYPES_qcvt_x4(S, D, T) \
+  D (s8, s32), \
+  D (u8, u32), \
+  D (u8, s32), \
+  D (s16, s64), \
+  D (u16, u64), \
+  D (u16, s64)
+
+/* _s16_s32
+   _u16_u32.  */
+#define TYPES_qrshr_x2(S, D, T) \
+  D (s16, s32), \
+  D (u16, u32)
+
+/* _u16_s32.  */
+#define TYPES_qrshru_x2(S, D, T) \
+  D (u16, s32)
+
+/* _s8_s32
+   _s16_s64
+   _u8_u32
+   _u16_u64.  */
+#define TYPES_qrshr_x4(S, D, T) \
+  D (s8, s32), \
+  D (s16, s64), \
+  D (u8, u32), \
+  D (u16, u64)
+
+/* _u8_s32
+   _u16_s64.  */
+#define TYPES_qrshru_x4(S, D, T) \
+  D (u8, s32), \
+  D (u16, s64)
+
+/* { _mf8 _bf16                 }   { _mf8 _bf16          }
+   {      _f16 _f32 _f64 }   {      _f16 _f32 _f64 }
+   { _s8  _s16 _s32 _s64 } x { _s8  _s16 _s32 _s64 }
+   { _u8  _u16 _u32 _u64 }   { _u8  _u16 _u32 _u64 }.  */
+#define TYPES_reinterpret1(D, A) \
+  D (A, mf8), \
+  D (A, bf16), \
+  D (A, f16), D (A, f32), D (A, f64), \
+  D (A, s8), D (A, s16), D (A, s32), D (A, s64), \
+  D (A, u8), D (A, u16), D (A, u32), D (A, u64)
+#define TYPES_reinterpret(S, D, T) \
+  TYPES_reinterpret1 (D, mf8), \
+  TYPES_reinterpret1 (D, bf16), \
+  TYPES_reinterpret1 (D, f16), \
+  TYPES_reinterpret1 (D, f32), \
+  TYPES_reinterpret1 (D, f64), \
+  TYPES_reinterpret1 (D, s8), \
+  TYPES_reinterpret1 (D, s16), \
+  TYPES_reinterpret1 (D, s32), \
+  TYPES_reinterpret1 (D, s64), \
+  TYPES_reinterpret1 (D, u8), \
+  TYPES_reinterpret1 (D, u16), \
+  TYPES_reinterpret1 (D, u32), \
+  TYPES_reinterpret1 (D, u64)
+
+/* _b_c
+   _c_b.  */
+#define TYPES_reinterpret_b(S, D, T) \
+  D (b, c), \
+  D (c, b)
+
+/* { _b8 _b16 _b32 _b64 } x { _s32 _s64 }
+                           { _u32 _u64 } */
+#define TYPES_while1(D, bn) \
+  D (bn, s32), D (bn, s64), D (bn, u32), D (bn, u64)
+#define TYPES_while(S, D, T) \
+  TYPES_while1 (D, b8), \
+  TYPES_while1 (D, b16), \
+  TYPES_while1 (D, b32), \
+  TYPES_while1 (D, b64)
+
+/* { _b8 _b16 _b32 _b64 } x { _s64 }
+                           { _u64 } */
+#define TYPES_while_x(S, D, T) \
+  D (b8, s64), D (b8, u64), \
+  D (b16, s64), D (b16, u64), \
+  D (b32, s64), D (b32, u64), \
+  D (b64, s64), D (b64, u64)
+
+/* { _c8 _c16 _c32 _c64 } x { _s64 }
+                           { _u64 } */
+#define TYPES_while_x_c(S, D, T) \
+  D (c8, s64), D (c8, u64), \
+  D (c16, s64), D (c16, u64), \
+  D (c32, s64), D (c32, u64), \
+  D (c64, s64), D (c64, u64)
+
+/* _f32_f16
+   _s32_s16
+   _u32_u16.  */
+#define TYPES_s_narrow_fsu(S, D, T) \
+  D (f32, f16), D (s32, s16), D (u32, u16)
+
+/* _za8 _za16 _za32 _za64 _za128.  */
+#define TYPES_all_za(S, D, T) \
+  S (za8), S (za16), S (za32), S (za64), S (za128)
+
+/* _za64.  */
+#define TYPES_d_za(S, D, T) \
+  S (za64)
+
+/* {   _za8 } x {  _mf8       _s8  _u8 }
+   {  _za16 } x { _bf16 _f16 _s16 _u16 }
+   {  _za32 } x {       _f32 _s32 _u32 }
+   {  _za64 } x {       _f64 _s64 _u64 }.  */
+#define TYPES_za_bhsd_data(S, D, T) \
+  D (za8, mf8), D (za8, s8), D (za8, u8), \
+  D (za16, bf16), D (za16, f16), D (za16, s16), D (za16, u16), \
+  D (za32, f32), D (za32, s32), D (za32, u32), \
+  D (za64, f64), D (za64, s64), D (za64, u64)
+
+/* Likewise, plus:
+
+   { _za128 } x {      _bf16          }
+               {       _f16 _f32 _f64 }
+               { _s8   _s16 _s32 _s64 }
+               { _u8   _u16 _u32 _u64 }.  */
+
+#define TYPES_za_all_data(S, D, T) \
+  TYPES_za_bhsd_data (S, D, T), \
+  TYPES_reinterpret1 (D, za128)
+
+/* _za16_mf8.  */
+#define TYPES_za_h_mf8(S, D, T) \
+  D (za16, mf8)
+
+/* { _za_16 _za_32 } x _mf8.  */
+#define TYPES_za_hs_mf8(S, D, T) \
+  D (za16, mf8), D (za32, mf8)
+
+/* _za16_bf16.  */
+#define TYPES_za_h_bfloat(S, D, T) \
+  D (za16, bf16)
+
+/* _za16_f16.  */
+#define TYPES_za_h_float(S, D, T) \
+  D (za16, f16)
+
+/* _za32_s8.  */
+#define TYPES_za_s_b_signed(S, D, T) \
+   D (za32, s8)
+
+/* _za32_u8.  */
+#define TYPES_za_s_b_unsigned(S, D, T) \
+   D (za32, u8)
+
+/* _za32 x { _s8 _u8 }.  */
+#define TYPES_za_s_b_integer(S, D, T) \
+  D (za32, s8), D (za32, u8)
+
+/* _za32 x { _s16 _u16 }.  */
+#define TYPES_za_s_h_integer(S, D, T) \
+  D (za32, s16), D (za32, u16)
+
+/* _za32 x { _bf16 _f16 _s16 _u16 }.  */
+#define TYPES_za_s_h_data(S, D, T) \
+  D (za32, bf16), D (za32, f16), D (za32, s16), D (za32, u16)
+
+/* _za32_u32.  */
+#define TYPES_za_s_unsigned(S, D, T) \
+  D (za32, u32)
+
+/* _za32 x { _s32 _u32 }.  */
+#define TYPES_za_s_integer(S, D, T) \
+  D (za32, s32), D (za32, u32)
+
+/* _za32_mf8.  */
+#define TYPES_za_s_mf8(S, D, T) \
+  D (za32, mf8)
+
+/* _za32_f32.  */
+#define TYPES_za_s_float(S, D, T) \
+  D (za32, f32)
+
+/* _za32 x { _f32 _s32 _u32 }.  */
+#define TYPES_za_s_data(S, D, T) \
+  D (za32, f32), D (za32, s32), D (za32, u32)
+
+/* _za64 x { _s16 _u16 }.  */
+#define TYPES_za_d_h_integer(S, D, T) \
+  D (za64, s16), D (za64, u16)
+
+/* _za64_f64.  */
+#define TYPES_za_d_float(S, D, T) \
+  D (za64, f64)
+
+/* _za64 x { _s64 _u64 }.  */
+#define TYPES_za_d_integer(S, D, T) \
+  D (za64, s64), D (za64, u64)
+
+/* _za32 x { _s8 _u8 _bf16 _f16 _f32 }.  */
+#define TYPES_mop_base(S, D, T) \
+  D (za32, s8), D (za32, u8), D (za32, bf16), D (za32, f16), D (za32, f32)
+
+/* _za32_s8.  */
+#define TYPES_mop_base_signed(S, D, T) \
+  D (za32, s8)
+
+/* _za32_u8.  */
+#define TYPES_mop_base_unsigned(S, D, T) \
+  D (za32, u8)
+
+/* _za64 x { _s16 _u16 }.  */
+#define TYPES_mop_i16i64(S, D, T) \
+  D (za64, s16), D (za64, u16)
+
+/* _za64_s16.  */
+#define TYPES_mop_i16i64_signed(S, D, T) \
+  D (za64, s16)
+
+/* _za64_u16.  */
+#define TYPES_mop_i16i64_unsigned(S, D, T) \
+  D (za64, u16)
+
+/* _za.  */
+#define TYPES_za(S, D, T) \
+  S (za)
+
+/* _p8 _p16 _p64.  */
+#define TYPES_bhd_poly(S, D, T) \
+  S (p8), S (p16), S (p64)
+
+/* _p8 _p16 _p64 _p128.  */
+#define TYPES_bhdq_poly(S, D, T) \
+  S (p8), S (p16), S (p64), S (p128)
+
+/* Describe a tuple of type suffixes in which only the first is used.  */
+#define DEF_VECTOR_TYPE(X) \
+  { TYPE_SUFFIX_ ## X, NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES }
+
+/* Describe a tuple of type suffixes in which only the first two are used.  */
+#define DEF_DOUBLE_TYPE(X, Y) \
+  { TYPE_SUFFIX_ ## X, TYPE_SUFFIX_ ## Y, NUM_TYPE_SUFFIXES }
+
+/* Describe a tuple of type suffixes in which three elements are used.  */
+#define DEF_TRIPLE_TYPE(X, Y, Z) \
+  { TYPE_SUFFIX_ ## X, TYPE_SUFFIX_ ## Y, TYPE_SUFFIX_ ## Z }
+
+/* Create an array that can be used in aarch64-sve-builtins.def to
+   select the type suffixes in TYPES_<NAME>.  */
+#define DEF_SVE_TYPES_ARRAY(NAME) \
+  static const type_suffix_triple types_##NAME[] = { \
+    TYPES_##NAME (DEF_VECTOR_TYPE, DEF_DOUBLE_TYPE, DEF_TRIPLE_TYPE), \
+    { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES } \
+  }
+
+/* For functions that don't take any type suffixes.  */
+static const type_suffix_triple types_none[] = {
+  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES },
+  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES }
+};
+
+/* Create an array for each TYPES_<combination> macro above.  */
+DEF_SVE_TYPES_ARRAY (all_pred);
+DEF_SVE_TYPES_ARRAY (all_count);
+DEF_SVE_TYPES_ARRAY (all_pred_count);
+DEF_SVE_TYPES_ARRAY (all_float);
+DEF_SVE_TYPES_ARRAY (all_signed);
+DEF_SVE_TYPES_ARRAY (all_float_and_signed);
+DEF_SVE_TYPES_ARRAY (all_unsigned);
+DEF_SVE_TYPES_ARRAY (all_integer);
+DEF_SVE_TYPES_ARRAY (all_arith);
+DEF_SVE_TYPES_ARRAY (all_arith_no_fp16);
+DEF_SVE_TYPES_ARRAY (all_data);
+DEF_SVE_TYPES_ARRAY (b);
+DEF_SVE_TYPES_ARRAY (b_unsigned);
+DEF_SVE_TYPES_ARRAY (b_integer);
+DEF_SVE_TYPES_ARRAY (bh_integer);
+DEF_SVE_TYPES_ARRAY (bs_unsigned);
+DEF_SVE_TYPES_ARRAY (bhs_signed);
+DEF_SVE_TYPES_ARRAY (bhs_unsigned);
+DEF_SVE_TYPES_ARRAY (bhs_integer);
+DEF_SVE_TYPES_ARRAY (bh_data);
+DEF_SVE_TYPES_ARRAY (bhs_data);
+DEF_SVE_TYPES_ARRAY (bhs_widen);
+DEF_SVE_TYPES_ARRAY (c);
+DEF_SVE_TYPES_ARRAY (h_bfloat);
+DEF_SVE_TYPES_ARRAY (h_float);
+DEF_SVE_TYPES_ARRAY (h_float_mf8);
+DEF_SVE_TYPES_ARRAY (h_integer);
+DEF_SVE_TYPES_ARRAY (h_data);
+DEF_SVE_TYPES_ARRAY (hs_signed);
+DEF_SVE_TYPES_ARRAY (hs_integer);
+DEF_SVE_TYPES_ARRAY (hs_float);
+DEF_SVE_TYPES_ARRAY (hs_data);
+DEF_SVE_TYPES_ARRAY (hd_unsigned);
+DEF_SVE_TYPES_ARRAY (hsd_signed);
+DEF_SVE_TYPES_ARRAY (hsd_integer);
+DEF_SVE_TYPES_ARRAY (hsd_data);
+DEF_SVE_TYPES_ARRAY (s_float);
+DEF_SVE_TYPES_ARRAY (s_float_hsd_integer);
+DEF_SVE_TYPES_ARRAY (s_float_mf8);
+DEF_SVE_TYPES_ARRAY (s_float_sd_integer);
+DEF_SVE_TYPES_ARRAY (s_signed);
+DEF_SVE_TYPES_ARRAY (s_unsigned);
+DEF_SVE_TYPES_ARRAY (s_integer);
+DEF_SVE_TYPES_ARRAY (s_data);
+DEF_SVE_TYPES_ARRAY (sd_signed);
+DEF_SVE_TYPES_ARRAY (sd_unsigned);
+DEF_SVE_TYPES_ARRAY (sd_integer);
+DEF_SVE_TYPES_ARRAY (sd_data);
+DEF_SVE_TYPES_ARRAY (all_float_and_sd_integer);
+DEF_SVE_TYPES_ARRAY (d_float);
+DEF_SVE_TYPES_ARRAY (d_unsigned);
+DEF_SVE_TYPES_ARRAY (d_integer);
+DEF_SVE_TYPES_ARRAY (d_data);
+DEF_SVE_TYPES_ARRAY (cvt);
+DEF_SVE_TYPES_ARRAY (cvt_bfloat);
+DEF_SVE_TYPES_ARRAY (cvt_h_s_float);
+DEF_SVE_TYPES_ARRAY (cvt_f32_f16);
+DEF_SVE_TYPES_ARRAY (cvt_long);
+DEF_SVE_TYPES_ARRAY (cvt_mf8);
+DEF_SVE_TYPES_ARRAY (cvt_narrow_s);
+DEF_SVE_TYPES_ARRAY (cvt_narrow);
+DEF_SVE_TYPES_ARRAY (cvt_s_s);
+DEF_SVE_TYPES_ARRAY (cvtn_mf8);
+DEF_SVE_TYPES_ARRAY (cvtnx_mf8);
+DEF_SVE_TYPES_ARRAY (inc_dec_n);
+DEF_SVE_TYPES_ARRAY (qcvt_x2);
+DEF_SVE_TYPES_ARRAY (qcvt_x4);
+DEF_SVE_TYPES_ARRAY (qrshr_x2);
+DEF_SVE_TYPES_ARRAY (qrshr_x4);
+DEF_SVE_TYPES_ARRAY (qrshru_x2);
+DEF_SVE_TYPES_ARRAY (qrshru_x4);
+DEF_SVE_TYPES_ARRAY (reinterpret);
+DEF_SVE_TYPES_ARRAY (reinterpret_b);
+DEF_SVE_TYPES_ARRAY (while);
+DEF_SVE_TYPES_ARRAY (while_x);
+DEF_SVE_TYPES_ARRAY (while_x_c);
+DEF_SVE_TYPES_ARRAY (s_narrow_fsu);
+DEF_SVE_TYPES_ARRAY (all_za);
+DEF_SVE_TYPES_ARRAY (d_za);
+DEF_SVE_TYPES_ARRAY (za_bhsd_data);
+DEF_SVE_TYPES_ARRAY (za_all_data);
+DEF_SVE_TYPES_ARRAY (za_h_mf8);
+DEF_SVE_TYPES_ARRAY (za_h_bfloat);
+DEF_SVE_TYPES_ARRAY (za_h_float);
+DEF_SVE_TYPES_ARRAY (za_s_b_signed);
+DEF_SVE_TYPES_ARRAY (za_s_b_unsigned);
+DEF_SVE_TYPES_ARRAY (za_s_b_integer);
+DEF_SVE_TYPES_ARRAY (za_s_h_integer);
+DEF_SVE_TYPES_ARRAY (za_s_h_data);
+DEF_SVE_TYPES_ARRAY (za_s_unsigned);
+DEF_SVE_TYPES_ARRAY (za_s_integer);
+DEF_SVE_TYPES_ARRAY (za_s_mf8);
+DEF_SVE_TYPES_ARRAY (za_hs_mf8);
+DEF_SVE_TYPES_ARRAY (za_s_float);
+DEF_SVE_TYPES_ARRAY (za_s_data);
+DEF_SVE_TYPES_ARRAY (za_d_h_integer);
+DEF_SVE_TYPES_ARRAY (za_d_float);
+DEF_SVE_TYPES_ARRAY (za_d_integer);
+DEF_SVE_TYPES_ARRAY (mop_base);
+DEF_SVE_TYPES_ARRAY (mop_base_signed);
+DEF_SVE_TYPES_ARRAY (mop_base_unsigned);
+DEF_SVE_TYPES_ARRAY (mop_i16i64);
+DEF_SVE_TYPES_ARRAY (mop_i16i64_signed);
+DEF_SVE_TYPES_ARRAY (mop_i16i64_unsigned);
+DEF_SVE_TYPES_ARRAY (za);
+
+DEF_SVE_TYPES_ARRAY (bhd_poly);
+DEF_SVE_TYPES_ARRAY (bhdq_poly);
+
+static const group_suffix_index groups_none[] = {
+  GROUP_none, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_x2[] = { GROUP_x2, NUM_GROUP_SUFFIXES };
+
+static const group_suffix_index groups_x12[] = {
+  GROUP_none, GROUP_x2, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_x4[] = { GROUP_x4, NUM_GROUP_SUFFIXES };
+
+static const group_suffix_index groups_x24[] = {
+  GROUP_x2, GROUP_x4, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_x124[] = {
+  GROUP_none, GROUP_x2, GROUP_x4, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_x1234[] = {
+  GROUP_none, GROUP_x2, GROUP_x3, GROUP_x4, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_vg1x2[] = {
+  GROUP_vg1x2, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_vg1x4[] = {
+  GROUP_vg1x4, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_vg1x24[] = {
+  GROUP_vg1x2, GROUP_vg1x4, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_vg2[] = {
+  GROUP_vg2x1, GROUP_vg2x2, GROUP_vg2x4, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_vg4[] = {
+  GROUP_vg4x1, GROUP_vg4x2, GROUP_vg4x4, NUM_GROUP_SUFFIXES
+};
+
+static const group_suffix_index groups_vg24[] = {
+  GROUP_vg2, GROUP_vg4, NUM_GROUP_SUFFIXES
+};
+
+/* Used by functions that have no governing predicate.  */
+static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
+
+/* Used by functions that have a governing predicate but do not have an
+   explicit suffix.  */
+static const predication_index preds_implicit[] = { PRED_implicit, NUM_PREDS };
+
+/* Used by functions that only support "_m" predication.  */
+static const predication_index preds_m[] = { PRED_m, NUM_PREDS };
+
+/* Used by functions that allow merging and "don't care" predication,
+   but are not suitable for predicated MOVPRFX.  */
+static const predication_index preds_mx[] = {
+  PRED_m, PRED_x, NUM_PREDS
+};
+
+/* Used by functions that allow merging, zeroing and "don't care"
+   predication.  */
+static const predication_index preds_mxz[] = {
+  PRED_m, PRED_x, PRED_z, NUM_PREDS
+};
+
+/* Used by functions that have the mxz predicated forms above, and in addition
+   have an unpredicated form.  */
+static const predication_index preds_mxz_or_none[] = {
+  PRED_m, PRED_x, PRED_z, PRED_none, NUM_PREDS
+};
+
+/* Used by functions that allow merging and zeroing predication but have
+   no "_x" form.  */
+static const predication_index preds_mz[] = { PRED_m, PRED_z, NUM_PREDS };
+
+/* Used by functions that have an unpredicated form and a _z predicated
+   form.  */
+static const predication_index preds_z_or_none[] = {
+  PRED_z, PRED_none, NUM_PREDS
+};
+
+/* Used by (mostly predicate) functions that only support "_z" predication.  */
+static const predication_index preds_z[] = { PRED_z, NUM_PREDS };
+
+/* Used by SME instructions that always merge into ZA.  */
+static const predication_index preds_za_m[] = { PRED_za_m, NUM_PREDS };
 }
 
 #endif
diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 611f6dc45e0a..e9e237f65aae 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -52,6 +52,7 @@
 #include "tree-pass.h"
 #include "tree-vector-builder.h"
 #include "aarch64-builtins.h"
+#include "aarch64-neon-builtins.h"
 
 using namespace aarch64;
 
@@ -1938,12 +1939,11 @@ aarch64_target_switcher::~aarch64_target_switcher ()
          sizeof (have_regs_of_mode));
 }
 
-/* Implement #pragma GCC aarch64 "arm_neon.h".
-
-   The types and functions defined here need to be available internally
-   during LTO as well.  */
+/* Initialize NEON builtins using the old framework.
+   Delete once NEON all intrinsics have been ported to the pragma-based
+   framework.  */
 void
-handle_arm_neon_h (void)
+init_arm_neon_builtins (void)
 {
   aarch64_target_switcher switcher (AARCH64_FL_SIMD);
 
@@ -1971,7 +1971,7 @@ aarch64_init_simd_builtins (void)
 
   aarch64_init_simd_builtin_functions (false);
   if (in_lto_p)
-    handle_arm_neon_h ();
+    init_arm_neon_builtins ();
 
   /* Initialize the remaining fcmla_laneq intrinsics.  */
   aarch64_init_fcmla_laneq_builtins ();
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index ef2475154e85..85842152862b 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -32,6 +32,7 @@
 #include "c-family/c-pragma.h"
 #include "langhooks.h"
 #include "target.h"
+#include "aarch64-neon-builtins.h"
 
 
 #define builtin_define(TXT) cpp_define (pfile, TXT)
@@ -409,7 +410,7 @@ aarch64_pragma_aarch64 (cpp_reader *)
   else if (strcmp (name, "arm_sme.h") == 0)
     aarch64_acle::handle_arm_sme_h (false);
   else if (strcmp (name, "arm_neon.h") == 0)
-    handle_arm_neon_h ();
+    aarch64_acle::handle_arm_neon_h (false);
   else if (strcmp (name, "arm_acle.h") == 0)
     handle_arm_acle_h ();
   else if (strcmp (name, "arm_neon_sve_bridge.h") == 0)
diff --git a/gcc/config/aarch64/aarch64-neon-builtins-base.cc 
b/gcc/config/aarch64/aarch64-neon-builtins-base.cc
new file mode 100644
index 000000000000..4c3c33c56629
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-neon-builtins-base.cc
@@ -0,0 +1,113 @@
+/* ACLE support for AArch64 NEON (__ARM_FEATURE_SIMD intrinsics)
+   Copyright (C) 2026-2026 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "memmodel.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "recog.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "function.h"
+#include "fold-const.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "explow.h"
+#include "tree-vector-builder.h"
+#include "rtx-vector-builder.h"
+#include "vec-perm-indices.h"
+#include "aarch64-acle-builtins.h"
+#include "aarch64-neon-builtins-base.h"
+#include "aarch64-neon-builtins-functions.h"
+#include "aarch64-neon-builtins.h"
+#include "gimple-fold.h"
+
+using namespace aarch64_acle;
+
+/* Base class for all function expanders.
+   At least one of `expand` or `fold` must be overriden by derived classes.  */
+class gimple_function_base : public function_base
+{
+  rtx expand (function_expander &) const override { gcc_unreachable (); }
+  gimple *fold (gimple_folder &) const override { gcc_unreachable (); }
+};
+
+/* For intrinsics that map to a single GIMPLE expression with no argument
+   preparation necessary.  */
+class gimple_expr : public gimple_function_base
+{
+  tree_code m_int_code;
+  tree_code m_float_code;
+  tree_code m_poly_code;
+
+public:
+  constexpr gimple_expr (tree_code code)
+    : m_int_code (code),
+      m_float_code (code),
+      m_poly_code (code)
+    {}
+
+  constexpr gimple_expr (tree_code int_code,
+                        tree_code float_code,
+                        tree_code poly_code)
+    : m_int_code (int_code),
+      m_float_code (float_code),
+      m_poly_code (poly_code)
+    {}
+
+  gimple *fold (gimple_folder &f) const override
+  {
+    auto nargs = gimple_call_num_args (f.call);
+    auto arg0 = nargs >= 1 ? gimple_call_arg (f.call, 0) : nullptr;
+    auto arg1 = nargs >= 2 ? gimple_call_arg (f.call, 1) : nullptr;
+    auto arg2 = nargs >= 3 ? gimple_call_arg (f.call, 2) : nullptr;
+
+    tree_code code;
+    auto type_class = f.type_suffix (0).tclass;
+    switch (type_class)
+      {
+      case TYPE_signed:
+      case TYPE_unsigned:
+       code = m_int_code;
+       break;
+      case TYPE_float:
+       code = m_float_code;
+       break;
+      case TYPE_poly:
+       code = m_poly_code;
+       break;
+      default:
+       gcc_unreachable ();
+      }
+
+    return gimple_build_assign (f.lhs, code, arg0, arg1, arg2);
+  }
+};
+
+// Lanewise arithmetic
+NEON_FUNCTION (vaddd, gimple_expr, (PLUS_EXPR))
+NEON_FUNCTION (vadd,  gimple_expr, (PLUS_EXPR, PLUS_EXPR, BIT_XOR_EXPR))
+NEON_FUNCTION (vaddq, gimple_expr, (PLUS_EXPR, PLUS_EXPR, BIT_XOR_EXPR))
diff --git a/gcc/config/aarch64/aarch64-neon-builtins-base.def 
b/gcc/config/aarch64/aarch64-neon-builtins-base.def
new file mode 100644
index 000000000000..c8077d96a7dd
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-neon-builtins-base.def
@@ -0,0 +1,33 @@
+/* ACLE support for AArch64 NEON (__ARM_FEATURE_SIMD intrinsics)
+   Copyright (C) 2026-2026 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+// Lanewise arithmetic
+#define REQUIRED_EXTENSIONS nonstreaming_only (AARCH64_FL_SIMD)
+DEF_NEON_FUNCTION (vaddd, d_integer,        ("s0,s0,s0"))
+DEF_NEON_FUNCTION (vadd,  all_arith_no_fp16, ("D0,D0,D0"))
+DEF_NEON_FUNCTION (vadd,  bhd_poly,         ("D0,D0,D0"))
+DEF_NEON_FUNCTION (vaddq, all_arith_no_fp16, ("Q0,Q0,Q0"))
+DEF_NEON_FUNCTION (vaddq, bhdq_poly,        ("Q0,Q0,Q0"))
+#undef REQUIRED_EXTENSIONS
+
+// Lanewise arithmetic (FP16)
+#define REQUIRED_EXTENSIONS nonstreaming_only (AARCH64_FL_F16)
+DEF_NEON_FUNCTION (vadd,  h_float, ("D0,D0,D0"))
+DEF_NEON_FUNCTION (vaddq, h_float, ("Q0,Q0,Q0"))
+#undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-neon-builtins-base.h 
b/gcc/config/aarch64/aarch64-neon-builtins-base.h
new file mode 100644
index 000000000000..9612bef42f26
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-neon-builtins-base.h
@@ -0,0 +1,29 @@
+/* ACLE support for AArch64 NEON (__ARM_FEATURE_SIMD intrinsics)
+   Copyright (C) 2026-2026 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_NEON_BUILTINS_BASE_H
+#define GCC_AARCH64_NEON_BUILTINS_BASE_H
+
+namespace aarch64_acle::functions {
+#define DEF_NEON_FUNCTION(NAME, ...) \
+  extern const aarch64_acle::function_base *const NAME;
+#include "aarch64-neon-builtins.def"
+}
+
+#endif
diff --git a/gcc/config/aarch64/aarch64-neon-builtins-functions.h 
b/gcc/config/aarch64/aarch64-neon-builtins-functions.h
new file mode 100644
index 000000000000..58a631ac54e0
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-neon-builtins-functions.h
@@ -0,0 +1,29 @@
+/* ACLE support for AArch64 NEON (function_base classes)
+   Copyright (C) 2026-2026 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_NEON_BUILTINS_FUNCTIONS_H
+#define GCC_AARCH64_NEON_BUILTINS_FUNCTIONS_H
+
+/* Declare the global function base NAME, creating it from an instance
+   of class CLASS with constructor arguments ARGS.  */
+#define NEON_FUNCTION(NAME, CLASS, ARGS)                               \
+  namespace { static constexpr const CLASS NAME##_obj ARGS; }          \
+  const function_base *const aarch64_acle::functions::NAME = &NAME##_obj;
+
+#endif
diff --git a/gcc/config/aarch64/aarch64-neon-builtins-shapes.cc 
b/gcc/config/aarch64/aarch64-neon-builtins-shapes.cc
new file mode 100644
index 000000000000..7946a7675eb5
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-neon-builtins-shapes.cc
@@ -0,0 +1,69 @@
+/* ACLE support for AArch64 NEON (function shapes)
+   Copyright (C) 2026-2026 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_ALGORITHM
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "basic-block.h"
+#include "tree.h"
+#include "function.h"
+#include "gimple.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "memmodel.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "aarch64-acle-builtins.h"
+#include "aarch64-sve-builtins-shapes.h"
+#include "aarch64-builtins.h"
+
+using namespace aarch64_acle;
+
+/* All NEON functions are non-overloaded, so we don't need bespoke
+  function shapes.  Instead, we can just use a single shape for all NEON
+  functions, parameterised by a signature.  */
+struct neon_shape : public function_shape
+{
+  constexpr neon_shape (const char *signature)
+    : m_signature (signature)
+  {}
+
+  const char *m_signature;
+
+  void build (function_builder &b,
+             const function_group_info &group) const override
+  {
+    aarch64_acle::build_all (b, this->m_signature, group, MODE_none);
+  }
+
+  bool check (function_checker &) const override { return true; }
+
+  bool explicit_type_suffix_p (unsigned int) const override { return true; }
+  tree resolve (function_resolver &) const override { gcc_unreachable (); }
+};
+
+namespace aarch64_acle::shapes {
+#define DEF_NEON_FUNCTION(NAME, TYPES, SHAPE_ARGS)                     \
+  static constexpr const neon_shape OBJ_NAME (NAME, TYPES) SHAPE_ARGS; \
+  const aarch64_acle::function_shape *SHAPE_NAME (NAME, TYPES)         \
+    = &OBJ_NAME (NAME, TYPES);
+#include "aarch64-neon-builtins.def"
+}
diff --git a/gcc/config/aarch64/aarch64-neon-builtins-shapes.h 
b/gcc/config/aarch64/aarch64-neon-builtins-shapes.h
new file mode 100644
index 000000000000..c94f4c994643
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-neon-builtins-shapes.h
@@ -0,0 +1,29 @@
+/* ACLE support for AArch64 NEON (function shapes)
+   Copyright (C) 2026-2026 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_NEON_BUILTINS_SHAPES_H
+#define GCC_AARCH64_NEON_BUILTINS_SHAPES_H
+
+namespace aarch64_acle::shapes {
+#define DEF_NEON_FUNCTION(NAME, TYPES, SHAPE_ARGS) \
+  extern const aarch64_acle::function_shape *const SHAPE_NAME (NAME, TYPES);
+#include "aarch64-neon-builtins.def"
+}
+
+#endif
diff --git a/gcc/config/aarch64/aarch64-neon-builtins.cc 
b/gcc/config/aarch64/aarch64-neon-builtins.cc
new file mode 100644
index 000000000000..7159b265ec9c
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-neon-builtins.cc
@@ -0,0 +1,86 @@
+/* ACLE support for AArch64 NEON
+   Copyright (C) 2026-2026 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "memmodel.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "diagnostic.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "function.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "explow.h"
+#include "aarch64-acle-builtins.h"
+#include "aarch64-sve-builtins-base.h"
+#include "aarch64-sve-builtins-shapes.h"
+#include "aarch64-neon-builtins-shapes.h"
+#include "aarch64-neon-builtins-functions.h"
+#include "aarch64-neon-builtins-base.h"
+#include "aarch64-builtins.h"
+
+/* Implement `#pragma GCC aarch64 "arm_neon"`.  */
+namespace aarch64_acle {
+constexpr const aarch64_acle::function_group_info neon_function_groups[] = {
+#define DEF_NEON_FUNCTION(NAME, TYPES, SHAPE_ARGS)                     \
+  {                                                                    \
+    /* .base_name  = */ #NAME,                                         \
+    /* .base       = */ &aarch64_acle::functions::NAME,                        
\
+    /* .shape      = */ &aarch64_acle::shapes::SHAPE_NAME (NAME, TYPES),\
+    /* .types      = */ aarch64_acle::types_##TYPES,                   \
+    /* .groups     = */ aarch64_acle::groups_none,                     \
+    /* .preds      = */ aarch64_acle::preds_none,                      \
+    /* .extensions = */ aarch64_required_extensions::REQUIRED_EXTENSIONS,\
+    /* .fpm_mode   = */ aarch64_acle::FPM_unused,                      \
+  },
+#include "aarch64-neon-builtins.def"
+};
+
+bool arm_neon_h_handled = false;
+
+void
+handle_arm_neon_h (bool function_nulls_p)
+{
+  if (arm_neon_h_handled)
+    return;
+
+  /* FIXME: Remove once all NEON intrinsics have been ported to the 
pragma-based
+    framework.  */
+  init_arm_neon_builtins ();
+
+  aarch64_target_switcher switcher;
+  aarch64_acle::function_builder builder (aarch64_acle::arm_neon_handle,
+                                         function_nulls_p);
+
+  for (auto &group : neon_function_groups)
+    builder.register_function_group (group);
+
+  arm_neon_h_handled = true;
+}
+};
diff --git a/gcc/config/aarch64/aarch64-neon-builtins.def 
b/gcc/config/aarch64/aarch64-neon-builtins.def
new file mode 100644
index 000000000000..58630eebe489
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-neon-builtins.def
@@ -0,0 +1,40 @@
+/* ACLE support for AArch64 NEON (__ARM_FEATURE_SIMD intrinsics)
+   Copyright (C) 2026-2026 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Code organization: See block comment at the top of
+   aarch64-sve-builtins.def.  */
+
+/* Define a new function group.  */
+#ifndef DEF_NEON_FUNCTION
+#define DEF_NEON_FUNCTION(NAME, TYPES, SHAPE_ARGS)
+#endif
+
+/* Helper for generating the name of the function_group's corresponding
+   neon_shape instance.  */
+#define OBJ_NAME(NAME, TYPES) NAME ## _ ## TYPES ## _obj
+
+/* Helper for generating the name of the function_group's corresponding
+   function_shape.  */
+#define SHAPE_NAME(NAME, TYPES) NAME ## _ ## TYPES ## _shape
+
+#include "aarch64-neon-builtins-base.def"
+
+#undef DEF_NEON_FUNCTION
+#undef OBJ_NAME
+#undef SHAPE_NAME
diff --git a/gcc/config/aarch64/aarch64-neon-builtins.h 
b/gcc/config/aarch64/aarch64-neon-builtins.h
new file mode 100644
index 000000000000..c3bf53674cbc
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-neon-builtins.h
@@ -0,0 +1,28 @@
+/* ACLE support for AArch64 NEON
+   Copyright (C) 2026-2026 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_NEON_BUILTINS_H
+#define GCC_AARCH64_NEON_BUILTINS_H
+
+namespace aarch64_acle {
+extern bool arm_neon_h_handled;
+void handle_arm_neon_h (bool);
+};
+
+#endif
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index b794cd7de664..1610f54986e6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1155,7 +1155,7 @@ tree aarch64_general_builtin_decl (unsigned, bool);
 tree aarch64_general_builtin_rsqrt (unsigned int);
 void aarch64_ms_variadic_abi_init_builtins (void);
 void handle_arm_acle_h (void);
-void handle_arm_neon_h (void);
+void init_arm_neon_builtins (void);
 
 bool aarch64_check_required_extensions (location_t, tree,
                                        aarch64_required_extensions);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index 5fb65dd8a319..b51115359eaf 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -277,6 +277,18 @@ parse_type (const function_instance &instance, const char 
*&format)
   if (ch == 's')
     {
       type_suffix_index suffix = parse_element_type (instance, format);
+
+      // HACK: remove once all NEON intrinsics have been ported to the
+      // pragma-based framework.
+      if (suffix == TYPE_SUFFIX_p8)
+       return aarch64_simd_types_trees[Poly8_t].eltype;
+      if (suffix == TYPE_SUFFIX_p16)
+       return aarch64_simd_types_trees[Poly16_t].eltype;
+      if (suffix == TYPE_SUFFIX_p64)
+       return aarch64_simd_types_trees[Poly64_t].eltype;
+      if (suffix == TYPE_SUFFIX_p128)
+       return aarch64_simd_types_trees[Poly128_t].eltype;
+
       return scalar_types[type_suffixes[suffix].vector_type];
     }
 
@@ -530,10 +542,10 @@ build_vs_offset (function_builder &b, const char 
*signature,
    predicate.  FORCE_DIRECT_OVERLOADS is true if there is a one-to-one
    mapping between "short" and "full" names, and if standard overload
    resolution therefore isn't necessary.  */
-static void
+void
 build_all (function_builder &b, const char *signature,
           const function_group_info &group, mode_suffix_index mode_suffix_id,
-          bool force_direct_overloads = false)
+          bool force_direct_overloads)
 {
   for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
     for (unsigned int gi = 0; group.groups[gi] != NUM_GROUP_SUFFIXES; ++gi)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index da96da69f273..6f5244ae81d2 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -55,6 +55,7 @@
 #include "aarch64-sve-builtins-sme.h"
 #include "aarch64-sve-builtins-shapes.h"
 #include "aarch64-builtins.h"
+#include "aarch64-neon-builtins.h"
 
 using namespace aarch64;
 
@@ -180,814 +181,6 @@ CONSTEXPR const group_suffix_info group_suffixes[] = {
   { "", 0, 1 }
 };
 
-/* Define a TYPES_<combination> macro for each combination of type
-   suffixes that an ACLE function can have, where <combination> is the
-   name used in DEF_SVE_FUNCTION entries.
-
-   Use S (T) for single type suffix T and D (T1, T2) for a pair of type
-   suffixes T1 and T2.  Use commas to separate the suffixes.
-
-   Although the order shouldn't matter, the convention is to sort the
-   suffixes lexicographically after dividing suffixes into a type
-   class ("b", "f", etc.) and a numerical bit count.  */
-
-/* _b8 _b16 _b32 _b64.  */
-#define TYPES_all_pred(S, D, T) \
-  S (b8), S (b16), S (b32), S (b64)
-
-/* _c8 _c16 _c32 _c64.  */
-#define TYPES_all_count(S, D, T) \
-  S (c8), S (c16), S (c32), S (c64)
-
-/* _b8 _b16 _b32 _b64
-   _c8 _c16 _c32 _c64.  */
-#define TYPES_all_pred_count(S, D, T) \
-  TYPES_all_pred (S, D, T), \
-  TYPES_all_count (S, D, T)
-
-/* _f16 _f32 _f64.  */
-#define TYPES_all_float(S, D, T) \
-  S (f16), S (f32), S (f64)
-
-/* _s8 _s16 _s32 _s64.  */
-#define TYPES_all_signed(S, D, T) \
-  S (s8), S (s16), S (s32), S (s64)
-
-/*     _f16 _f32 _f64
-   _s8 _s16 _s32 _s64.  */
-#define TYPES_all_float_and_signed(S, D, T) \
-  TYPES_all_float (S, D, T), TYPES_all_signed (S, D, T)
-
-/* _u8 _u16 _u32 _u64.  */
-#define TYPES_all_unsigned(S, D, T) \
-  S (u8), S (u16), S (u32), S (u64)
-
-/* _s8 _s16 _s32 _s64
-   _u8 _u16 _u32 _u64.  */
-#define TYPES_all_integer(S, D, T) \
-  TYPES_all_signed (S, D, T), TYPES_all_unsigned (S, D, T)
-
-/*     _f16 _f32 _f64
-   _s8 _s16 _s32 _s64
-   _u8 _u16 _u32 _u64.  */
-#define TYPES_all_arith(S, D, T) \
-  TYPES_all_float (S, D, T), TYPES_all_integer (S, D, T)
-
-#define TYPES_all_data(S, D, T) \
-  TYPES_b_data (S, D, T), \
-  TYPES_h_data (S, D, T), \
-  TYPES_s_data (S, D, T), \
-  TYPES_d_data (S, D, T)
-
-/* _b only.  */
-#define TYPES_b(S, D, T) \
-  S (b)
-
-/* _c only.  */
-#define TYPES_c(S, D, T) \
-  S (c)
-
-/* _u8.  */
-#define TYPES_b_unsigned(S, D, T) \
-  S (u8)
-
-/* _s8
-   _u8.  */
-#define TYPES_b_integer(S, D, T) \
-  S (s8), TYPES_b_unsigned (S, D, T)
-
-/* _mf8
-   _s8
-   _u8.  */
-#define TYPES_b_data(S, D, T) \
-  S (mf8), TYPES_b_integer (S, D, T)
-
-/* _s8 _s16
-   _u8 _u16.  */
-#define TYPES_bh_integer(S, D, T) \
-  S (s8), S (s16), S (u8), S (u16)
-
-/* _u8 _u32.  */
-#define TYPES_bs_unsigned(S, D, T) \
-  S (u8), S (u32)
-
-/* _s8 _s16 _s32.  */
-#define TYPES_bhs_signed(S, D, T) \
-  S (s8), S (s16), S (s32)
-
-/* _u8 _u16 _u32.  */
-#define TYPES_bhs_unsigned(S, D, T) \
-  S (u8), S (u16), S (u32)
-
-/* _s8 _s16 _s32
-   _u8 _u16 _u32.  */
-#define TYPES_bhs_integer(S, D, T) \
-  TYPES_bhs_signed (S, D, T), TYPES_bhs_unsigned (S, D, T)
-
-#define TYPES_bh_data(S, D, T)                 \
-  TYPES_b_data (S, D, T), \
-  TYPES_h_data (S, D, T)
-
-#define TYPES_bhs_data(S, D, T)                        \
-  TYPES_b_data (S, D, T), \
-  TYPES_h_data (S, D, T), \
-  TYPES_s_data (S, D, T)
-
-/* _s16_s8  _s32_s16  _s64_s32
-   _u16_u8  _u32_u16  _u64_u32.  */
-#define TYPES_bhs_widen(S, D, T) \
-  D (s16, s8), D (s32, s16), D (s64, s32), \
-  D (u16, u8), D (u32, u16), D (u64, u32)
-
-/* _bf16.  */
-#define TYPES_h_bfloat(S, D, T) \
-  S (bf16)
-
-/* _f16.  */
-#define TYPES_h_float(S, D, T) \
-  S (f16)
-
-/* _s16
-   _u16.  */
-#define TYPES_h_integer(S, D, T) \
-  S (s16), S (u16)
-
-/* _bf16
-   _f16
-   _s16
-   _u16.  */
-#define TYPES_h_data(S, D, T) \
-  S (bf16), S (f16), TYPES_h_integer (S, D, T)
-
-/* _s16 _s32.  */
-#define TYPES_hs_signed(S, D, T) \
-  S (s16), S (s32)
-
-/* _s16 _s32
-   _u16 _u32.  */
-#define TYPES_hs_integer(S, D, T) \
-  TYPES_hs_signed (S, D, T), S (u16), S (u32)
-
-/* _f16 _f32.  */
-#define TYPES_hs_float(S, D, T) \
-  S (f16), S (f32)
-
-#define TYPES_hs_data(S, D, T) \
-  TYPES_h_data (S, D, T), \
-  TYPES_s_data (S, D, T)
-
-/* _u16 _u64.  */
-#define TYPES_hd_unsigned(S, D, T) \
-  S (u16), S (u64)
-
-/* _s16 _s32 _s64.  */
-#define TYPES_hsd_signed(S, D, T) \
-  S (s16), S (s32), S (s64)
-
-/* _s16 _s32 _s64
-   _u16 _u32 _u64.  */
-#define TYPES_hsd_integer(S, D, T) \
-  TYPES_hsd_signed (S, D, T), S (u16), S (u32), S (u64)
-
-#define TYPES_hsd_data(S, D, T) \
-  TYPES_h_data (S, D, T), \
-  TYPES_s_data (S, D, T), \
-  TYPES_d_data (S, D, T)
-
-/* _f16_mf8.  */
-#define TYPES_h_float_mf8(S, D, T) \
-  D (f16, mf8)
-
-/* _f32.  */
-#define TYPES_s_float(S, D, T) \
-  S (f32)
-
-/* _f32_mf8.  */
-#define TYPES_s_float_mf8(S, D, T) \
-  D (f32, mf8)
-
-/*      _f32
-   _s16 _s32 _s64
-   _u16 _u32 _u64.  */
-#define TYPES_s_float_hsd_integer(S, D, T) \
-  TYPES_s_float (S, D, T), TYPES_hsd_integer (S, D, T)
-
-/* _f32
-   _s32 _s64
-   _u32 _u64.  */
-#define TYPES_s_float_sd_integer(S, D, T) \
-  TYPES_s_float (S, D, T), TYPES_sd_integer (S, D, T)
-
-/* _s32.  */
-#define TYPES_s_signed(S, D, T) \
-  S (s32)
-
-/* _u32.  */
-#define TYPES_s_unsigned(S, D, T) \
-  S (u32)
-
-/* _s32
-   _u32.  */
-#define TYPES_s_integer(S, D, T) \
-  TYPES_s_signed (S, D, T), TYPES_s_unsigned (S, D, T)
-
-/* _f32
-   _s32
-   _u32.  */
-#define TYPES_s_data(S, D, T) \
-  TYPES_s_float (S, D, T), TYPES_s_integer (S, D, T)
-
-/* _s32 _s64.  */
-#define TYPES_sd_signed(S, D, T) \
-  S (s32), S (s64)
-
-/* _u32 _u64.  */
-#define TYPES_sd_unsigned(S, D, T) \
-  S (u32), S (u64)
-
-/* _s32 _s64
-   _u32 _u64.  */
-#define TYPES_sd_integer(S, D, T) \
-  TYPES_sd_signed (S, D, T), TYPES_sd_unsigned (S, D, T)
-
-#define TYPES_sd_data(S, D, T) \
-  TYPES_s_data (S, D, T), \
-  TYPES_d_data (S, D, T)
-
-/* _f16 _f32 _f64
-       _s32 _s64
-       _u32 _u64.  */
-#define TYPES_all_float_and_sd_integer(S, D, T) \
-  TYPES_all_float (S, D, T), TYPES_sd_integer (S, D, T)
-
-/* _f64.  */
-#define TYPES_d_float(S, D, T) \
-  S (f64)
-
-/* _u64.  */
-#define TYPES_d_unsigned(S, D, T) \
-  S (u64)
-
-/* _s64
-   _u64.  */
-#define TYPES_d_integer(S, D, T) \
-  S (s64), TYPES_d_unsigned (S, D, T)
-
-/* _f64
-   _s64
-   _u64.  */
-#define TYPES_d_data(S, D, T) \
-  TYPES_d_float (S, D, T), TYPES_d_integer (S, D, T)
-
-/* All the type combinations allowed by svcvt.  */
-#define TYPES_cvt(S, D, T) \
-  D (f16, f32), D (f16, f64), \
-  D (f16, s16), D (f16, s32), D (f16, s64), \
-  D (f16, u16), D (f16, u32), D (f16, u64), \
-  \
-  D (f32, f16), D (f32, f64), \
-  D (f32, s32), D (f32, s64), \
-  D (f32, u32), D (f32, u64), \
-  \
-  D (f64, f16), D (f64, f32), \
-  D (f64, s32), D (f64, s64), \
-  D (f64, u32), D (f64, u64), \
-  \
-  D (s16, f16), \
-  D (s32, f16), D (s32, f32), D (s32, f64), \
-  D (s64, f16), D (s64, f32), D (s64, f64), \
-  \
-  D (u16, f16), \
-  D (u32, f16), D (u32, f32), D (u32, f64), \
-  D (u64, f16), D (u64, f32), D (u64, f64)
-
-/* _bf16_f32.  */
-#define TYPES_cvt_bfloat(S, D, T) \
-  D (bf16, f32)
-
-/* { _bf16 _f16 } x _f32.  */
-#define TYPES_cvt_h_s_float(S, D, T) \
-  D (bf16, f32), D (f16, f32)
-
-/* _f32_f16.  */
-#define TYPES_cvt_f32_f16(S, D, T) \
-  D (f32, f16)
-
-/* _f32_f16
-   _f64_f32.  */
-#define TYPES_cvt_long(S, D, T) \
-  D (f32, f16), D (f64, f32)
-
-/* _f32_f64.  */
-#define TYPES_cvt_narrow_s(S, D, T) \
-  D (f32, f64)
-
-/* _f16_f32
-   _f32_f64.  */
-#define TYPES_cvt_narrow(S, D, T) \
-  D (f16, f32), TYPES_cvt_narrow_s (S, D, T)
-
-/* { _s32 _u32 } x _f32
-
-   _f32 x { _s32 _u32 }.  */
-#define TYPES_cvt_s_s(S, D, T) \
-  D (s32, f32), \
-  D (u32, f32), \
-  D (f32, s32), \
-  D (f32, u32)
-
-/* _f16_mf8
-   _bf16_mf8.  */
-#define TYPES_cvt_mf8(S, D, T) \
-  D (f16, mf8), D (bf16, mf8)
-
-/* _mf8_f16
-   _mf8_bf16.  */
-#define TYPES_cvtn_mf8(S, D, T) \
-  D (mf8, f16), D (mf8, bf16)
-
-/* _mf8_f32.  */
-#define TYPES_cvtnx_mf8(S, D, T) \
-  D (mf8, f32)
-
-/* { _s32 _s64 } x { _b8 _b16 _b32 _b64 }
-   { _u32 _u64 }.  */
-#define TYPES_inc_dec_n1(D, A) \
-  D (A, b8), D (A, b16), D (A, b32), D (A, b64)
-#define TYPES_inc_dec_n(S, D, T) \
-  TYPES_inc_dec_n1 (D, s32), \
-  TYPES_inc_dec_n1 (D, s64), \
-  TYPES_inc_dec_n1 (D, u32), \
-  TYPES_inc_dec_n1 (D, u64)
-
-/* { _s16 _u16 } x _s32
-
-   {      _u16 } x _u32.  */
-#define TYPES_qcvt_x2(S, D, T) \
-  D (s16, s32), \
-  D (u16, u32), \
-  D (u16, s32)
-
-/* { _s8  _u8  } x _s32
-
-   {      _u8  } x _u32
-
-   { _s16 _u16 } x _s64
-
-   {      _u16 } x _u64.  */
-#define TYPES_qcvt_x4(S, D, T) \
-  D (s8, s32), \
-  D (u8, u32), \
-  D (u8, s32), \
-  D (s16, s64), \
-  D (u16, u64), \
-  D (u16, s64)
-
-/* _s16_s32
-   _u16_u32.  */
-#define TYPES_qrshr_x2(S, D, T) \
-  D (s16, s32), \
-  D (u16, u32)
-
-/* _u16_s32.  */
-#define TYPES_qrshru_x2(S, D, T) \
-  D (u16, s32)
-
-/* _s8_s32
-   _s16_s64
-   _u8_u32
-   _u16_u64.  */
-#define TYPES_qrshr_x4(S, D, T) \
-  D (s8, s32), \
-  D (s16, s64), \
-  D (u8, u32), \
-  D (u16, u64)
-
-/* _u8_s32
-   _u16_s64.  */
-#define TYPES_qrshru_x4(S, D, T) \
-  D (u8, s32), \
-  D (u16, s64)
-
-/* { _mf8 _bf16          }   { _mf8 _bf16          }
-   {      _f16 _f32 _f64 }   {      _f16 _f32 _f64 }
-   { _s8  _s16 _s32 _s64 } x { _s8  _s16 _s32 _s64 }
-   { _u8  _u16 _u32 _u64 }   { _u8  _u16 _u32 _u64 }.  */
-#define TYPES_reinterpret1(D, A) \
-  D (A, mf8), \
-  D (A, bf16), \
-  D (A, f16), D (A, f32), D (A, f64), \
-  D (A, s8), D (A, s16), D (A, s32), D (A, s64), \
-  D (A, u8), D (A, u16), D (A, u32), D (A, u64)
-#define TYPES_reinterpret(S, D, T) \
-  TYPES_reinterpret1 (D, mf8), \
-  TYPES_reinterpret1 (D, bf16), \
-  TYPES_reinterpret1 (D, f16), \
-  TYPES_reinterpret1 (D, f32), \
-  TYPES_reinterpret1 (D, f64), \
-  TYPES_reinterpret1 (D, s8), \
-  TYPES_reinterpret1 (D, s16), \
-  TYPES_reinterpret1 (D, s32), \
-  TYPES_reinterpret1 (D, s64), \
-  TYPES_reinterpret1 (D, u8), \
-  TYPES_reinterpret1 (D, u16), \
-  TYPES_reinterpret1 (D, u32), \
-  TYPES_reinterpret1 (D, u64)
-
-/* _b_c
-   _c_b.  */
-#define TYPES_reinterpret_b(S, D, T) \
-  D (b, c), \
-  D (c, b)
-
-/* { _b8 _b16 _b32 _b64 } x { _s32 _s64 }
-                           { _u32 _u64 } */
-#define TYPES_while1(D, bn) \
-  D (bn, s32), D (bn, s64), D (bn, u32), D (bn, u64)
-#define TYPES_while(S, D, T) \
-  TYPES_while1 (D, b8), \
-  TYPES_while1 (D, b16), \
-  TYPES_while1 (D, b32), \
-  TYPES_while1 (D, b64)
-
-/* { _b8 _b16 _b32 _b64 } x { _s64 }
-                           { _u64 } */
-#define TYPES_while_x(S, D, T) \
-  D (b8, s64), D (b8, u64), \
-  D (b16, s64), D (b16, u64), \
-  D (b32, s64), D (b32, u64), \
-  D (b64, s64), D (b64, u64)
-
-/* { _c8 _c16 _c32 _c64 } x { _s64 }
-                           { _u64 } */
-#define TYPES_while_x_c(S, D, T) \
-  D (c8, s64), D (c8, u64), \
-  D (c16, s64), D (c16, u64), \
-  D (c32, s64), D (c32, u64), \
-  D (c64, s64), D (c64, u64)
-
-/* _f32_f16
-   _s32_s16
-   _u32_u16.  */
-#define TYPES_s_narrow_fsu(S, D, T) \
-  D (f32, f16), D (s32, s16), D (u32, u16)
-
-/* _za8 _za16 _za32 _za64 _za128.  */
-#define TYPES_all_za(S, D, T) \
-  S (za8), S (za16), S (za32), S (za64), S (za128)
-
-/* _za64.  */
-#define TYPES_d_za(S, D, T) \
-  S (za64)
-
-/* {   _za8 } x {  _mf8       _s8  _u8 }
-
-   {  _za16 } x { _bf16 _f16 _s16 _u16 }
-
-   {  _za32 } x {       _f32 _s32 _u32 }
-
-   {  _za64 } x {       _f64 _s64 _u64 }.  */
-#define TYPES_za_bhsd_data(S, D, T) \
-  D (za8, mf8), D (za8, s8), D (za8, u8), \
-  D (za16, bf16), D (za16, f16), D (za16, s16), D (za16, u16), \
-  D (za32, f32), D (za32, s32), D (za32, u32), \
-  D (za64, f64), D (za64, s64), D (za64, u64)
-
-/* Likewise, plus:
-
-   { _za128 } x {      _bf16           }
-               {       _f16 _f32 _f64 }
-               { _s8   _s16 _s32 _s64 }
-               { _u8   _u16 _u32 _u64 }.  */
-
-#define TYPES_za_all_data(S, D, T) \
-  TYPES_za_bhsd_data (S, D, T), \
-  TYPES_reinterpret1 (D, za128)
-
-/* _za16_mf8.  */
-#define TYPES_za_h_mf8(S, D, T) \
-  D (za16, mf8)
-
-/* { _za_16 _za_32 } x _mf8.  */
-#define TYPES_za_hs_mf8(S, D, T) \
-  D (za16, mf8), D (za32, mf8)
-
-/* _za16_bf16.  */
-#define TYPES_za_h_bfloat(S, D, T) \
-  D (za16, bf16)
-
-/* _za16_f16.  */
-#define TYPES_za_h_float(S, D, T) \
-  D (za16, f16)
-
-/* _za32_s8.  */
-#define TYPES_za_s_b_signed(S, D, T) \
-   D (za32, s8)
-
-/* _za32_u8.  */
-#define TYPES_za_s_b_unsigned(S, D, T) \
-   D (za32, u8)
-
-/* _za32 x { _s8 _u8 }.  */
-#define TYPES_za_s_b_integer(S, D, T) \
-  D (za32, s8), D (za32, u8)
-
-/* _za32 x { _s16 _u16 }.  */
-#define TYPES_za_s_h_integer(S, D, T) \
-  D (za32, s16), D (za32, u16)
-
-/* _za32 x { _bf16 _f16 _s16 _u16 }.  */
-#define TYPES_za_s_h_data(S, D, T) \
-  D (za32, bf16), D (za32, f16), D (za32, s16), D (za32, u16)
-
-/* _za32_u32.  */
-#define TYPES_za_s_unsigned(S, D, T) \
-  D (za32, u32)
-
-/* _za32 x { _s32 _u32 }.  */
-#define TYPES_za_s_integer(S, D, T) \
-  D (za32, s32), D (za32, u32)
-
-/* _za32_mf8.  */
-#define TYPES_za_s_mf8(S, D, T) \
-  D (za32, mf8)
-
-/* _za32_f32.  */
-#define TYPES_za_s_float(S, D, T) \
-  D (za32, f32)
-
-/* _za32 x { _f32 _s32 _u32 }.  */
-#define TYPES_za_s_data(S, D, T) \
-  D (za32, f32), D (za32, s32), D (za32, u32)
-
-/* _za64 x { _s16 _u16 }.  */
-#define TYPES_za_d_h_integer(S, D, T) \
-  D (za64, s16), D (za64, u16)
-
-/* _za64_f64.  */
-#define TYPES_za_d_float(S, D, T) \
-  D (za64, f64)
-
-/* _za64 x { _s64 _u64 }.  */
-#define TYPES_za_d_integer(S, D, T) \
-  D (za64, s64), D (za64, u64)
-
-/* _za32 x { _s8 _u8 _bf16 _f16 _f32 }.  */
-#define TYPES_mop_base(S, D, T) \
-  D (za32, s8), D (za32, u8), D (za32, bf16), D (za32, f16), D (za32, f32)
-
-/* _za32_s8.  */
-#define TYPES_mop_base_signed(S, D, T) \
-  D (za32, s8)
-
-/* _za32_u8.  */
-#define TYPES_mop_base_unsigned(S, D, T) \
-  D (za32, u8)
-
-/* _za64 x { _s16 _u16 }.  */
-#define TYPES_mop_i16i64(S, D, T) \
-  D (za64, s16), D (za64, u16)
-
-/* _za64_s16.  */
-#define TYPES_mop_i16i64_signed(S, D, T) \
-  D (za64, s16)
-
-/* _za64_u16.  */
-#define TYPES_mop_i16i64_unsigned(S, D, T) \
-  D (za64, u16)
-
-/* _za.  */
-#define TYPES_za(S, D, T) \
-  S (za)
-
-/* Describe a tuple of type suffixes in which only the first is used.  */
-#define DEF_VECTOR_TYPE(X) \
-  { TYPE_SUFFIX_ ## X, NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES }
-
-/* Describe a tuple of type suffixes in which only the first two are used.  */
-#define DEF_DOUBLE_TYPE(X, Y) \
-  { TYPE_SUFFIX_ ## X, TYPE_SUFFIX_ ## Y, NUM_TYPE_SUFFIXES }
-
-/* Describe a tuple of type suffixes in which three elements are used.  */
-#define DEF_TRIPLE_TYPE(X, Y, Z) \
-  { TYPE_SUFFIX_ ## X, TYPE_SUFFIX_ ## Y, TYPE_SUFFIX_ ## Z }
-
-/* Create an array that can be used in aarch64-sve-builtins.def to
-   select the type suffixes in TYPES_<NAME>.  */
-#define DEF_SVE_TYPES_ARRAY(NAME) \
-  static const type_suffix_triple types_##NAME[] = { \
-    TYPES_##NAME (DEF_VECTOR_TYPE, DEF_DOUBLE_TYPE, DEF_TRIPLE_TYPE), \
-    { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES } \
-  }
-
-/* For functions that don't take any type suffixes.  */
-static const type_suffix_triple types_none[] = {
-  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES },
-  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES }
-};
-
-/* Create an array for each TYPES_<combination> macro above.  */
-DEF_SVE_TYPES_ARRAY (all_pred);
-DEF_SVE_TYPES_ARRAY (all_count);
-DEF_SVE_TYPES_ARRAY (all_pred_count);
-DEF_SVE_TYPES_ARRAY (all_float);
-DEF_SVE_TYPES_ARRAY (all_signed);
-DEF_SVE_TYPES_ARRAY (all_float_and_signed);
-DEF_SVE_TYPES_ARRAY (all_unsigned);
-DEF_SVE_TYPES_ARRAY (all_integer);
-DEF_SVE_TYPES_ARRAY (all_arith);
-DEF_SVE_TYPES_ARRAY (all_data);
-DEF_SVE_TYPES_ARRAY (b);
-DEF_SVE_TYPES_ARRAY (b_unsigned);
-DEF_SVE_TYPES_ARRAY (b_integer);
-DEF_SVE_TYPES_ARRAY (bh_integer);
-DEF_SVE_TYPES_ARRAY (bs_unsigned);
-DEF_SVE_TYPES_ARRAY (bhs_signed);
-DEF_SVE_TYPES_ARRAY (bhs_unsigned);
-DEF_SVE_TYPES_ARRAY (bhs_integer);
-DEF_SVE_TYPES_ARRAY (bh_data);
-DEF_SVE_TYPES_ARRAY (bhs_data);
-DEF_SVE_TYPES_ARRAY (bhs_widen);
-DEF_SVE_TYPES_ARRAY (c);
-DEF_SVE_TYPES_ARRAY (h_bfloat);
-DEF_SVE_TYPES_ARRAY (h_float);
-DEF_SVE_TYPES_ARRAY (h_float_mf8);
-DEF_SVE_TYPES_ARRAY (h_integer);
-DEF_SVE_TYPES_ARRAY (h_data);
-DEF_SVE_TYPES_ARRAY (hs_signed);
-DEF_SVE_TYPES_ARRAY (hs_integer);
-DEF_SVE_TYPES_ARRAY (hs_float);
-DEF_SVE_TYPES_ARRAY (hs_data);
-DEF_SVE_TYPES_ARRAY (hd_unsigned);
-DEF_SVE_TYPES_ARRAY (hsd_signed);
-DEF_SVE_TYPES_ARRAY (hsd_integer);
-DEF_SVE_TYPES_ARRAY (hsd_data);
-DEF_SVE_TYPES_ARRAY (s_float);
-DEF_SVE_TYPES_ARRAY (s_float_hsd_integer);
-DEF_SVE_TYPES_ARRAY (s_float_mf8);
-DEF_SVE_TYPES_ARRAY (s_float_sd_integer);
-DEF_SVE_TYPES_ARRAY (s_signed);
-DEF_SVE_TYPES_ARRAY (s_unsigned);
-DEF_SVE_TYPES_ARRAY (s_integer);
-DEF_SVE_TYPES_ARRAY (s_data);
-DEF_SVE_TYPES_ARRAY (sd_signed);
-DEF_SVE_TYPES_ARRAY (sd_unsigned);
-DEF_SVE_TYPES_ARRAY (sd_integer);
-DEF_SVE_TYPES_ARRAY (sd_data);
-DEF_SVE_TYPES_ARRAY (all_float_and_sd_integer);
-DEF_SVE_TYPES_ARRAY (d_float);
-DEF_SVE_TYPES_ARRAY (d_unsigned);
-DEF_SVE_TYPES_ARRAY (d_integer);
-DEF_SVE_TYPES_ARRAY (d_data);
-DEF_SVE_TYPES_ARRAY (cvt);
-DEF_SVE_TYPES_ARRAY (cvt_bfloat);
-DEF_SVE_TYPES_ARRAY (cvt_h_s_float);
-DEF_SVE_TYPES_ARRAY (cvt_f32_f16);
-DEF_SVE_TYPES_ARRAY (cvt_long);
-DEF_SVE_TYPES_ARRAY (cvt_mf8);
-DEF_SVE_TYPES_ARRAY (cvt_narrow_s);
-DEF_SVE_TYPES_ARRAY (cvt_narrow);
-DEF_SVE_TYPES_ARRAY (cvt_s_s);
-DEF_SVE_TYPES_ARRAY (cvtn_mf8);
-DEF_SVE_TYPES_ARRAY (cvtnx_mf8);
-DEF_SVE_TYPES_ARRAY (inc_dec_n);
-DEF_SVE_TYPES_ARRAY (qcvt_x2);
-DEF_SVE_TYPES_ARRAY (qcvt_x4);
-DEF_SVE_TYPES_ARRAY (qrshr_x2);
-DEF_SVE_TYPES_ARRAY (qrshr_x4);
-DEF_SVE_TYPES_ARRAY (qrshru_x2);
-DEF_SVE_TYPES_ARRAY (qrshru_x4);
-DEF_SVE_TYPES_ARRAY (reinterpret);
-DEF_SVE_TYPES_ARRAY (reinterpret_b);
-DEF_SVE_TYPES_ARRAY (while);
-DEF_SVE_TYPES_ARRAY (while_x);
-DEF_SVE_TYPES_ARRAY (while_x_c);
-DEF_SVE_TYPES_ARRAY (s_narrow_fsu);
-DEF_SVE_TYPES_ARRAY (all_za);
-DEF_SVE_TYPES_ARRAY (d_za);
-DEF_SVE_TYPES_ARRAY (za_bhsd_data);
-DEF_SVE_TYPES_ARRAY (za_all_data);
-DEF_SVE_TYPES_ARRAY (za_h_mf8);
-DEF_SVE_TYPES_ARRAY (za_h_bfloat);
-DEF_SVE_TYPES_ARRAY (za_h_float);
-DEF_SVE_TYPES_ARRAY (za_s_b_signed);
-DEF_SVE_TYPES_ARRAY (za_s_b_unsigned);
-DEF_SVE_TYPES_ARRAY (za_s_b_integer);
-DEF_SVE_TYPES_ARRAY (za_s_h_integer);
-DEF_SVE_TYPES_ARRAY (za_s_h_data);
-DEF_SVE_TYPES_ARRAY (za_s_unsigned);
-DEF_SVE_TYPES_ARRAY (za_s_integer);
-DEF_SVE_TYPES_ARRAY (za_s_mf8);
-DEF_SVE_TYPES_ARRAY (za_hs_mf8);
-DEF_SVE_TYPES_ARRAY (za_s_float);
-DEF_SVE_TYPES_ARRAY (za_s_data);
-DEF_SVE_TYPES_ARRAY (za_d_h_integer);
-DEF_SVE_TYPES_ARRAY (za_d_float);
-DEF_SVE_TYPES_ARRAY (za_d_integer);
-DEF_SVE_TYPES_ARRAY (mop_base);
-DEF_SVE_TYPES_ARRAY (mop_base_signed);
-DEF_SVE_TYPES_ARRAY (mop_base_unsigned);
-DEF_SVE_TYPES_ARRAY (mop_i16i64);
-DEF_SVE_TYPES_ARRAY (mop_i16i64_signed);
-DEF_SVE_TYPES_ARRAY (mop_i16i64_unsigned);
-DEF_SVE_TYPES_ARRAY (za);
-
-static const group_suffix_index groups_none[] = {
-  GROUP_none, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_x2[] = { GROUP_x2, NUM_GROUP_SUFFIXES };
-
-static const group_suffix_index groups_x12[] = {
-  GROUP_none, GROUP_x2, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_x4[] = { GROUP_x4, NUM_GROUP_SUFFIXES };
-
-static const group_suffix_index groups_x24[] = {
-  GROUP_x2, GROUP_x4, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_x124[] = {
-  GROUP_none, GROUP_x2, GROUP_x4, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_x1234[] = {
-  GROUP_none, GROUP_x2, GROUP_x3, GROUP_x4, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_vg1x2[] = {
-  GROUP_vg1x2, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_vg1x4[] = {
-  GROUP_vg1x4, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_vg1x24[] = {
-  GROUP_vg1x2, GROUP_vg1x4, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_vg2[] = {
-  GROUP_vg2x1, GROUP_vg2x2, GROUP_vg2x4, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_vg4[] = {
-  GROUP_vg4x1, GROUP_vg4x2, GROUP_vg4x4, NUM_GROUP_SUFFIXES
-};
-
-static const group_suffix_index groups_vg24[] = {
-  GROUP_vg2, GROUP_vg4, NUM_GROUP_SUFFIXES
-};
-
-/* Used by functions that have no governing predicate.  */
-static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
-
-/* Used by functions that have a governing predicate but do not have an
-   explicit suffix.  */
-static const predication_index preds_implicit[] = { PRED_implicit, NUM_PREDS };
-
-/* Used by functions that only support "_m" predication.  */
-static const predication_index preds_m[] = { PRED_m, NUM_PREDS };
-
-/* Used by functions that allow merging and "don't care" predication,
-   but are not suitable for predicated MOVPRFX.  */
-static const predication_index preds_mx[] = {
-  PRED_m, PRED_x, NUM_PREDS
-};
-
-/* Used by functions that allow merging, zeroing and "don't care"
-   predication.  */
-static const predication_index preds_mxz[] = {
-  PRED_m, PRED_x, PRED_z, NUM_PREDS
-};
-
-/* Used by functions that have the mxz predicated forms above, and in addition
-   have an unpredicated form.  */
-static const predication_index preds_mxz_or_none[] = {
-  PRED_m, PRED_x, PRED_z, PRED_none, NUM_PREDS
-};
-
-/* Used by functions that allow merging and zeroing predication but have
-   no "_x" form.  */
-static const predication_index preds_mz[] = { PRED_m, PRED_z, NUM_PREDS };
-
-/* Used by functions that have an unpredicated form and a _z predicated
-   form.  */
-static const predication_index preds_z_or_none[] = {
-  PRED_z, PRED_none, NUM_PREDS
-};
-
-/* Used by (mostly predicate) functions that only support "_z" predication.  */
-static const predication_index preds_z[] = { PRED_z, NUM_PREDS };
-
-/* Used by SME instructions that always merge into ZA.  */
-static const predication_index preds_za_m[] = { PRED_za_m, NUM_PREDS };
-
-#define NONSTREAMING_SVE(X) nonstreaming_only (AARCH64_FL_SVE | (X))
-#define SVE_AND_SME(X, Y) streaming_compatible (AARCH64_FL_SVE | (X), (Y))
-#define SSVE(X) SVE_AND_SME (X, X)
-
 /* A list of all arm_sve.h functions.  */
 static CONSTEXPR const function_group_info function_groups[] = {
 #define DEF_SVE_FUNCTION_GS_FPM(NAME, SHAPE, TYPES, GROUPS, PREDS, FPM_MODE) \
@@ -1346,6 +539,9 @@ function_builder::function_builder (handle_pragma_index 
pragma_index,
   m_function_nulls = function_nulls;
 
   gcc_obstack_init (&m_string_obstack);
+
+  if (!function_table)
+    function_table = hash_table<registered_function_hasher>::create_ggc (1023);
 }
 
 function_builder::~function_builder ()
@@ -3859,9 +3055,9 @@ gimple_folder::fold_to_stmt_vops (gimple *g)
 gimple *
 gimple_folder::fold ()
 {
-  /* Don't fold anything when SVE is disabled; emit an error during
+  /* Don't fold anything when NEON/SVE are disabled; emit an error during
      expansion instead.  */
-  if (!TARGET_SVE)
+  if (!TARGET_SIMD && !TARGET_SVE)
     return NULL;
 
   /* Punt if the function has a return type and no result location is
@@ -4772,6 +3968,7 @@ init_builtins ()
   register_builtin_types ();
   if (in_lto_p)
     {
+      aarch64_acle::handle_arm_neon_h (false);
       handle_arm_sve_h (false);
       handle_arm_sme_h (false);
       handle_arm_neon_sve_bridge_h (false);
@@ -4874,13 +4071,27 @@ register_svprfop ()
                                                      "svprfop", &values);
 }
 
+static bool arm_sve_h_handled = false;
+static location_t arm_sve_h_location;
+
+static bool arm_sme_h_handled = false;
+static location_t arm_sme_h_location;
+
+static bool arm_neon_sve_bridge_h_handled = false;
+static location_t arm_neon_sve_bridge_h_location;
+
 /* Implement #pragma GCC aarch64 "arm_sve.h".  */
 void
 handle_arm_sve_h (bool function_nulls_p)
 {
-  if (function_table)
+  if (!aarch64_acle::arm_neon_h_handled)
+    aarch64_acle::handle_arm_neon_h (false);
+
+  if (arm_sve_h_handled)
     {
       error ("duplicate definition of %qs", "arm_sve.h");
+      inform (arm_sve_h_location, "previous definition of %qs here",
+             "arm_sve.h");
       return;
     }
 
@@ -4903,25 +4114,38 @@ handle_arm_sve_h (bool function_nulls_p)
   register_svprfop ();
 
   /* Define the functions.  */
-  function_table = hash_table<registered_function_hasher>::create_ggc (1023);
   function_builder builder (arm_sve_handle, function_nulls_p);
   for (unsigned int i = 0; i < ARRAY_SIZE (function_groups); ++i)
     builder.register_function_group (function_groups[i]);
+
+  arm_sve_h_handled = true;
+  arm_sve_h_location = input_location;
 }
 
 /* Implement #pragma GCC aarch64 "arm_neon_sve_bridge.h".  */
 void
 handle_arm_neon_sve_bridge_h (bool function_nulls_p)
 {
-  if (initial_indexes[arm_sme_handle] == 0)
+  if (!arm_sme_h_handled)
     handle_arm_sme_h (true);
 
+  if (arm_neon_sve_bridge_h_handled)
+    {
+      error ("duplicate definition of %qs", "arm_neon_sve_bridge.h");
+      inform (arm_neon_sve_bridge_h_location, "previous definition of %qs 
here",
+             "arm_neon_sve_bridge.h");
+      return;
+    }
+
   aarch64_target_switcher switcher;
 
   /* Define the functions.  */
   function_builder builder (arm_neon_sve_handle, function_nulls_p);
   for (unsigned int i = 0; i < ARRAY_SIZE (neon_sve_function_groups); ++i)
     builder.register_function_group (neon_sve_function_groups[i]);
+
+  arm_neon_sve_bridge_h_handled = true;
+  arm_neon_sve_bridge_h_location = input_location;
 }
 
 /* Return the function decl with SVE function subcode CODE, or error_mark_node
@@ -4938,7 +4162,7 @@ builtin_decl (unsigned int code, bool)
 void
 handle_arm_sme_h (bool function_nulls_p)
 {
-  if (!function_table)
+  if (!arm_sve_h_handled)
     {
       error ("%qs defined without first defining %qs",
             "arm_sme.h", "arm_sve.h");
@@ -4950,6 +4174,9 @@ handle_arm_sme_h (bool function_nulls_p)
   function_builder builder (arm_sme_handle, function_nulls_p);
   for (unsigned int i = 0; i < ARRAY_SIZE (sme_function_groups); ++i)
     builder.register_function_group (sme_function_groups[i]);
+
+  arm_sme_h_handled = true;
+  arm_sme_h_location = input_location;
 }
 
 /* If we're implementing manual overloading, check whether the SVE
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.def 
b/gcc/config/aarch64/aarch64-sve-builtins.def
index 6ad257643b69..6df8d41d7f63 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins.def
@@ -26,6 +26,8 @@
 
    - aarch64-sve-builtins.def for common data types, groups, and other
        supporting definitions used across all files.
+   - aarch64-neon-builtins-base.def for any AdvSIMD intrinsic that is
+       enabled by the +simd extension.
    - aarch64-sve-builtins-base.def for the baseline SVE intrinsics which 
predate
        SVE2 and SME.
    - aarch64-sve-builtins-sve2.def for any scalable SIMD intrinsic that is
@@ -159,6 +161,15 @@ DEF_SVE_NEON_TYPE_SUFFIX (u32, svuint32_t, unsigned, 32, 
VNx4SImode,
 DEF_SVE_NEON_TYPE_SUFFIX (u64, svuint64_t, unsigned, 64, VNx2DImode,
                          Uint64x1_t, Uint64x2_t)
 
+DEF_SVE_NEON_TYPE_SUFFIX (p8, svuint8_t, poly, 8, VNx16QImode,
+                         Poly8x8_t, Poly8x16_t)
+DEF_SVE_NEON_TYPE_SUFFIX (p16, svuint16_t, poly, 16, VNx8HImode,
+                         Poly16x4_t, Poly16x8_t)
+DEF_SVE_NEON_TYPE_SUFFIX (p64, svuint64_t, poly, 64, VNx2DImode,
+                         Poly64x1_t, Poly64x2_t)
+DEF_SVE_NEON_TYPE_SUFFIX (p128, svuint64_t, poly, 128, TImode,
+                         Poly128_t, Poly128_t)
+
 /* Associate _za with bytes.  This is needed for svldr_vnum_za and
    svstr_vnum_za, whose ZA offset can be in the range [0, 15], as for za8.  */
 DEF_SME_ZA_SUFFIX (za, 8, VNx16QImode)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 82cf94b51739..b5acb0c9321e 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -193,147 +193,6 @@
     __vec;                                                             \
   })
 
-/* vadd  */
-__extension__ extern __inline int8x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_s8 (int8x8_t __a, int8x8_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline int16x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_s16 (int16x4_t __a, int16x4_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline int32x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_s32 (int32x2_t __a, int32x2_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline float32x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_f32 (float32x2_t __a, float32x2_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline float64x1_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_f64 (float64x1_t __a, float64x1_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint8x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_u8 (uint8x8_t __a, uint8x8_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint16x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_u16 (uint16x4_t __a, uint16x4_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint32x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_u32 (uint32x2_t __a, uint32x2_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline int64x1_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_s64 (int64x1_t __a, int64x1_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint64x1_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_u64 (uint64x1_t __a, uint64x1_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_s64 (int64x2_t __a, int64x2_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline float64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_f64 (float64x2_t __a, float64x2_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_u64 (uint64x2_t __a, uint64x2_t __b)
-{
-  return __a + __b;
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vaddl_s8 (int8x8_t __a, int8x8_t __b)
@@ -25904,20 +25763,6 @@ vsqrtq_f16 (float16x8_t __a)
 
 /* ARMv8.2-A FP16 two operands vector intrinsics.  */
 
-__extension__ extern __inline float16x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_f16 (float16x4_t __a, float16x4_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __a + __b;
-}
-
 __extension__ extern __inline float16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vabd_f16 (float16x4_t __a, float16x4_t __b)
@@ -28526,55 +28371,6 @@ vusmmlaq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t 
__b)
 
 #pragma GCC pop_options
 
-__extension__ extern __inline poly8x8_t
-__attribute ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_p8 (poly8x8_t __a, poly8x8_t __b)
-{
-  return __a ^ __b;
-}
-
-__extension__ extern __inline poly16x4_t
-__attribute ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_p16 (poly16x4_t __a, poly16x4_t __b)
-{
-  return __a ^ __b;
-}
-
-__extension__ extern __inline poly64x1_t
-__attribute ((__always_inline__, __gnu_inline__, __artificial__))
-vadd_p64 (poly64x1_t __a, poly64x1_t __b)
-{
-  return __a ^ __b;
-}
-
-__extension__ extern __inline poly8x16_t
-__attribute ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_p8 (poly8x16_t __a, poly8x16_t __b)
-{
-  return __a ^ __b;
-}
-
-__extension__ extern __inline poly16x8_t
-__attribute ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_p16 (poly16x8_t __a, poly16x8_t __b)
-{
-  return __a ^__b;
-}
-
-__extension__ extern __inline poly64x2_t
-__attribute ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_p64 (poly64x2_t __a, poly64x2_t __b)
-{
-  return __a ^ __b;
-}
-
-__extension__ extern __inline poly128_t
-__attribute ((__always_inline__, __gnu_inline__, __artificial__))
-vaddq_p128 (poly128_t __a, poly128_t __b)
-{
-  return __a ^ __b;
-}
-
 #undef __aarch64_vget_lane_any
 
 #undef __aarch64_vdup_lane_any
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 1171d2023490..66b302192ada 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -67,6 +67,52 @@ aarch64-builtins.o: 
$(srcdir)/config/aarch64/aarch64-builtins.cc $(CONFIG_H) \
        $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
                $(srcdir)/config/aarch64/aarch64-builtins.cc
 
+aarch64-neon-builtins.o: \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins.cc \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins.def \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-base.def \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
+  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) $(DIAGNOSTIC_H) \
+  $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
+  gimple-iterator.h gimplify.h explow.h $(EMIT_RTL_H) tree-vector-builder.h \
+  stor-layout.h alias.h gimple-fold.h langhooks.h \
+  stringpool.h \
+  $(srcdir)/config/aarch64/aarch64-acle-builtins.h \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins.h \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-shapes.h \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-base.h
+       $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+               $(srcdir)/config/aarch64/aarch64-neon-builtins.cc
+
+aarch64-neon-builtins-shapes.o: \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-shapes.cc \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins.def \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-base.def \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
+  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) \
+  $(srcdir)/config/aarch64/aarch64-acle-builtins.h \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins.h \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-shapes.h
+       $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+               $(srcdir)/config/aarch64/aarch64-neon-builtins-shapes.cc
+
+aarch64-neon-builtins-base.o: \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-base.cc \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins.def \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-base.def \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
+  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
+  $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
+  gimple-iterator.h gimplify.h explow.h $(EMIT_RTL_H) tree-vector-builder.h \
+  rtx-vector-builder.h vec-perm-indices.h \
+  $(srcdir)/config/aarch64/aarch64-acle-builtins.h \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins.h \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-shapes.h \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-base.h \
+  $(srcdir)/config/aarch64/aarch64-neon-builtins-functions.h
+       $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+               $(srcdir)/config/aarch64/aarch64-neon-builtins-base.cc
+
 aarch64-sve-builtins.o: $(srcdir)/config/aarch64/aarch64-sve-builtins.cc \
   $(srcdir)/config/aarch64/aarch64-sve-builtins.def \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-base.def \
diff --git a/gcc/testsuite/g++.target/aarch64/pr103147-6.C 
b/gcc/testsuite/g++.target/aarch64/pr103147-6.C
index 15a606f976c8..bbea67b9b7db 100644
--- a/gcc/testsuite/g++.target/aarch64/pr103147-6.C
+++ b/gcc/testsuite/g++.target/aarch64/pr103147-6.C
@@ -1,3 +1,4 @@
 /* { dg-options "-mgeneral-regs-only" } */
+/* { dg-excess-errors "arm_neon.h" } */
 
 #include <arm_neon.h>
diff --git a/gcc/testsuite/g++.target/aarch64/pr117048.C 
b/gcc/testsuite/g++.target/aarch64/pr117048.C
index ae46e5875e4c..a9775700c5bf 100644
--- a/gcc/testsuite/g++.target/aarch64/pr117048.C
+++ b/gcc/testsuite/g++.target/aarch64/pr117048.C
@@ -30,5 +30,5 @@ void G(
     v[12] = vgetq_lane_s64(vd01, 0);
 }
 
-/* { dg-final { scan-assembler {\txar\tv[0-9]+\.2d, v[0-9]+\.2d, v[0-9]+\.2d, 
32\n} } } */
+/* { dg-final { scan-assembler {\txar\tv[0-9]+\.2d, v[0-9]+\.2d, v[0-9]+\.2d, 
#?32\n} } } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/neon/aarch64-neon.exp 
b/gcc/testsuite/gcc.target/aarch64/neon/aarch64-neon.exp
new file mode 100644
index 000000000000..03c4467e5354
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon/aarch64-neon.exp
@@ -0,0 +1,39 @@
+#  Specific regression driver for AArch64 NEON.
+#  Copyright (C) 2026-2026 Free Software Foundation, Inc.
+#  Contributed by ARM Ltd.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  <http://www.gnu.org/licenses/>.  */
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*\[cCs\]]] \
+       " -ansi -pedantic-errors -std=c23 -O3 -march=armv8-a+simd" ""
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/neon/arm_neon_test.h 
b/gcc/testsuite/gcc.target/aarch64/neon/arm_neon_test.h
new file mode 100644
index 000000000000..7d9371e01047
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon/arm_neon_test.h
@@ -0,0 +1,22 @@
+#include "arm_neon.h"
+
+#pragma GCC target "+simd+fp16+bf16+sha3"
+
+#define TEST_UNARY(NAME, RET_TYPE, ARG_1_TYPE)                                 
\
+  RET_TYPE test_##NAME (ARG_1_TYPE a) { return NAME (a); }
+
+#define TEST_UNIFORM_UNARY(NAME, TYPE) TEST_UNARY (NAME, TYPE, TYPE)
+
+#define TEST_BINARY(NAME, RET_TYPE, ARG_1_TYPE, ARG_2_TYPE)                    
\
+  RET_TYPE test_##NAME (ARG_1_TYPE a, ARG_2_TYPE b) { return NAME (a, b); }
+
+#define TEST_UNIFORM_BINARY(NAME, TYPE) TEST_BINARY (NAME, TYPE, TYPE, TYPE)
+
+#define TEST_TERNARY(NAME, RET_TYPE, ARG_1_TYPE, ARG_2_TYPE, ARG_3_TYPE)       
\
+  RET_TYPE test_##NAME (ARG_1_TYPE a, ARG_2_TYPE b, ARG_3_TYPE c)              
\
+  {                                                                            
\
+    return NAME (a, b, c);                                                     
\
+  }
+
+#define TEST_UNIFORM_TERNARY(NAME, TYPE)                                       
\
+  TEST_TERNARY (NAME, TYPE, TYPE, TYPE, TYPE)
diff --git a/gcc/testsuite/gcc.target/aarch64/neon/vadd.c 
b/gcc/testsuite/gcc.target/aarch64/neon/vadd.c
new file mode 100644
index 000000000000..e622718685db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon/vadd.c
@@ -0,0 +1,203 @@
+/* { dg-do compile } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "arm_neon_test.h"
+
+/*
+** test_vadd_u8:
+** add v0\.8b, (v0\.8b, v1\.8b|v1\.8b, v0\.8b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_u8, uint8x8_t)
+
+/*
+** test_vadd_s8:
+** add v0\.8b, (v0\.8b, v1\.8b|v1\.8b, v0\.8b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_s8, int8x8_t)
+
+/*
+** test_vadd_p8:
+** eor v0\.8b, (v0\.8b, v1\.8b|v1\.8b, v0\.8b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_p8, poly8x8_t)
+
+/*
+** test_vadd_u16:
+** add v0\.4h, (v0\.4h, v1\.4h|v1\.4h, v0\.4h)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_u16, uint16x4_t)
+
+/*
+** test_vadd_s16:
+** add v0\.4h, (v0\.4h, v1\.4h|v1\.4h, v0\.4h)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_s16, int16x4_t)
+
+/*
+** test_vadd_p16:
+** eor v0\.8b, (v0\.8b, v1\.8b|v1\.8b, v0\.8b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_p16, poly16x4_t)
+
+/*
+** test_vadd_u32:
+** add v0\.2s, (v0\.2s, v1\.2s|v1\.2s, v0\.2s)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_u32, uint32x2_t)
+
+/*
+** test_vadd_s32:
+** add v0\.2s, (v0\.2s, v1\.2s|v1\.2s, v0\.2s)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_s32, int32x2_t)
+
+/*
+** test_vadd_u64:
+** add d0, (d0, d1|d1, d0)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_u64, uint64x1_t)
+
+/*
+** test_vadd_s64:
+** add d0, (d0, d1|d1, d0)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_s64, int64x1_t)
+
+/*
+** test_vadd_p64:
+** eor v0\.8b, (v0\.8b, v1\.8b|v1\.8b, v0\.8b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vadd_p64, poly64x1_t)
+
+/*
+** test_vaddq_u8:
+** add v0\.16b, (v0\.16b, v1\.16b|v1\.16b, v0\.16b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_u8, uint8x16_t)
+
+/*
+** test_vaddq_s8:
+** add v0\.16b, (v0\.16b, v1\.16b|v1\.16b, v0\.16b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_s8, int8x16_t)
+
+/*
+** test_vaddq_p8:
+** eor v0\.16b, (v0\.16b, v1\.16b|v1\.16b, v0\.16b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_p8, poly8x16_t)
+
+/*
+** test_vaddq_u16:
+** add v0\.8h, (v0\.8h, v1\.8h|v1\.8h, v0\.8h)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_u16, uint16x8_t)
+
+/*
+** test_vaddq_s16:
+** add v0\.8h, (v0\.8h, v1\.8h|v1\.8h, v0\.8h)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_s16, int16x8_t)
+
+/*
+** test_vaddq_f16:
+** fadd        v0\.8h, (v0\.8h, v1\.8h|v1\.8h, v0\.8h)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_f16, float16x8_t)
+
+/*
+** test_vaddq_p16:
+** eor v0\.16b, (v0\.16b, v1\.16b|v1\.16b, v0\.16b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_p16, poly16x8_t)
+
+/*
+** test_vaddq_u32:
+** add v0\.4s, (v0\.4s, v1\.4s|v1\.4s, v0\.4s)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_u32, uint32x4_t)
+
+/*
+** test_vaddq_s32:
+** add v0\.4s, (v0\.4s, v1\.4s|v1\.4s, v0\.4s)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_s32, int32x4_t)
+
+/*
+** test_vaddq_f32:
+** fadd        v0\.4s, (v0\.4s, v1\.4s|v1\.4s, v0\.4s)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_f32, float32x4_t)
+
+/*
+** test_vaddq_u64:
+** add v0\.2d, (v0\.2d, v1\.2d|v1\.2d, v0\.2d)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_u64, uint64x2_t)
+
+/*
+** test_vaddq_s64:
+** add v0\.2d, (v0\.2d, v1\.2d|v1\.2d, v0\.2d)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_s64, int64x2_t)
+
+/*
+** test_vaddq_f64:
+** fadd        v0\.2d, (v0\.2d, v1\.2d|v1\.2d, v0\.2d)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_f64, float64x2_t)
+
+/*
+** test_vaddq_p64:
+** eor v0\.16b, (v0\.16b, v1\.16b|v1\.16b, v0\.16b)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_p64, poly64x2_t)
+
+/* `poly128_t` is a scalar type, like `__uint128_t`, so it is passed in two GPR
+    registers.  *
+/*
+** test_vaddq_p128:
+** eor x[0-9], x[0-9]+, x[0-9]+
+** eor x[0-9], x[0-9]+, x[0-9]+
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddq_p128, poly128_t)
+
+/*
+** test_vaddd_u64:
+** add x0, (x0, x1|x1, x0)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddd_u64, uint64_t)
+
+/*
+** test_vaddd_s64:
+** add x0, (x0, x1|x1, x0)
+** ret
+*/
+TEST_UNIFORM_BINARY (vaddd_s64, int64_t)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr103147-6.c 
b/gcc/testsuite/gcc.target/aarch64/pr103147-6.c
index 15a606f976c8..bbea67b9b7db 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr103147-6.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr103147-6.c
@@ -1,3 +1,4 @@
 /* { dg-options "-mgeneral-regs-only" } */
+/* { dg-excess-errors "arm_neon.h" } */
 
 #include <arm_neon.h>
-- 
2.54.0

[PATCH v2 3/7] aarch64: Port NEON add intrinsics to pragma-based framework

Reply via email to