> -----Original Message-----
> From: Tamar Christina <[email protected]>
> Sent: Thursday, February 5, 2026 7:52 AM
> To: xiezhiheng <[email protected]>; [email protected]
> Cc: Weiwei (weiwei, Compiler) <[email protected]>; liyunfei (E)
> <[email protected]>; Richard Earnshaw
> <[email protected]>; Richard Sandiford <[email protected]>;
> Kyrylo Tkachov <[email protected]>
> Subject: RE: [PATCH] aarch64: Add support for Hisilicon's hip12 core
> (-mcpu=hip12)
>
> > -----Original Message-----
> > From: xiezhiheng <[email protected]>
> > Sent: 03 February 2026 07:17
> > To: Tamar Christina <[email protected]>; [email protected]
> > Cc: Weiwei (weiwei, Compiler) <[email protected]>; liyunfei (E)
> > <[email protected]>; Richard Earnshaw <[email protected]>;
> > Richard Sandiford <[email protected]>; Kyrylo Tkachov
> > <[email protected]>
> > Subject: RE: [PATCH] aarch64: Add support for Hisilicon's hip12 core (-
> > mcpu=hip12)
> >
> > > -----Original Message-----
> > > From: Tamar Christina <[email protected]>
> > > Sent: Monday, February 2, 2026 8:07 PM
> > > To: xiezhiheng <[email protected]>; [email protected]
> > > Cc: Weiwei (weiwei, Compiler) <[email protected]>; liyunfei (E)
> > <[email protected]>; Richard Earnshaw
> > > <[email protected]>; Richard Sandiford
> > <[email protected]>; Kyrylo Tkachov <[email protected]>
> > > Subject: RE: [PATCH] aarch64: Add support for Hisilicon's hip12 core (-
> > mcpu=hip12)
> > >
> > > > -----Original Message-----
> > > > From: Tamar Christina
> > > > Sent: 02 February 2026 11:29
> > > > To: xiezhiheng <[email protected]>; [email protected]
> > > > Cc: Weiwei (weiwei, Compiler) <[email protected]>; liyunfei (E)
> > > > <[email protected]>; Richard Earnshaw
> > <[email protected]>;
> > > > Richard Sandiford <[email protected]>; Kyrylo Tkachov
> > > > <[email protected]>
> > > > Subject: RE: [PATCH] aarch64: Add support for Hisilicon's hip12 core (-
> > > > mcpu=hip12)
> > > >
> > > > > -----Original Message-----
> > > > > From: xiezhiheng <[email protected]>
> > > > > Sent: 31 January 2026 07:57
> > > > > To: [email protected]
> > > > > Cc: Weiwei (weiwei, Compiler) <[email protected]>; liyunfei (E)
> > > > > <[email protected]>; Richard Earnshaw
> > > > <[email protected]>;
> > > > > Richard Sandiford <[email protected]>; Tamar Christina
> > > > > <[email protected]>; Kyrylo Tkachov <[email protected]>
> > > > > Subject: [PATCH] aarch64: Add support for Hisilicon's hip12 core (-
> > > > > mcpu=hip12)
> > > > >
> > > > > This patch adds initial support for Hisilicon's hip12 core
> > > > > (Kunpeng 950 processor).
> > > > > For more information, see:
> > > > > https://www.huawei.com/en/news/2025/9/hc-xu-keynote-speech
> > > > >
> > > > > Bootstrapped and tested on aarch64-linux-gnu, no regression.
> > > > >
> > > > > OK for trunk?
> > > > > And I wonder if it's OK to backport to GCC 13/14/15?
> > > > >
> > > > > Signed-off-by: xiezhiheng <[email protected]>
> > > > > Co-authored-by: liyunfei <[email protected]>
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add hip12
> > > > > core
> > > > > * config/aarch64/aarch64-cost-tables.h: Add hip12_extra_costs
> > > > > * config/aarch64/aarch64-tune.md: Regenerate
> > > > > * config/aarch64/aarch64.cc: Include hip12 tuning model
> > > > > * doc/invoke.texi: Document -mcpu=hip12
> > > > > * config/aarch64/tuning_models/hip12.h: New file.
> > > > > ---
> > > > > gcc/config/aarch64/aarch64-cores.def | 1 +
> > > > > gcc/config/aarch64/aarch64-cost-tables.h | 107 +++++++++++
> > > > > gcc/config/aarch64/aarch64-tune.md | 2 +-
> > > > > gcc/config/aarch64/aarch64.cc | 1 +
> > > > > gcc/config/aarch64/tuning_models/hip12.h | 228
> > > > > +++++++++++++++++++++++
> > > > > gcc/doc/invoke.texi | 2 +-
> > > > > 6 files changed, 339 insertions(+), 2 deletions(-)
> > > > > create mode 100644 gcc/config/aarch64/tuning_models/hip12.h
> > > > >
> > > > > diff --git a/gcc/config/aarch64/aarch64-cores.def
> > > > > b/gcc/config/aarch64/aarch64-cores.def
> > > > > index 31c4b493230..709eca7d5c6 100644
> > > > > --- a/gcc/config/aarch64/aarch64-cores.def
> > > > > +++ b/gcc/config/aarch64/aarch64-cores.def
> > > > > @@ -138,6 +138,7 @@ AARCH64_CORE("fujitsu-monaka",
> > > > fujitsu_monaka,
> > > > > cortexa57, V9_3A, (F16, FAMINMAX,
> > > > >
> > > > > /* HiSilicon ('H') cores. */
> > > > > AARCH64_CORE("tsv110", tsv110, tsv110, V8_2A, (CRYPTO, F16),
> > > > tsv110,
> > > > > 0x48, 0xd01, -1)
> > > > > +AARCH64_CORE("hip12", hip12, cortexa57, V8_7A, (F16, PROFILE,
> > RNG,
> > > > > SVE2_BITPERM, SVE2_AES, SVE2_SM4, SVE2_SHA3, LS64, RCPC3),
> > hip12,
> > > > > 0x48, 0xd06, -1)
> > > > >
> > > >
> > > > We try to keep these in alphabetical order within a vendor group. So put
> > the
> > > > New one first.
> >
> > Updated in patch v2.
> > Bootstrapped and retested, no regression.
> >
>
> Thanks!
>
> Patch is OK for master, GCC 15 and 14.
Thanks again for your review. And Could you please help push the patch since
I do not have permission?
Patches for GCC 14 and 15 are in test now and will send later.
Thanks,
Zhiheng Xie
>
> Thanks,
> Tamar
>
> > Thanks,
> > Zhiheng Xie
> >
> > > >
> > > > OK with that change for trunk and GCC 15 and 14.
> > > >
> > > > You'll have to likely adjust the patch a bit for the older branches.
> > > > Please regression test each before backporting.
> > > >
> > > > I'm hesitant about GCC 13 though because we've had issues with
> > backporting
> > > > such a change to a branch that's about to get its last release since we
> > > > can't
> > > > fix it. Is GCC 13 support critical?
> >
> > GCC 13 support is not critical, support on GCC 14 and later is okay for me.
> > And I will adapt the patch for GCC 14/15 branches and test soon.
> >
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > /* ARMv8.3-A Architecture Processors. */
> > > > >
> > > > > diff --git a/gcc/config/aarch64/aarch64-cost-tables.h
> > > > > b/gcc/config/aarch64/aarch64-cost-tables.h
> > > > > index fdb50c06ced..f6b7ba9db69 100644
> > > > > --- a/gcc/config/aarch64/aarch64-cost-tables.h
> > > > > +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> > > > > @@ -561,6 +561,113 @@ const struct cpu_cost_table
> > tsv110_extra_costs =
> > > > > }
> > > > > };
> > > > >
> > > > > +const struct cpu_cost_table hip12_extra_costs =
> > > > > +{
> > > > > + /* ALU */
> > > > > + {
> > > > > + 0, /* arith. */
> > > > > + 0, /* logical. */
> > > > > + 0, /* shift. */
> > > > > + 0, /* shift_reg. */
> > > > > + COSTS_N_INSNS (1), /* arith_shift. */
> > > > > + COSTS_N_INSNS (1), /* arith_shift_reg. */
> > > > > + 0, /* log_shift. */
> > > > > + 0, /* log_shift_reg. */
> > > > > + 0, /* extend. */
> > > > > + COSTS_N_INSNS (1), /* extend_arith. */
> > > > > + 0, /* bfi. */
> > > > > + 0, /* bfx. */
> > > > > + 0, /* clz. */
> > > > > + 0, /* rev. */
> > > > > + 0, /* non_exec. */
> > > > > + true /* non_exec_costs_exec. */
> > > > > + },
> > > > > + {
> > > > > + /* MULT SImode */
> > > > > + {
> > > > > + COSTS_N_INSNS (1), /* simple. */
> > > > > + COSTS_N_INSNS (1), /* flag_setting. */
> > > > > + COSTS_N_INSNS (1), /* extend. */
> > > > > + COSTS_N_INSNS (2), /* add. */
> > > > > + COSTS_N_INSNS (2), /* extend_add. */
> > > > > + COSTS_N_INSNS (5) /* idiv. */
> > > > > + },
> > > > > + /* MULT DImode */
> > > > > + {
> > > > > + COSTS_N_INSNS (2), /* simple. */
> > > > > + 0, /* flag_setting (N/A). */
> > > > > + COSTS_N_INSNS (2), /* extend. */
> > > > > + COSTS_N_INSNS (3), /* add. */
> > > > > + COSTS_N_INSNS (3), /* extend_add. */
> > > > > + COSTS_N_INSNS (7) /* idiv. */
> > > > > + }
> > > > > + },
> > > > > + /* LD/ST */
> > > > > + {
> > > > > + COSTS_N_INSNS (3), /* load. */
> > > > > + COSTS_N_INSNS (3), /* load_sign_extend. */
> > > > > + COSTS_N_INSNS (3), /* ldrd. */
> > > > > + 0, /* ldm_1st. */
> > > > > + 0, /* ldm_regs_per_insn_1st. */
> > > > > + 0, /* ldm_regs_per_insn_subsequent. */
> > > > > + COSTS_N_INSNS (5), /* loadf. */
> > > > > + COSTS_N_INSNS (5), /* loadd. */
> > > > > + COSTS_N_INSNS (4), /* load_unaligned. */
> > > > > + 0, /* store. */
> > > > > + 0, /* strd. */
> > > > > + 0, /* stm_1st. */
> > > > > + 0, /* stm_regs_per_insn_1st. */
> > > > > + 0, /* stm_regs_per_insn_subsequent. */
> > > > > + 0, /* storef. */
> > > > > + 0, /* stored. */
> > > > > + COSTS_N_INSNS (1), /* store_unaligned. */
> > > > > + COSTS_N_INSNS (5), /* loadv. */
> > > > > + COSTS_N_INSNS (1) /* storev. */
> > > > > + },
> > > > > + {
> > > > > + /* FP SFmode */
> > > > > + {
> > > > > + COSTS_N_INSNS (5), /* div. */
> > > > > + COSTS_N_INSNS (2), /* mult. */
> > > > > + COSTS_N_INSNS (4), /* mult_addsub. */
> > > > > + COSTS_N_INSNS (3), /* fma. */
> > > > > + COSTS_N_INSNS (1), /* addsub. */
> > > > > + COSTS_N_INSNS (1), /* fpconst. */
> > > > > + 0, /* neg. */
> > > > > + COSTS_N_INSNS (1), /* compare. */
> > > > > + COSTS_N_INSNS (2), /* widen. */
> > > > > + COSTS_N_INSNS (2), /* narrow. */
> > > > > + COSTS_N_INSNS (2), /* toint. */
> > > > > + COSTS_N_INSNS (3), /* fromint. */
> > > > > + COSTS_N_INSNS (2) /* roundint. */
> > > > > + },
> > > > > + /* FP DFmode */
> > > > > + {
> > > > > + COSTS_N_INSNS (7), /* div. */
> > > > > + COSTS_N_INSNS (2), /* mult. */
> > > > > + COSTS_N_INSNS (4), /* mult_addsub. */
> > > > > + COSTS_N_INSNS (3), /* fma. */
> > > > > + COSTS_N_INSNS (1), /* addsub. */
> > > > > + COSTS_N_INSNS (1), /* fpconst. */
> > > > > + 0, /* neg. */
> > > > > + COSTS_N_INSNS (1), /* compare. */
> > > > > + COSTS_N_INSNS (2), /* widen. */
> > > > > + COSTS_N_INSNS (2), /* narrow. */
> > > > > + COSTS_N_INSNS (2), /* toint. */
> > > > > + COSTS_N_INSNS (3), /* fromint. */
> > > > > + COSTS_N_INSNS (2) /* roundint. */
> > > > > + }
> > > > > + },
> > > > > + /* Vector */
> > > > > + {
> > > > > + COSTS_N_INSNS (1), /* alu. */
> > > > > + COSTS_N_INSNS (2), /* mult. */
> > > > > + COSTS_N_INSNS (1), /* movi. */
> > > > > + COSTS_N_INSNS (1), /* dup. */
> > > > > + COSTS_N_INSNS (1) /* extract. */
> > > > > + }
> > > > > +};
> > > > > +
> > > > > const struct cpu_cost_table a64fx_extra_costs =
> > > > > {
> > > > > /* ALU */
> > > > > diff --git a/gcc/config/aarch64/aarch64-tune.md
> > > > > b/gcc/config/aarch64/aarch64-tune.md
> > > > > index 803e0ffad8c..f519e337ec4 100644
> > > > > --- a/gcc/config/aarch64/aarch64-tune.md
> > > > > +++ b/gcc/config/aarch64/aarch64-tune.md
> >
> > And Regenerated.
> >
> > > > > @@ -1,5 +1,5 @@
> > > > > ;; -*- buffer-read-only: t -*-
> > > > > ;; Generated automatically by gentune.sh from aarch64-cores.def
> > > > > (define_attr "tune"
> > > > > -
> > > > >
> > > > > "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thun
> > > > >
> > > >
> > derx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunder
> > > > >
> > > >
> > xt81,thunderxt83,ampere1,ampere1a,ampere1b,ampere1c,emag,xgene1,falk
> > > > >
> > > >
> > or,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa5
> > > > >
> > > >
> > 5,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa
> > > > >
> > > >
> > 78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,o
> > > > >
> > > >
> > cteontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f
> > > > >
> > > >
> > 95n,octeontx2f95mm,a64fx,fujitsu_monaka,tsv110,thunderx3t110,neoverse
> > > > >
> > > >
> > v1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortex
> > > > >
> > > >
> > a53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76c
> > > > >
> > > >
> > ortexa55,cortexr82,cortexr82ae,applea12,applem1_0,applem1_1,applem1_2
> > > > >
> > > >
> > ,applem1_3,applem2_0,applem2_1,applem2_2,applem2_3,applem3_0,apple
> > > > >
> > > >
> > m3_1,applem3_2,applem4_0,applem4_1,applem4_2,cortexa510,cortexa520
> > > > >
> > > >
> > ,cortexa520ae,cortexa710,cortexa715,cortexa720,cortexa720ae,cortexa725,c
> > > > >
> > > >
> > ortexa320,cortexx2,cortexx3,cortexx4,cortexx925,neoversen2,cobalt100,neo
> > > > >
> > > >
> > versen3,neoversev2,grace,neoversev3,neoversev3ae,c1nano,c1pro,c1premiu
> > > > >
> > > >
> > m,c1ultra,demeter,olympus,gb10,generic,generic_armv8_a,generic_armv9_a"
> > > > > +
> > > > >
> > > > > "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thun
> > > > >
> > > >
> > derx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunder
> > > > >
> > > >
> > xt81,thunderxt83,ampere1,ampere1a,ampere1b,ampere1c,emag,xgene1,falk
> > > > >
> > > >
> > or,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa5
> > > > >
> > > >
> > 5,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa
> > > > >
> > > >
> > 78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,o
> > > > >
> > > >
> > cteontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f
> > > > >
> > > >
> > 95n,octeontx2f95mm,a64fx,fujitsu_monaka,tsv110,hip12,thunderx3t110,ne
> > > > >
> > > >
> > oversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa7
> > > > >
> > > >
> > 2cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cort
> > > > >
> > > >
> > exa76cortexa55,cortexr82,cortexr82ae,applea12,applem1_0,applem1_1,appl
> > > > >
> > > >
> > em1_2,applem1_3,applem2_0,applem2_1,applem2_2,applem2_3,applem3_
> > > > >
> > > >
> > 0,applem3_1,applem3_2,applem4_0,applem4_1,applem4_2,cortexa510,cort
> > > > >
> > > >
> > exa520,cortexa520ae,cortexa710,cortexa715,cortexa720,cortexa720ae,corte
> > > > >
> > > >
> > xa725,cortexa320,cortexx2,cortexx3,cortexx4,cortexx925,neoversen2,cobalt1
> > > > >
> > > >
> > 00,neoversen3,neoversev2,grace,neoversev3,neoversev3ae,c1nano,c1pro,c1
> > > > >
> > > >
> > premium,c1ultra,demeter,olympus,gb10,generic,generic_armv8_a,generic_ar
> > > > > mv9_a"
> > > > > (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> > > > > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc
> > > > > index 293afa52b3b..047a898803e 100644
> > > > > --- a/gcc/config/aarch64/aarch64.cc
> > > > > +++ b/gcc/config/aarch64/aarch64.cc
> > > > > @@ -422,6 +422,7 @@ static const struct aarch64_flag_desc
> > > > > aarch64_tuning_flags[] =
> > > > > #include "tuning_models/thunderxt88.h"
> > > > > #include "tuning_models/thunderx.h"
> > > > > #include "tuning_models/tsv110.h"
> > > > > +#include "tuning_models/hip12.h"
> > > > > #include "tuning_models/xgene1.h"
> > > > > #include "tuning_models/emag.h"
> > > > > #include "tuning_models/qdf24xx.h"
> > > > > diff --git a/gcc/config/aarch64/tuning_models/hip12.h
> > > > > b/gcc/config/aarch64/tuning_models/hip12.h
> > > > > new file mode 100644
> > > > > index 00000000000..e1262682772
> > > > > --- /dev/null
> > > > > +++ b/gcc/config/aarch64/tuning_models/hip12.h
> > > > > @@ -0,0 +1,228 @@
> > > > > +/* Tuning model description for AArch64 architecture.
> > > > > + Copyright (C) 2009-2026 Free Software Foundation, Inc.
> > > > > +
> > > > > + This file is part of GCC.
> > > > > +
> > > > > + GCC is free software; you can redistribute it and/or modify it
> > > > > + under the terms of the GNU General Public License as published by
> > > > > + the Free Software Foundation; either version 3, or (at your
> > > > > option)
> > > > > + any later version.
> > > > > +
> > > > > + GCC is distributed in the hope that it will be useful, but
> > > > > + WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > > > GNU
> > > > > + General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU General Public License
> > > > > + along with GCC; see the file COPYING3. If not see
> > > > > + <http://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#ifndef GCC_AARCH64_H_HIP12
> > > > > +#define GCC_AARCH64_H_HIP12
> > > > > +
> > > > > +#include "generic.h"
> > > > > +
> > > > > +static const struct cpu_addrcost_table hip12_addrcost_table =
> > > > > +{
> > > > > + {
> > > > > + 0, /* hi */
> > > > > + 0, /* si */
> > > > > + 0, /* di */
> > > > > + 2, /* ti */
> > > > > + },
> > > > > + 0, /* pre_modify */
> > > > > + 0, /* post_modify */
> > > > > + 2, /* post_modify_ld3_st3 */
> > > > > + 2, /* post_modify_ld4_st4 */
> > > > > + 0, /* register_offset */
> > > > > + 0, /* register_sextend */
> > > > > + 0, /* register_zextend */
> > > > > + 0, /* imm_offset */
> > > > > +};
> > > > > +
> > > > > +static const struct cpu_regmove_cost hip12_regmove_cost =
> > > > > +{
> > > > > + 1, /* GP2GP */
> > > > > + /* Spilling to int<->fp instead of memory is recommended so set
> > > > > + realistic costs compared to memmov_cost. */
> > > > > + 5, /* GP2FP */
> > > > > + 2, /* FP2GP */
> > > > > + 2 /* FP2FP */
> > > > > +};
> > > > > +
> > > > > +static const advsimd_vec_cost hip12_advsimd_vector_cost =
> > > > > +{
> > > > > + 2, /* int_stmt_cost */
> > > > > + 2, /* fp_stmt_cost */
> > > > > + 2, /* ld2_st2_permute_cost */
> > > > > + 2, /* ld3_st3_permute_cost */
> > > > > + 3, /* ld4_st4_permute_cost */
> > > > > + 2, /* permute_cost */
> > > > > + 9, /* reduc_i8_cost */
> > > > > + 7, /* reduc_i16_cost */
> > > > > + 5, /* reduc_i32_cost */
> > > > > + 3, /* reduc_i64_cost */
> > > > > + 9, /* reduc_f16_cost */
> > > > > + 6, /* reduc_f32_cost */
> > > > > + 3, /* reduc_f64_cost */
> > > > > + 2, /* store_elt_extra_cost */
> > > > > + 2, /* vec_to_scalar_cost */
> > > > > + 4, /* scalar_to_vec_cost */
> > > > > + 6, /* align_load_cost */
> > > > > + 6, /* unalign_load_cost */
> > > > > + 1, /* unalign_store_cost */
> > > > > + 1 /* store_cost */
> > > > > +};
> > > > > +
> > > > > +static const sve_vec_cost hip12_sve_vector_cost =
> > > > > +{
> > > > > + {
> > > > > + 2, /* int_stmt_cost */
> > > > > + 2, /* fp_stmt_cost */
> > > > > + 2, /* ld2_st2_permute_cost */
> > > > > + 3, /* ld3_st3_permute_cost */
> > > > > + 3, /* ld4_st4_permute_cost */
> > > > > + 2, /* permute_cost */
> > > > > + /* Theoretically, a reduction involving 31 scalar ADDs could
> > > > > + complete in ~6 cycles and would have a cost of 31. [SU]ADDV
> > > > > + completes in 13 cycles, so give it a cost of 31 + 7. */
> > > > > + 38, /* reduc_i8_cost */
> > > > > + /* Likewise for 15 scalar ADDs (~3 cycles) vs. 10: 15 + 7. */
> > > > > + 22, /* reduc_i16_cost */
> > > > > + /* Likewise for 7 scalar ADDs (~2 cycles) vs. 7: 7 + 5. */
> > > > > + 12, /* reduc_i32_cost */
> > > > > + /* Likewise for 3 scalar ADDs (~1 cycles) vs. 4: 3 + 3. */
> > > > > + 6, /* reduc_i64_cost */
> > > > > + /* Theoretically, a reduction involving 15 scalar FADDs could
> > > > > + complete in ~8 cycles and would have a cost of 30. FADDV
> > > > > + completes in 15 cycles, so give it a cost of 30 + 7. */
> > > > > + 37, /* reduc_f16_cost */
> > > > > + /* Likewise for 7 scalar FADDs (~4 cycles) vs. 12: 14 + 8. */
> > > > > + 22, /* reduc_f32_cost */
> > > > > + /* Likewise for 3 scalar FADDs (~2 cycles) vs. 9: 6 + 7. */
> > > > > + 13, /* reduc_f64_cost */
> > > > > + 2, /* store_elt_extra_cost */
> > > > > + 2, /* vec_to_scalar_cost */
> > > > > + 4, /* scalar_to_vec_cost */
> > > > > + 6, /* align_load_cost */
> > > > > + 6, /* unalign_load_cost */
> > > > > + 1, /* unalign_store_cost */
> > > > > + 1 /* store_cost */
> > > > > + },
> > > > > + 3, /* clast_cost */
> > > > > + 42, /* fadda_f16_cost */
> > > > > + 26, /* fadda_f32_cost */
> > > > > + 20, /* fadda_f64_cost */
> > > > > + 32, /* gather_load_x32_cost */
> > > > > + 16, /* gather_load_x64_cost */
> > > > > + 96, /* gather_load_x32_init_cost */
> > > > > + 32, /* gather_load_x64_init_cost */
> > > > > + 3 /* scatter_store_elt_cost */
> > > > > +};
> > > > > +
> > > > > +static const aarch64_scalar_vec_issue_info hip12_scalar_issue_info =
> > > > > +{
> > > > > + 5, /* loads_stores_per_cycle */
> > > > > + 2, /* stores_per_cycle */
> > > > > + 6, /* general_ops_per_cycle */
> > > > > + 0, /* fp_simd_load_general_ops */
> > > > > + 1 /* fp_simd_store_general_ops */
> > > > > +};
> > > > > +
> > > > > +static const aarch64_advsimd_vec_issue_info
> > hip12_advsimd_issue_info =
> > > > > +{
> > > > > + {
> > > > > + 5, /* loads_stores_per_cycle */
> > > > > + 2, /* stores_per_cycle */
> > > > > + 4, /* general_ops_per_cycle */
> > > > > + 0, /* fp_simd_load_general_ops */
> > > > > + 1 /* fp_simd_store_general_ops */
> > > > > + },
> > > > > + 2, /* ld2_st2_general_ops */
> > > > > + 2, /* ld3_st3_general_ops */
> > > > > + 3 /* ld4_st4_general_ops */
> > > > > +};
> > > > > +
> > > > > +static const aarch64_sve_vec_issue_info hip12_sve_issue_info =
> > > > > +{
> > > > > + {
> > > > > + {
> > > > > + 5, /* loads_per_cycle */
> > > > > + 2, /* stores_per_cycle */
> > > > > + 4, /* general_ops_per_cycle */
> > >
> > > One question though, you set
> > AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> > > meaning you expect Adv. SIMD and SVE to have the same total bandwidth.
> > >
> > > But at VL 256. An issue rate of 4 general ops per cycle doesn't match up.
> > >
> > > Are these right? Or should you not have
> > AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT?
> >
> > Oh, it should be 2 at SVE256 mode. Thanks a lot for pointing out this
> > mistake.
> > Updated in patch v2.
> >
> > >
> > > Thanks sorry for not spotting it before,
> > > Tamar
> > >
> > > > > + 0, /* fp_simd_load_general_ops */
> > > > > + 1 /* fp_simd_store_general_ops */
> > > > > + },
> > > > > + 2, /* ld2_st2_general_ops */
> > > > > + 2, /* ld3_st3_general_ops */
> > > > > + 3 /* ld4_st4_general_ops */
> > > > > + },
> > > > > + 4, /* pred_ops_per_cycle */
> >
> > Also changed to 2.
> >
> > > > > + 2, /* while_pred_ops */
> > > > > + 2, /* int_cmp_pred_ops */
> > > > > + 1, /* fp_cmp_pred_ops */
> > > > > + 1, /* gather_scatter_pair_general_ops */
> > > > > + 1 /* gather_scatter_pair_pred_ops */
> > > > > +};
> > > > > +
> > > > > +static const aarch64_vec_issue_info hip12_vec_issue_info =
> > > > > +{
> > > > > + &hip12_scalar_issue_info,
> > > > > + &hip12_advsimd_issue_info,
> > > > > + &hip12_sve_issue_info
> > > > > +};
> > > > > +
> > > > > +static const struct cpu_vector_cost hip12_vector_cost =
> > > > > +{
> > > > > + 1, /* scalar_int_stmt_cost */
> > > > > + 2, /* scalar_fp_stmt_cost */
> > > > > + 4, /* scalar_load_cost */
> > > > > + 1, /* scalar_store_cost */
> > > > > + 1, /* cond_taken_branch_cost */
> > > > > + 1, /* cond_not_taken_branch_cost */
> > > > > + &hip12_advsimd_vector_cost, /* advsimd */
> > > > > + &hip12_sve_vector_cost, /* sve */
> > > > > + &hip12_vec_issue_info /* issue_info */
> > > > > +};
> > > > > +
> > > > > +static const struct tune_params hip12_tunings =
> > > > > +{
> > > > > + &hip12_extra_costs,
> > > > > + &hip12_addrcost_table,
> > > > > + &hip12_regmove_cost,
> > > > > + &hip12_vector_cost,
> > > > > + &generic_branch_cost,
> > > > > + &generic_approx_modes,
> > > > > + SVE_256, /* sve_width */
> > > > > + { 4, /* load_int. */
> > > > > + 1, /* store_int. */
> > > > > + 6, /* load_fp. */
> > > > > + 1, /* store_fp. */
> > > > > + 8, /* load_pred. */
> > > > > + 4 /* store_pred. */
> > > > > + }, /* memmov_cost. */
> > > > > + 8, /* issue_rate */
> > > > > + (AARCH64_FUSE_BASE
> > > > > + | AARCH64_FUSE_CMP_CSEL
> > > > > + | AARCH64_FUSE_CMP_CSET), /* fusible_ops */
> > > > > + "16", /* function_align. */
> > > > > + "4", /* jump_align. */
> > > > > + "8", /* loop_align. */
> > > > > + 2, /* int_reassoc_width. */
> > > > > + 4, /* fp_reassoc_width. */
> > > > > + 2, /* fma_reassoc_width. */
> > > > > + 2, /* vec_reassoc_width. */
> > > > > + 2, /* min_div_recip_mul_sf. */
> > > > > + 2, /* min_div_recip_mul_df. */
> > > > > + 0, /* max_case_values. */
> > > > > + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.
> > */
> > > > > + (AARCH64_EXTRA_TUNE_BASE
> > > > > + | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> >
> > Oh, I forgot to remove this flag in my final patch, updated in patch v2.
> >
> > > > > + | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /*
> > > > > tune_flags. */
> > > > > + &generic_armv8_a_prefetch_tune,
> > > > > + AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
> > > > > + AARCH64_LDP_STP_POLICY_ALWAYS, /* stp_policy_model. */
> > > > > + nullptr /* dispatch_constraints. */
> > > > > +};
> > > > > +
> > > > > +#endif /* GCC_AARCH64_H_HIP12. */
> > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > index 46d53705870..457cb4209be 100644
> > > > > --- a/gcc/doc/invoke.texi
> > > > > +++ b/gcc/doc/invoke.texi
> > > > > @@ -22855,7 +22855,7 @@ performance of the code. Permissible
> > values
> > > > for
> > > > > this option are:
> > > > > @samp{octeontx2f95mm},
> > > > > @samp{a64fx}, @samp{fujitsu-monaka},
> > > > > @samp{thunderx}, @samp{thunderxt88},
> > > > > -@samp{thunderxt88p1}, @samp{thunderxt81}, @samp{tsv110},
> > > > > +@samp{thunderxt88p1}, @samp{thunderxt81}, @samp{tsv110},
> > > > > @samp{hip12},
> > > > > @samp{thunderxt83}, @samp{thunderx2t99},
> > @samp{thunderx3t110},
> > > > > @samp{zeus},
> > > > > @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
> > > > > @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},
> > > > > --
> > > > > 2.19.1