> On 29 Jul 2025, at 17:14, Jennifer Schmitz <jschm...@nvidia.com> wrote:
>
> This patch adds dispatch constraints for Neoverse V2 and illustrates the
> steps necessary to enable dispatch scheduling for an AArch64 core.
>
> The dispatch constraints are based on section 4.1 of the Neoverse V2 SWOG.
> Please note that the values used here deviate slightly from the current
> SWOG version but are based on correct numbers.  Arm will do an official
> Neoverse V2 SWOG release with the updated values in due time.
>
> Here are the steps we took to implement the dispatch constraints for
> Neoverse V2:
> 1. We used instruction attributes to group instructions into dispatch
>    groups, corresponding to operations that utilize a certain pipeline
>    type.  For that, we added a new attribute (neoversev2_dispatch) with
>    values for the different dispatch groups.  The values of
>    neoversev2_dispatch are determined using expressions over other
>    instruction attributes.
>    For example, the SWOG describes a constraint of "Up to 4 uOPs utilizing
>    the M pipelines".  Thus, one of the values of neoversev2_dispatch is
>    "m", and it groups instructions that use the M pipelines, such as
>    integer multiplication.
>    Note that we made some minor simplifications compared to the
>    information in the SWOG, because the instruction annotation does not
>    allow for a fully accurate mapping of instructions to utilized
>    pipelines.  To give one example, the instructions IRG and LDG are both
>    tagged with "memtag", but IRG uses the M pipelines, while LDG uses the
>    L pipelines.
> 2. In the Neoverse V2 tuning model, we added an array of
>    dispatch_constraint objects and referenced it in the tune_params.  The
>    new attribute neoversev2_dispatch provided a compact way to define the
>    dispatch constraints.
> 3. We enabled dispatch scheduling for Neoverse V2 by adding the
>    AARCH64_EXTRA_TUNE_DISPATCH_SCHED tune flag.
>
> Performance evaluation on a Grace machine using SPEC2017 and GROMACS 2024:
> We ran each benchmark 5 times, compiled with trunk (commit a1fb757) and
> with the patch series, and computed the speed-up of the median values per
> test (i.e. values >1 mean that the patch series improves performance):
>
> SPEC2017 FP (-O3 -Wl,-z,muldefs -lm -fallow-argument-mismatch -fpermissive
> -fstack-arrays -flto=auto -Wl,--sort-section=name -march=native
> -mcpu=neoverse-v2 -std=gnu17):
> Geom. mean of speed-ups   1.0006
> blender                   1.0008
> bwaves                    0.9996
> cactuBSSN                 1.0007
> fotonik3d                 1.0002
> imagick                   0.9999
> lbm                       1.0016
> nab                       1.0012
> namd                      1.0002
> parest                    1.0004
> povray                    1.0029
> roms                      1.0000
> wrf                       1.0003
>
> SPEC2017 INT (same flags as SPEC2017 FP):
> Geom. mean of speed-ups   0.9994
> deepsjeng                 0.9991
> gcc                       1.0024
> leela                     0.9985
> mcf                       0.9985
> exchange2                 1.0000
> omnetpp                   1.0005
> perlbench                 0.9975
> x264                      1.0032
> xalancbmk                 0.9916
> xz                        1.0032
>
> GROMACS 2024 (-O3 -Wl,-z,muldefs -lm -flto=auto -Wl,--sort-section=name
> -march=native -mcpu=neoverse-v2):
> Geom. mean of speed-ups                      1.0024
> 22vs23_cut_arm_neon_asimd_cpu_perf           1.0005
> 22vs23_cut_arm_sve_cpu_perf                  1.0153
> 22vs23_fsw_arm_neon_asimd_cpu_perf           1.0107
> 22vs23_fsw_arm_sve_cpu_perf                  1.0156
> 22vs23_ljpme-geom_arm_neon_asimd_cpu_perf    1.0081
> 22vs23_ljpme-geom_arm_sve_cpu_perf           1.0024
> 22vs23_ljpme-lb_arm_neon_asimd_cpu_perf      1.0068
> 22vs23_ljpme-lb_arm_sve_cpu_perf             0.9957
> 22vs23_psh_arm_neon_asimd_cpu_perf           0.9957
> 22vs23_psh_arm_sve_cpu_perf                  0.9885
> 22vs23_psw_arm_neon_asimd_cpu_perf           0.9983
> 22vs23_psw_arm_sve_cpu_perf                  1.0024
> 22vs23_rf_arm_neon_asimd_cpu_perf            0.9976
> 22vs23_rf_arm_sve_cpu_perf                   0.9916
>
> The effect of the patch series on compile times was evaluated by comparing
> the compile times of insn-emit-1.cc.  Speed-up for the median values of
> 5 repetitions: 1.0001.
>
> Any help with further performance evaluation would be greatly appreciated.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
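
Steps 1-3 above amount to saying that each dispatch constraint is a named
per-cycle slot budget plus a function that returns how many slots a given
instruction consumes, with all counters resetting when a new dispatch cycle
starts.  The following is a minimal, self-contained sketch of that
bookkeeping.  All types and names here are illustrative only ("insn" stands
in for rtx_insn *); the actual interface is the dispatch_constraint class
used in the diff below.

  // Illustrative sketch only, not GCC code: models per-cycle
  // dispatch-constraint tracking as described in steps 1-3.
  #include <functional>
  #include <string>
  #include <vector>

  // Stand-in for rtx_insn *: all we need is the dispatch group that the
  // neoversev2_dispatch attribute would compute for the insn.
  struct insn
  {
    std::string dispatch_group;
  };

  // A named per-cycle slot budget: LIMIT slots per dispatch cycle, COUNT
  // returns how many of them one insn consumes, USED tracks this cycle.
  struct constraint
  {
    const char *name;
    int limit;
    std::function<int (const insn &)> count;
    int used = 0;
  };

  // If I fits into the current dispatch cycle under every constraint,
  // record its slot usage and return true.  Otherwise leave the counters
  // untouched; the caller would then start a new cycle (resetting USED
  // for every constraint) and retry.
  static bool
  try_dispatch (std::vector<constraint> &cs, const insn &i)
  {
    for (const constraint &c : cs)
      if (c.used + c.count (i) > c.limit)
        return false;
    for (constraint &c : cs)
      c.used += c.count (i);
    return true;
  }

  int
  main ()
  {
    // Two of the Neoverse V2 budgets: 16 uOPs in total per cycle, up to
    // 4 uOPs on the M pipelines.
    std::vector<constraint> cs
      = { { "total", 16, [] (const insn &) { return 1; } },
          { "m", 4, [] (const insn &i)
              { return i.dispatch_group == "m" ? 1 : 0; } } };

    insn mul { "m" };    // e.g. an integer multiply
    insn add { "bsm" };  // e.g. a simple ALU operation
    bool ok = try_dispatch (cs, mul)     // m: 1/4, total: 1/16
              && try_dispatch (cs, add); // total: 2/16
    return ok ? 0 : 1;
  }

In the patch itself the counting callbacks query the new
neoversev2_dispatch attribute, so a combined group such as "m0_v" is
charged against both the m0 and the v budgets.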
My thoughts on this:

* From first principles it seems that scheduling for dispatch constraints is
the sensible strategy for aggressive OoO CPUs.  Trying to fill in gaps
created by high-latency instructions, as per the traditional scheduling
approach, is not useful, as the hardware should handle that automatically.
These CPUs are instead more sensitive to frontend limitations like dispatch.

* The performance results here show that SPEC is not particularly sensitive
to the scheduling approach.  GROMACS looks a bit more interesting, with some
subtests getting up to 1.5% better.  GROMACS uses more explicit
intrinsics-based vector code, which is different to how SPEC is written.
If someone has access to Neoverse V2 hardware and non-SPEC-shaped workloads,
it'd be very interesting to get more data points on the performance.

* The implementation of the relevant hooks and the CPU-specific code is
nicely isolated in the new .cc and neoversev2.md files, so hopefully CPUs
that won't use this scheduling scheme shouldn't need to care much about the
code for it.

Thanks,
Kyrill

>
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>
> gcc/ChangeLog:
>
>     * config/aarch64/aarch64.md: Include neoversev2.md.
>     * config/aarch64/tuning_models/neoversev2.h: Enable dispatch
>     scheduling and add dispatch constraints.
>     * config/aarch64/neoversev2.md: New file and new instruction
>     attribute neoversev2_dispatch.
> ---
>  gcc/config/aarch64/aarch64.md                 |   3 +
>  gcc/config/aarch64/neoversev2.md              | 192 ++++++++++++++++++
>  gcc/config/aarch64/tuning_models/neoversev2.h | 102 +++++++++-
>  3 files changed, 294 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/config/aarch64/neoversev2.md
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index fc9c819b864..bceaf40ae97 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -672,6 +672,9 @@
>  (include "tsv110.md")
>  (include "thunderx3t110.md")
> 
> +;; Dispatch scheduling
> +(include "neoversev2.md")
> +
>  ;; -------------------------------------------------------------------
>  ;; Jumps and other miscellaneous insns
>  ;; -------------------------------------------------------------------
> diff --git a/gcc/config/aarch64/neoversev2.md b/gcc/config/aarch64/neoversev2.md
> new file mode 100644
> index 00000000000..8dc9b098d09
> --- /dev/null
> +++ b/gcc/config/aarch64/neoversev2.md
> @@ -0,0 +1,192 @@
> +;; Instruction attribute for dispatch scheduling for Neoverse V2.
> +;; Copyright The GNU Toolchain Authors.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but
> +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;; General Public License for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +;; Attribute that groups other instruction attributes into dispatch groups
> +;; for Neoverse V2 cores.  Dispatch groups are groups of pipelines for which
> +;; the SWOG specifies a dispatch constraint.
> +;; For example: Because the SWOG contains a dispatch constraint for the
> +;; V02 pipelines, there is an attribute value "v02" that groups
> +;; instructions that are processed by the V0 and V2 pipelines.
> +;; Values that contain a "_" represent combinations of dispatch groups.
> +;; For example, there are dispatch constraints for the M0 and V pipelines.
> +;; The value "m0_v" groups instructions that utilize the M0 as well as the
> +;; V pipelines, such that both dispatch constraints apply.
> +
> +(define_attr "neoversev2_dispatch"
> +  "none,bs01,bsm,m,m0,v02,v13,v,l01,l,bsm_l,m_l,m0_v,v_v13,v_l,\
> +   l01_d,l01_v"
> +  (cond [(eq_attr "type" "branch,call")
> +           (const_string "bs01")
> +         (ior
> +           (eq_attr "type" "adc_reg,alu_ext,alu_imm,alu_sreg,alus_ext,\
> +             alus_imm,alus_sreg,clz,csel,logic_imm,logic_reg,logics_imm,\
> +             logics_reg,mov_imm,rbit,rev,shift_reg")
> +           (eq_attr "sve_type" "sve_pred_cnt_scalar"))
> +           (const_string "bsm")
> +         (ior
> +           (eq_attr "type" "alu_ext,alus_ext,bfm,bfx,mul,rotate_imm,\
> +             smull,umull")
> +           (eq_attr "autodetect_type" "alu_shift_asr_op2,alu_shift_lsl_op2,\
> +             alu_shift_lsr_op2")
> +           (eq_attr "sve_type" "sve_pred_cnt_ctrl,sve_pred_misc"))
> +           (const_string "m")
> +         (ior
> +           (eq_attr "type" "crc,f_cvti2f,mla,neon_from_gp,neon_from_gp_q,\
> +             sdiv,smlal,udiv,umlal")
> +           (eq_attr "sve_type" "sve_ffr,sve_pred_logical"))
> +           (const_string "m0")
> +         (ior
> +           (eq_attr "type"
> +             "crypto_sha256_slow,crypto_sha3,crypto_sha512,crypto_sm3,\
> +             crypto_sm4,f_rintd,f_rints,fccmpd,fccmps,fcmpd,fcmps,fdivd,\
> +             fdivs,fsqrtd,fsqrts,neon_fp_cvt_narrow_d_q,\
> +             neon_fp_cvt_narrow_s_q,neon_fp_cvt_widen_h,neon_fp_cvt_widen_s,\
> +             neon_fp_div_d,neon_fp_div_d_q,neon_fp_div_s,neon_fp_div_s_q,\
> +             neon_fp_recpe_d,neon_fp_recpe_d_q,neon_fp_recpe_s,\
> +             neon_fp_recpe_s_q,neon_fp_recps_d,neon_fp_recps_d_q,\
> +             neon_fp_recps_s,neon_fp_recps_s_q,neon_fp_recpx_d,\
> +             neon_fp_recpx_d_q,neon_fp_recpx_s,neon_fp_recpx_s_q,\
> +             neon_fp_round_d,neon_fp_round_d_q,neon_fp_round_s,\
> +             neon_fp_round_s_q,neon_fp_rsqrte_d,neon_fp_rsqrte_d_q,\
> +             neon_fp_rsqrte_s,neon_fp_rsqrte_s_q,neon_fp_rsqrts_d,\
> +             neon_fp_rsqrts_d_q,neon_fp_rsqrts_s,neon_fp_rsqrts_s_q,\
> +             neon_fp_sqrt_d,neon_fp_sqrt_d_q,neon_fp_sqrt_s,\
> +             neon_fp_sqrt_s_q,neon_fp_to_int_d,neon_fp_to_int_d_q,\
> +             neon_fp_to_int_s,neon_fp_to_int_s_q,neon_int_to_fp_d,\
> +             neon_int_to_fp_d_q,neon_int_to_fp_s,neon_int_to_fp_s_q,\
> +             neon_mla_b,neon_mla_b_q,neon_mla_h,neon_mla_h_q,\
> +             neon_mla_s,neon_mla_s_q,neon_mla_b_long,neon_mla_h_long,\
> +             neon_mla_h_scalar,neon_mla_h_scalar_q,neon_mla_s_long,\
> +             neon_mla_s_scalar,neon_mla_s_scalar_q,neon_mla_h_scalar_long,\
> +             neon_mla_s_scalar_long,neon_mul_b,neon_mul_b_q,\
> +             neon_mul_d_long,neon_mul_h,neon_mul_h_q,neon_mul_h_long,\
> +             neon_mul_h_scalar,neon_mul_h_scalar_q,neon_mul_h_scalar_long,\
> +             neon_mul_s,neon_mul_s_q,neon_mul_s_long,neon_mul_s_scalar,\
> +             neon_mul_s_scalar_q,neon_mul_s_scalar_long,neon_sat_mla_b_long,\
> +             neon_sat_mla_h_long,neon_sat_mla_h_scalar_long,\
> +             neon_sat_mla_s_long,neon_sat_mla_s_scalar_long,\
> +             neon_sat_mul_b,neon_sat_mul_b_q,neon_sat_mul_b_long,\
> +             neon_sat_mul_h,neon_sat_mul_h_q,neon_sat_mul_h_long,\
> +             neon_sat_mul_h_scalar,neon_sat_mul_h_scalar_q,\
> +             neon_sat_mul_h_scalar_long,neon_sat_mul_s,neon_sat_mul_s_q,\
> +             neon_sat_mul_s_long,neon_sat_mul_s_scalar,\
> +             neon_sat_mul_s_scalar_q,neon_sat_mul_s_scalar_long")
> +           (eq_attr "sve_type"
> +             "sve_crypto_sha3,sve_fp_cmp,sve_fp_cvt,sve_fp_div,sve_fp_log,\
> +             sve_fp_sqrt,sve_int_cvt,sve_int_div,sve_int_dot,sve_int_index,\
> +             sve_int_mul,sve_int_recip_est"))
> +           (const_string "v02")
> +         (ior
> +           (eq_attr "type"
> +             "neon_arith_acc,neon_arith_acc_q,neon_reduc_add,\
> +             neon_reduc_add_long,neon_reduc_add_q,neon_reduc_minmax,\
> +             neon_reduc_minmax_q,neon_sat_shift_imm,\
> +             neon_sat_shift_imm_narrow_q,neon_sat_shift_imm_q,\
> +             neon_sat_shift_reg,neon_sat_shift_reg_q,neon_shift_acc,\
> +             neon_shift_acc_q,neon_shift_imm,neon_shift_imm_long,\
> +             neon_shift_imm_narrow_q,neon_shift_imm_q,neon_shift_reg,\
> +             neon_shift_reg_q")
> +           (eq_attr "sve_type"
> +             "sve_fp_assoc_add,sve_fp_exp,sve_int_accum,sve_int_bit_perm,\
> +             sve_int_extend,sve_int_extract,sve_int_shift"))
> +           (const_string "v13")
> +         (ior
> +           (eq_attr "type" "crypto_pmull,f_cvt,f_cvtf2i,f_minmaxd,f_minmaxs,\
> +             faddd,fadds,fconstd,fconsts,fcsel,ffarithd,ffariths,fmacd,fmacs,\
> +             fmov,fmuld,fmuls,f_mcr,f_mrc,neon_abd,\
> +             neon_abd_long,neon_abd_q,neon_abs,neon_abs_q,neon_add,\
> +             neon_add_halve,neon_add_halve_narrow_q,neon_add_halve_q,\
> +             neon_add_long,neon_add_q,neon_add_widen,neon_bsl,neon_bsl_q,\
> +             neon_cls,neon_cls_q,neon_cnt,neon_cnt_q,neon_compare,\
> +             neon_compare_q,neon_compare_zero,neon_compare_zero_q,\
> +             neon_dup,neon_dup_q,neon_ext,neon_ext_q,neon_fcadd,neon_fcmla,\
> +             neon_fp_abd_d,neon_fp_abd_d_q,neon_fp_abd_s,neon_fp_abd_s_q,\
> +             neon_fp_abs_d,neon_fp_abs_d_q,neon_fp_abs_s,neon_fp_abs_s_q,\
> +             neon_fp_addsub_d,neon_fp_addsub_d_q,neon_fp_addsub_s,\
> +             neon_fp_addsub_s_q,neon_fp_compare_d,neon_fp_compare_d_q,\
> +             neon_fp_compare_s,neon_fp_compare_s_q,neon_fp_minmax_d,\
> +             neon_fp_minmax_d_q,neon_fp_minmax_s,neon_fp_minmax_s_q,\
> +             neon_fp_mla_d,neon_fp_mla_d_q,neon_fp_mla_d_scalar_q,\
> +             neon_fp_mla_s,neon_fp_mla_s_q,neon_fp_mla_s_scalar,\
> +             neon_fp_mla_s_scalar_q,neon_fp_mul_d,neon_fp_mul_d_q,\
> +             neon_fp_mul_d_scalar_q,neon_fp_mul_s,neon_fp_mul_s_q,\
> +             neon_fp_mul_s_scalar,neon_fp_mul_s_scalar_q,neon_fp_neg_d,\
> +             neon_fp_neg_d_q,neon_fp_neg_s,neon_fp_neg_s_q,neon_fp_reduc_add_d,\
> +             neon_fp_reduc_add_d_q,neon_fp_reduc_add_s,neon_fp_reduc_add_s_q,\
> +             neon_fp_reduc_minmax_d,neon_fp_reduc_minmax_d_q,\
> +             neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_s_q,neon_logic,\
> +             neon_logic_q,neon_minmax,neon_minmax_q,neon_move,\
> +             neon_move_narrow_q,neon_move_q,neon_neg,neon_neg_q,neon_permute,\
> +             neon_permute_q,neon_qabs,neon_qabs_q,neon_qadd,neon_qadd_q,\
> +             neon_qneg,neon_qneg_q,neon_qsub,neon_qsub_q,neon_rbit,\
> +             neon_rbit_q,neon_rev,neon_rev_q,neon_sub,neon_sub_halve,\
> +             neon_sub_halve_narrow_q,neon_sub_halve_q,neon_sub_long,\
> +             neon_sub_q,neon_sub_widen,neon_tbl1,neon_tbl1_q,neon_tbl2,\
> +             neon_tbl2_q,neon_tbl3,neon_tbl3_q,neon_tbl4,neon_tbl4_q,\
> +             neon_to_gp,neon_to_gp_q,neon_tst,neon_tst_q,neon_zip,\
> +             neon_zip_q")
> +           (eq_attr "sve_type" "sve_fp_arith,sve_fp_misc,sve_fp_mul,\
> +             sve_fp_reduc,sve_int_general,sve_int_pmul"))
> +           (const_string "v")
> +         (eq_attr "sve_type" "sve_store_pred")
> +           (const_string "l01")
> +         (ior
> +           (eq_attr "type" "neon_ldp,neon_ldp_q,neon_load1_1reg,\
> +             neon_load1_1reg_q,neon_load1_2reg,neon_load1_2reg_q,\
> +             neon_load1_3reg,neon_load1_3reg_q,neon_load1_4reg,\
> +             neon_load1_4reg_q")
> +           (eq_attr "sve_type" "sve_load_1reg"))
> +           (const_string "l")
> +         (eq_attr "type" "f_loadd,f_loads")
> +           (const_string "bsm_l")
> +         (eq_attr "sve_type" "sve_load_pred")
> +           (const_string "m_l")
> +         (ior
> +           (eq_attr "type" "neon_ins,neon_ins_q")
> +           (eq_attr "sve_type" "sve_int_cmp_set,sve_int_match,sve_pred_vec"))
> +           (const_string "m0_v")
> +         (eq_attr "sve_type" "sve_int_reduc")
> +           (const_string "v_v13")
> +         (ior
> +           (eq_attr "type" "neon_load1_all_lanes,neon_load1_one_lane,\
> +             neon_load1_one_lane_q,neon_load2_2reg,neon_load2_2reg_q,\
> +             neon_load2_all_lanes,neon_load2_all_lanes_q,neon_load2_one_lane,\
> +             neon_load3_3reg,neon_load3_3reg_q,neon_load3_all_lanes,\
> +             neon_load3_all_lanes_q,neon_load3_one_lane,neon_load4_4reg,\
> +             neon_load4_4reg_q,neon_load4_all_lanes,neon_load4_all_lanes_q,\
> +             neon_load4_one_lane")
> +           (eq_attr "sve_type" "sve_gatherload_32,sve_gatherload_64,\
> +             sve_load_2reg,sve_load_3reg,sve_load_4reg"))
> +           (const_string "v_l")
> +         (eq_attr "type" "load_16,load_4,load_8,store_16,store_4,store_8")
> +           (const_string "l01_d")
> +         (ior
> +           (eq_attr "type" "f_stored,f_stores,neon_stp,neon_stp_q,\
> +             neon_store1_1reg,neon_store1_1reg_q,neon_store1_2reg,\
> +             neon_store1_2reg_q,neon_store1_3reg,neon_store1_3reg_q,\
> +             neon_store1_4reg,neon_store1_4reg_q,neon_store1_one_lane,\
> +             neon_store1_one_lane_q,neon_store2_2reg,neon_store2_2reg_q,\
> +             neon_store2_one_lane,neon_store2_one_lane_q,neon_store3_3reg,\
> +             neon_store3_3reg_q,neon_store3_one_lane,neon_store3_one_lane_q,\
> +             neon_store4_4reg,neon_store4_4reg_q,neon_store4_one_lane,\
> +             neon_store4_one_lane_q")
> +           (eq_attr "sve_type" "sve_scatterstore_32,sve_scatterstore_64,\
> +             sve_store_1reg,sve_store_2reg,sve_store_3reg,sve_store_4reg"))
> +           (const_string "l01_v")]
> +        (const_string "none")))
> \ No newline at end of file
> diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h
> index faf06d8e7ed..c3749d0c194 100644
> --- a/gcc/config/aarch64/tuning_models/neoversev2.h
> +++ b/gcc/config/aarch64/tuning_models/neoversev2.h
> @@ -21,6 +21,7 @@
>  #define GCC_AARCH64_H_NEOVERSEV2
> 
>  #include "generic.h"
> +#include "../aarch64-sched-dispatch.h"
> 
>  static const struct cpu_regmove_cost neoversev2_regmove_cost =
>  {
> @@ -188,6 +189,100 @@ static const struct cpu_vector_cost neoversev2_vector_cost =
>    &neoversev2_vec_issue_info /* issue_info */
>  };
> 
> +/* Neoverse V2 dispatch constraints for instruction scheduling.  */
> +static const dispatch_constraint neoversev2_dispatch_constraints[] = {
> +  dispatch_constraint ("total", 16, [](rtx_insn *)
> +    {
> +      return 1;
> +    }),
> +  dispatch_constraint ("b_s01", 4, [](rtx_insn *insn)
> +    {
> +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_BS01);
> +    }),
> +  dispatch_constraint ("m0", 2, [](rtx_insn *insn)
> +    {
> +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_M0
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
> +    }),
> +  dispatch_constraint ("m", 4, [](rtx_insn *insn)
> +    {
> +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_M
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_M_L
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
> +    }),
> +  dispatch_constraint ("b_s_m", 8, [](rtx_insn *insn)
> +    {
> +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_BS01
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_BSM
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_M
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_BSM_L
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_M_L
> +                   || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
> +    }),
> +  dispatch_constraint ("v02", 2, [](rtx_insn *insn)
> +    {
> +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_V02);
> +    }),
> +  dispatch_constraint ("v13", 2, [](rtx_insn *insn)
> +    {
> +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> +      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_V13);
> +    }),
> +  dispatch_constraint ("v", 4, [](rtx_insn *insn)
> +    {
> +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> +      switch (dispatch_group) {
> +      case NEOVERSEV2_DISPATCH_V02:
> +      case NEOVERSEV2_DISPATCH_V13:
> +      case NEOVERSEV2_DISPATCH_V:
> +      case NEOVERSEV2_DISPATCH_M0_V:
> +      case NEOVERSEV2_DISPATCH_V_L:
> +      case NEOVERSEV2_DISPATCH_L01_V:
> +        return 1;
> +      case NEOVERSEV2_DISPATCH_V_V13:
> +        return 2;
> +      default:
> +        return 0;
> +      }
> +    }),
> +  dispatch_constraint ("l01_d", 4, [](rtx_insn *insn)
> +    {
> +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> +      switch (dispatch_group) {
> +      case NEOVERSEV2_DISPATCH_L01_V:
> +      case NEOVERSEV2_DISPATCH_L01:
> +        return 1;
> +      case NEOVERSEV2_DISPATCH_L01_D:
> +        return 2;
> +      default:
> +        return 0;
> +      }
> +    }),
> +  dispatch_constraint ("l", 6, [](rtx_insn *insn)
> +    {
> +      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
> +      switch (dispatch_group) {
> +      case NEOVERSEV2_DISPATCH_L:
> +      case NEOVERSEV2_DISPATCH_BSM_L:
> +      case NEOVERSEV2_DISPATCH_M_L:
> +      case NEOVERSEV2_DISPATCH_V_L:
> +      case NEOVERSEV2_DISPATCH_L01_V:
> +        return 1;
> +      case NEOVERSEV2_DISPATCH_L01_D:
> +        return 2;
> +      default:
> +        return 0;
> +      }
> +    })
> +};
> +
>  static const struct tune_params neoversev2_tunings =
>  {
>    &cortexa76_extra_costs,
> @@ -221,12 +316,13 @@ static const struct tune_params neoversev2_tunings =
>     | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
>     | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW
> -   | AARCH64_EXTRA_TUNE_AVOID_LDAPUR),  /* tune_flags.  */
> +   | AARCH64_EXTRA_TUNE_AVOID_LDAPUR
> +   | AARCH64_EXTRA_TUNE_DISPATCH_SCHED),  /* tune_flags.  */
>    &generic_armv9a_prefetch_tune,
>    AARCH64_LDP_STP_POLICY_ALWAYS,  /* ldp_policy_model.  */
>    AARCH64_LDP_STP_POLICY_ALWAYS,  /* stp_policy_model.  */
> -  nullptr,  /* dispatch_constraints.  */
> -  0  /* num_dispatch_constraints.  */
> +  neoversev2_dispatch_constraints,  /* dispatch_constraints.  */
> +  ARRAY_SIZE (neoversev2_dispatch_constraints)  /* num_dispatch_constraints.  */
>  };
> 
>  #endif /* GCC_AARCH64_H_NEOVERSEV2.  */
> -- 
> 2.34.1