Hi James,

Thank you for your review. Please attach the following author information
to the patch.

2019-04-04  wu yuan  <wuyu...@huawei.com>

        * config/aarch64/aarch64-cores.def (tsv110): Change scheduling model.
        * config/aarch64/aarch64.md: Add "tsv110.md".
        * config/aarch64/tsv110.md: New file.
                                                                                
Thanks,
wuyuan

-----Original Message-----
From: James Greenhalgh [mailto:james.greenha...@arm.com]
Sent: April 4, 2019 1:58
To: wuyuan (E) <wuyu...@huawei.com>
Cc: Kyrill Tkachov <kyrylo.tkac...@foss.arm.com>; gcc-patches@gcc.gnu.org;
Zhangyichao (AB) <zhangyichao.zh...@huawei.com>; Zhanghaijian (A)
<z.zhanghaij...@huawei.com>; nd <n...@arm.com>
Subject: Re: Re : add tsv110 pipeline scheduling

On Tue, Apr 02, 2019 at 03:26:22PM +0100, wuyuan (E) wrote:
> Hi James,
>
> Has the submitted patch been merged into trunk? I look forward to your
> reply. Thank you very much!
>
> Best Regards,
> wuyuan

Hi Wuyuan,

This patch is OK for trunk. Thank you for your many clarifications.

Will you need one of us to apply this to trunk on your behalf?

If you would like me to apply your patch, please provide the full ChangeLog 
with author information, like so:

2019-04-03  James Greenhalgh  <james.greenha...@arm.com>
            Second Author  <second.aut...@arm.com>
            Third Author  <third.aut...@arm.com>

        * config/aarch64/aarch64-cores.def (tsv110): Change scheduling model.
        * config/aarch64/aarch64.md: Add "tsv110.md".
        * config/aarch64/tsv110.md: New file.

Thanks,
James


> -----Original Message-----
> From: wuyuan (E)
> Sent: March 15, 2019 21:57
> To: 'James Greenhalgh' <james.greenha...@arm.com>
> Cc: Kyrill Tkachov <kyrylo.tkac...@foss.arm.com>;
> gcc-patches@gcc.gnu.org; Zhangyichao (AB)
> <zhangyichao.zh...@huawei.com>; Zhanghaijian (A)
> <z.zhanghaij...@huawei.com>; nd <n...@arm.com>; wufeng (O)
> <wufe...@huawei.com>; Yangfei (Felix) <felix.y...@huawei.com>
> Subject: Re : add tsv110 pipeline scheduling
> 
> Hi James,
>
> Thank you very much for your meticulous review. My answers to the two
> questions are as follows.
>
> The first problem was caused by my negligence; the name should be changed to
> "crypto_sha256_fast".
>
> For the second question, I have checked with the hardware engineer. Only
> ALU2/ALU3 support updating the PSTATE register, so any instruction that
> updates NZCV is issued to ALU2/ALU3. The MDU provides better pipeline
> efficiency for multi-cycle ALU instructions, so we issue 2-cycle ALU
> instructions without a PSTATE update to the MDU. The current pipeline
> description is therefore OK, except that the reservation "tsv110_alu2"
> should be replaced with "tsv110_alu2|tsv110_alu3".
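As a reading aid, the issue rules described above can be summarised in a small Python sketch. This is only an illustrative model and not part of the patch: the function name `candidate_units` and its parameters are assumptions made for this example, while the unit names are the reservation names used in tsv110.md.

```python
from typing import List


def candidate_units(sets_flags: bool, latency: int) -> List[str]:
    """Return the units an integer ALU instruction may issue to.

    Hypothetical helper for illustration only; it mirrors the rules
    stated in the email, not any function in GCC.
    """
    if sets_flags:
        # Only ALU2/ALU3 can update the PSTATE (NZCV) flags.
        return ["tsv110_alu2", "tsv110_alu3"]
    if latency >= 2:
        # Two-cycle ALU ops that leave PSTATE alone go to the MDU.
        return ["tsv110_mdu"]
    # Single-cycle ops without a flag update can use any ALU pipe.
    return ["tsv110_alu1", "tsv110_alu2", "tsv110_alu3"]


if __name__ == "__main__":
    # A two-cycle, flag-setting op (for example a shifted ADDS).
    print(candidate_units(sets_flags=True, latency=2))
```

Under these assumptions, the four cases correspond to the "tsv110_alu", "tsv110_alus", "tsv110_alu_shift" and "tsv110_alus_shift" reservations in the patch.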
> 
> The detailed patches are as follows:
> 
>       * config/aarch64/aarch64-cores.def (tsv110): Change scheduling model.
>       * config/aarch64/aarch64.md: Add "tsv110.md".
>       * config/aarch64/tsv110.md: New file.
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index ed56e5e..82d91d6
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -105,7 +105,7 @@ AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_
>  AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
>  
>  /* HiSilicon ('H') cores. */
> -AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
> +AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
>  
>  /* ARMv8.4-A Architecture Processors.  */
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b7cd9fc..861f059 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -361,6 +361,7 @@
>  (include "thunderx.md")
>  (include "../arm/xgene1.md")
>  (include "thunderx2t99.md")
> +(include "tsv110.md")
>  
>  ;; -------------------------------------------------------------------
>  ;; Jumps and other miscellaneous insns
> diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
> new file mode 100644
> index 0000000..9d12839
> --- /dev/null
> +++ b/gcc/config/aarch64/tsv110.md
> @@ -0,0 +1,708 @@
> +;; tsv110 pipeline description
> +;; Copyright (C) 2018 Free Software Foundation, Inc.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but
> +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;; General Public License for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +(define_automaton "tsv110")
> +
> +(define_attr "tsv110_neon_type"
> +  "neon_arith_acc, neon_arith_acc_q,
> +   neon_arith_basic, neon_arith_complex,
> +   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
> +   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
> +   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
> +   neon_shift_imm_complex,
> +   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
> +   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
> +   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
> +   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
> +   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
> +   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
> +   neon_bitops, neon_bitops_q, neon_from_gp,
> +   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
> +   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
> +   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
> +   unknown"
> +  (cond [
> +       (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
> +                        neon_reduc_add_acc_q")
> +         (const_string "neon_arith_acc")
> +       (eq_attr "type" "neon_arith_acc_q")
> +         (const_string "neon_arith_acc_q")
> +       (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, neon_add_long,\
> +                        neon_add_widen, neon_neg, neon_neg_q,\
> +                        neon_reduc_add, neon_reduc_add_q,\
> +                        neon_reduc_add_long, neon_sub, neon_sub_q,\
> +                        neon_sub_long, neon_sub_widen, neon_logic,\
> +                        neon_logic_q, neon_tst, neon_tst_q,\
> +                        neon_compare, neon_compare_q,\
> +                        neon_compare_zero, neon_compare_zero_q,\
> +                        neon_minmax, neon_minmax_q, neon_reduc_minmax,\
> +                        neon_reduc_minmax_q")
> +         (const_string "neon_arith_basic")
> +       (eq_attr "type" "neon_add_halve_narrow_q,\
> +                        neon_add_halve, neon_add_halve_q,\
> +                        neon_sub_halve, neon_sub_halve_q, neon_qabs,\
> +                        neon_qabs_q, neon_qadd, neon_qadd_q, neon_qneg,\
> +                        neon_qneg_q, neon_qsub, neon_qsub_q,\
> +                        neon_sub_halve_narrow_q")
> +         (const_string "neon_arith_complex")
> +
> +       (eq_attr "type" "neon_mul_b, neon_mul_h, neon_mul_s,\
> +                        neon_mul_h_scalar, neon_mul_s_scalar,\
> +                        neon_sat_mul_b, neon_sat_mul_h,\
> +                        neon_sat_mul_s, neon_sat_mul_h_scalar,\
> +                        neon_sat_mul_s_scalar,\
> +                        neon_mul_b_long, neon_mul_h_long,\
> +                        neon_mul_s_long,\
> +                        neon_mul_h_scalar_long, neon_mul_s_scalar_long,\
> +                        neon_sat_mul_b_long, neon_sat_mul_h_long,\
> +                        neon_sat_mul_s_long, neon_sat_mul_h_scalar_long,\
> +                        neon_sat_mul_s_scalar_long,\
> +                        neon_mla_b, neon_mla_h, neon_mla_s,\
> +                        neon_mla_h_scalar, neon_mla_s_scalar,\
> +                        neon_mla_b_long, neon_mla_h_long,\
> +                        neon_mla_s_long,\
> +                        neon_mla_h_scalar_long, neon_mla_s_scalar_long,\
> +                        neon_sat_mla_b_long, neon_sat_mla_h_long,\
> +                        neon_sat_mla_s_long, neon_sat_mla_h_scalar_long,\
> +                        neon_sat_mla_s_scalar_long")
> +         (const_string "neon_multiply")
> +       (eq_attr "type" "neon_mul_b_q, neon_mul_h_q, neon_mul_s_q,\
> +                        neon_mul_h_scalar_q, neon_mul_s_scalar_q,\
> +                        neon_sat_mul_b_q, neon_sat_mul_h_q,\
> +                        neon_sat_mul_s_q, neon_sat_mul_h_scalar_q,\
> +                        neon_sat_mul_s_scalar_q,\
> +                        neon_mla_b_q, neon_mla_h_q, neon_mla_s_q,\
> +                        neon_mla_h_scalar_q, neon_mla_s_scalar_q")
> +         (const_string "neon_multiply_q")
> +
> +       (eq_attr "type" "neon_shift_acc, neon_shift_acc_q")
> +         (const_string "neon_shift_acc")
> +       (eq_attr "type" "neon_shift_imm, neon_shift_imm_q,\
> +                        neon_shift_imm_narrow_q, neon_shift_imm_long")
> +         (const_string "neon_shift_imm_basic")
> +       (eq_attr "type" "neon_sat_shift_imm, neon_sat_shift_imm_q,\
> +                        neon_sat_shift_imm_narrow_q")
> +         (const_string "neon_shift_imm_complex")
> +       (eq_attr "type" "neon_shift_reg")
> +         (const_string "neon_shift_reg_basic")
> +       (eq_attr "type" "neon_shift_reg_q")
> +         (const_string "neon_shift_reg_basic_q")
> +       (eq_attr "type" "neon_sat_shift_reg")
> +         (const_string "neon_shift_reg_complex")
> +       (eq_attr "type" "neon_sat_shift_reg_q")
> +         (const_string "neon_shift_reg_complex_q")
> +
> +       (eq_attr "type" "neon_fp_neg_s, neon_fp_neg_s_q,\
> +                        neon_fp_abs_s, neon_fp_abs_s_q,\
> +                        neon_fp_neg_d, neon_fp_neg_d_q,\
> +                        neon_fp_abs_d, neon_fp_abs_d_q,\
> +                        neon_fp_minmax_s,neon_fp_minmax_d,\
> +                        neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_d")
> +         (const_string "neon_fp_negabs")
> +       (eq_attr "type" "neon_fp_addsub_s, neon_fp_abd_s,\
> +                        neon_fp_reduc_add_s, neon_fp_compare_s,\
> +                        neon_fp_round_s,\
> +                        neon_fp_addsub_d, neon_fp_abd_d,\
> +                        neon_fp_reduc_add_d, neon_fp_compare_d,\
> +                        neon_fp_round_d")
> +         (const_string "neon_fp_arith")
> +       (eq_attr "type" "neon_fp_addsub_s_q, neon_fp_abd_s_q,\
> +                        neon_fp_reduc_add_s_q, neon_fp_compare_s_q,\
> +                        neon_fp_minmax_s_q, neon_fp_round_s_q,\
> +                        neon_fp_addsub_d_q, neon_fp_abd_d_q,\
> +                        neon_fp_reduc_add_d_q, neon_fp_compare_d_q,\
> +                        neon_fp_minmax_d_q, neon_fp_round_d_q")
> +         (const_string "neon_fp_arith_q")
> +       (eq_attr "type" "neon_fp_reduc_minmax_s_q,\
> +                        neon_fp_reduc_minmax_d_q,\
> +                        neon_fp_reduc_add_s_q, neon_fp_reduc_add_d_q")
> +         (const_string "neon_fp_reductions_q")
> +       (eq_attr "type" "neon_fp_to_int_s, neon_int_to_fp_s,\
> +                        neon_fp_to_int_d, neon_int_to_fp_d")
> +         (const_string "neon_fp_cvt_int")
> +       (eq_attr "type" "neon_fp_to_int_s_q, neon_int_to_fp_s_q,\
> +                        neon_fp_to_int_d_q, neon_int_to_fp_d_q")
> +         (const_string "neon_fp_cvt_int_q")
> +       (eq_attr "type" "neon_fp_cvt_narrow_s_q, neon_fp_cvt_widen_h")
> +         (const_string "neon_fp_cvt16")
> +       (eq_attr "type" "neon_fp_mul_s, neon_fp_mul_s_scalar,\
> +                        neon_fp_mul_d")
> +         (const_string "neon_fp_mul")
> +       (eq_attr "type" "neon_fp_mul_s_q, neon_fp_mul_s_scalar_q,\
> +                        neon_fp_mul_d_q, neon_fp_mul_d_scalar_q")
> +         (const_string "neon_fp_mul_q")
> +       (eq_attr "type" "neon_fp_mla_s, neon_fp_mla_s_scalar,\
> +                        neon_fp_mla_d")
> +         (const_string "neon_fp_mla")
> +       (eq_attr "type" "neon_fp_mla_s_q, neon_fp_mla_s_scalar_q,\
> +                        neon_fp_mla_d_q, neon_fp_mla_d_scalar_q")
> +         (const_string "neon_fp_mla_q")
> +       (eq_attr "type" "neon_fp_recpe_s, neon_fp_rsqrte_s,\
> +                        neon_fp_recpx_s,\
> +                        neon_fp_recpe_d, neon_fp_rsqrte_d,\
> +                        neon_fp_recpx_d")
> +         (const_string "neon_fp_recpe_rsqrte")
> +       (eq_attr "type" "neon_fp_recpe_s_q, neon_fp_rsqrte_s_q,\
> +                        neon_fp_recpx_s_q,\
> +                        neon_fp_recpe_d_q, neon_fp_rsqrte_d_q,\
> +                        neon_fp_recpx_d_q")
> +         (const_string "neon_fp_recpe_rsqrte_q")
> +       (eq_attr "type" "neon_fp_recps_s, neon_fp_rsqrts_s,\
> +                        neon_fp_recps_d, neon_fp_rsqrts_d")
> +         (const_string "neon_fp_recps_rsqrts")
> +       (eq_attr "type" "neon_fp_recps_s_q, neon_fp_rsqrts_s_q,\
> +                        neon_fp_recps_d_q, neon_fp_rsqrts_d_q")
> +         (const_string "neon_fp_recps_rsqrts_q")
> +       (eq_attr "type" "neon_bsl, neon_cls, neon_cnt,\
> +                        neon_rev, neon_permute, neon_rbit,\
> +                        neon_tbl1, neon_tbl2, neon_zip,\
> +                        neon_dup, neon_dup_q, neon_ext, neon_ext_q,\
> +                        neon_move, neon_move_q, neon_move_narrow_q")
> +         (const_string "neon_bitops")
> +       (eq_attr "type" "neon_bsl_q, neon_cls_q, neon_cnt_q,\
> +                        neon_rev_q, neon_permute_q, neon_rbit_q")
> +         (const_string "neon_bitops_q")
> +       (eq_attr "type" "neon_from_gp,f_mcr,f_mcrr")
> +         (const_string "neon_from_gp")
> +       (eq_attr "type" "neon_from_gp_q")
> +         (const_string "neon_from_gp_q")
> +
> +       (eq_attr "type" "f_loads, f_loadd,\
> +                        neon_load1_1reg, neon_load1_1reg_q,\
> +                        neon_load1_2reg, neon_load1_2reg_q")
> +         (const_string "neon_load_a")
> +       (eq_attr "type" "neon_load1_3reg, neon_load1_3reg_q,\
> +                        neon_load1_4reg, neon_load1_4reg_q")
> +         (const_string "neon_load_b")
> +       (eq_attr "type" "neon_load1_one_lane, neon_load1_one_lane_q,\
> +                        neon_load1_all_lanes, neon_load1_all_lanes_q,\
> +                        neon_load2_2reg, neon_load2_2reg_q,\
> +                        neon_load2_all_lanes, neon_load2_all_lanes_q")
> +         (const_string "neon_load_c")
> +       (eq_attr "type" "neon_load2_4reg, neon_load2_4reg_q,\
> +                        neon_load3_3reg, neon_load3_3reg_q,\
> +                        neon_load3_one_lane, neon_load3_one_lane_q,\
> +                        neon_load4_4reg, neon_load4_4reg_q")
> +         (const_string "neon_load_d")
> +       (eq_attr "type" "neon_load2_one_lane, neon_load2_one_lane_q,\
> +                        neon_load3_all_lanes, neon_load3_all_lanes_q,\
> +                        neon_load4_all_lanes, neon_load4_all_lanes_q")
> +         (const_string "neon_load_e")
> +       (eq_attr "type" "neon_load4_one_lane, neon_load4_one_lane_q")
> +         (const_string "neon_load_f")
> +
> +       (eq_attr "type" "f_stores, f_stored,\
> +                        neon_store1_1reg")
> +         (const_string "neon_store_a")
> +       (eq_attr "type" "neon_store1_2reg, neon_store1_1reg_q")
> +         (const_string "neon_store_b")
> +       (eq_attr "type" "neon_store1_3reg, neon_store1_3reg_q,\
> +                        neon_store3_3reg, neon_store3_3reg_q,\
> +                        neon_store2_4reg, neon_store2_4reg_q,\
> +                        neon_store4_4reg, neon_store4_4reg_q,\
> +                        neon_store2_2reg, neon_store2_2reg_q,\
> +                        neon_store3_one_lane, neon_store3_one_lane_q,\
> +                        neon_store4_one_lane, neon_store4_one_lane_q,\
> +                        neon_store1_4reg, neon_store1_4reg_q,\
> +                        neon_store1_one_lane, neon_store1_one_lane_q,\
> +                        neon_store2_one_lane, neon_store2_one_lane_q")
> +         (const_string "neon_store_complex")]
> +       (const_string "unknown")))
> +
> +;; The tsv110 core is modelled as a multi-issue pipeline that has
> +;; the following functional units.
> +;; 1.  Three pipelines for integer operations: ALU1, ALU2, ALU3
> +
> +(define_cpu_unit "tsv110_alu1_issue" "tsv110")
> +(define_reservation "tsv110_alu1" "tsv110_alu1_issue")
> +
> +(define_cpu_unit "tsv110_alu2_issue" "tsv110")
> +(define_reservation "tsv110_alu2" "tsv110_alu2_issue")
> +
> +(define_cpu_unit "tsv110_alu3_issue" "tsv110")
> +(define_reservation "tsv110_alu3" "tsv110_alu3_issue")
> +
> +;; 2.  One pipeline for complex integer operations: MDU
> +
> +(define_cpu_unit "tsv110_mdu_issue" "tsv110")
> +(define_reservation "tsv110_mdu" "tsv110_mdu_issue")
> +
> +;; 3.  Two asymmetric pipelines for Asimd and FP operations: FSU1, FSU2
> +(define_automaton "tsv110_fsu")
> +
> +(define_cpu_unit "tsv110_fsu1_issue"
> +              "tsv110_fsu")
> +(define_cpu_unit "tsv110_fsu2_issue"
> +              "tsv110_fsu")
> +
> +(define_reservation "tsv110_fsu1" "tsv110_fsu1_issue")
> +(define_reservation "tsv110_fsu2" "tsv110_fsu2_issue")
> +
> +;; 4.  Two pipelines for branch operations, shared with ALU2 and ALU3:
> +;; BRU1, BRU2
> +
> +;; 5.  Two pipelines for load and store operations: LS1, LS2.
> +
> +(define_cpu_unit "tsv110_ls1_issue" "tsv110")
> +(define_cpu_unit "tsv110_ls2_issue" "tsv110")
> +(define_reservation "tsv110_ls1" "tsv110_ls1_issue")
> +(define_reservation "tsv110_ls2" "tsv110_ls2_issue")
> +
> +;; Block all issue queues.
> +
> +(define_reservation "tsv110_block" "tsv110_fsu1_issue + tsv110_fsu2_issue
> +                               + tsv110_mdu_issue + tsv110_alu1_issue
> +                               + tsv110_alu2_issue + tsv110_alu3_issue
> +                               + tsv110_ls1_issue + tsv110_ls2_issue")
> +
> +;; Simple Execution Unit:
> +;;
> +;; Simple ALU without shift
> +(define_insn_reservation "tsv110_alu" 1
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "alu_imm,logic_imm,\
> +                     alu_sreg,logic_reg,\
> +                     adc_imm,adc_reg,\
> +                     adr,bfm,clz,rbit,rev,\
> +                     shift_imm,shift_reg,\
> +                     mov_imm,mov_reg,\
> +                     mvn_imm,mvn_reg,\
> +                     mrs,multiple,no_insn"))
> +  "tsv110_alu1|tsv110_alu2|tsv110_alu3")
> +  
> +(define_insn_reservation "tsv110_alus" 1
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "alus_imm,logics_imm,\
> +                     alus_sreg,logics_reg,\
> +                     adcs_imm,adcs_reg"))
> +  "tsv110_alu2|tsv110_alu3")
> +
> +;; ALU ops with shift
> +(define_insn_reservation "tsv110_alu_shift" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "extend,\
> +                     alu_shift_imm,alu_shift_reg,\
> +                     crc,logic_shift_imm,logic_shift_reg,\
> +                     mov_shift,mvn_shift,\
> +                     mov_shift_reg,mvn_shift_reg"))
> +  "tsv110_mdu")
> +  
> +(define_insn_reservation "tsv110_alus_shift" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "alus_shift_imm,alus_shift_reg,\
> +                     logics_shift_imm,logics_shift_reg"))
> +  "tsv110_alu2|tsv110_alu3")
> +
> +;; Multiply instructions
> +(define_insn_reservation "tsv110_mult" 3
> +  (and (eq_attr "tune" "tsv110")
> +       (ior (eq_attr "mul32" "yes")
> +         (eq_attr "widen_mul64" "yes")))
> +  "tsv110_mdu")
> +
> +;; Integer divide
> +(define_insn_reservation "tsv110_div" 10
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "udiv,sdiv"))
> +  "tsv110_mdu")
> +
> +;; Block all issue pipes for a cycle
> +(define_insn_reservation "tsv110_block" 1
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "block"))
> +  "tsv110_block")
> +
> +;; Branch execution Unit
> +;;
> +;; Branches take two issue slots.
> +;; No latency as there is no result
> +(define_insn_reservation "tsv110_branch" 0
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "branch"))
> +  "tsv110_alu2|tsv110_alu3")
> +
> +;; Load-store execution Unit
> +;;
> +;; Loads of up to two words.
> +(define_insn_reservation "tsv110_load1" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "load_4,load_8"))
> +  "tsv110_ls1|tsv110_ls2")
> +
> +;; Stores of up to two words.
> +(define_insn_reservation "tsv110_store1" 0
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "store_4,store_8"))
> +  "tsv110_ls1|tsv110_ls2")
> +
> +;; Advanced SIMD Unit - Integer Arithmetic Instructions.
> +
> +(define_insn_reservation  "tsv110_neon_abd_aba" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_abd,neon_arith_acc"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation  "tsv110_neon_abd_aba_q" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_arith_acc_q"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation  "tsv110_neon_arith_basic" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_arith_basic"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation  "tsv110_neon_arith_complex" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_arith_complex"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +;; Integer Multiply Instructions.
> +;; D-form
> +(define_insn_reservation "tsv110_neon_multiply" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_multiply"))
> +  "tsv110_fsu1")
> +
> +(define_insn_reservation "tsv110_neon_multiply_dlong" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_mul_d_long"))
> +  "tsv110_fsu1")
> +
> +;; Q-form
> +(define_insn_reservation "tsv110_neon_multiply_q" 8
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_multiply_q"))
> +  "tsv110_fsu1")
> +
> +;; Integer Shift Instructions.
> +
> +(define_insn_reservation
> +  "tsv110_neon_shift_acc" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_shift_acc,\
> +        neon_shift_imm_basic,neon_shift_imm_complex,neon_shift_reg_basic,\
> +        neon_shift_reg_complex"))
> +  "tsv110_fsu1")
> +
> +(define_insn_reservation
> +  "tsv110_neon_shift_acc_q" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_shift_reg_basic_q,\
> +        neon_shift_reg_complex_q"))
> +  "tsv110_fsu1")
> +
> +;; Floating Point Instructions.
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_negabs" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_negabs"))
> +  "(tsv110_fsu1|tsv110_fsu2)")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_arith" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_arith"))
> +  "(tsv110_fsu1|tsv110_fsu2)")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_arith_q" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_arith_q"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +  
> +(define_insn_reservation
> +  "tsv110_neon_fp_minmax_q" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_fp_minmax_s_q,neon_fp_minmax_d_q"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_reductions_q" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_reductions_q"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_cvt_int" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_cvt_int,neon_fp_cvt_int_q"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_mul" 5
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_mul"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_mul_q" 5
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_mul_q"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_mla" 7
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_mla,\
> +        neon_fp_recps_rsqrts"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_recpe_rsqrte" 3
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_recpe_rsqrte"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_mla_q" 7
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_mla_q,\
> +        neon_fp_recps_rsqrts_q"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_fp_recpe_rsqrte_q" 3
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_fp_recpe_rsqrte_q"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +;; Miscellaneous Instructions.
> +
> +(define_insn_reservation
> +  "tsv110_neon_bitops" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_bitops"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_dup" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_from_gp,f_mcr"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_mov" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "f_mcrr"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_bitops_q" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_bitops_q"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_from_gp_q" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_from_gp_q"))
> +  "(tsv110_alu1+tsv110_fsu1)|(tsv110_alu1+tsv110_fsu2)")
> +
> +(define_insn_reservation
> +  "tsv110_neon_to_gp" 3
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_to_gp,neon_to_gp_q"))
> +  "tsv110_fsu1")
> +
> +;; Load Instructions.
> +
> +(define_insn_reservation
> +  "tsv110_neon_ld1_lane" 8
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_load1_one_lane,neon_load1_one_lane_q,\
> +        neon_load1_all_lanes,neon_load1_all_lanes_q"))
> +  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
> +
> +(define_insn_reservation
> +  "tsv110_neon_ld1_reg1" 6
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "f_loads,f_loadd,neon_load1_1reg,neon_load1_1reg_q"))
> +  "tsv110_ls1|tsv110_ls2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_ld1_reg2" 6
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_load1_2reg,neon_load1_2reg_q"))
> +  "tsv110_ls1|tsv110_ls2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_ld1_reg3" 7
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_load1_3reg,neon_load1_3reg_q"))
> +  "tsv110_ls1|tsv110_ls2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_ld1_reg4" 7
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_load1_4reg,neon_load1_4reg_q"))
> +  "tsv110_ls1|tsv110_ls2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_ld2" 8
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_load1_2reg,neon_load1_2reg_q,\
> +        neon_load2_2reg,neon_load2_2reg_q,neon_load2_all_lanes,\
> +        neon_load2_all_lanes_q,neon_load2_one_lane,neon_load2_one_lane_q"))
> +  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
> +
> +(define_insn_reservation
> +  "tsv110_neon_ld3" 9
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_load3_3reg,neon_load3_3reg_q,\
> +        neon_load3_one_lane,neon_load3_one_lane_q,\
> +        neon_load3_all_lanes,neon_load3_all_lanes_q"))
> +  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
> +
> +(define_insn_reservation
> +  "tsv110_neon_ld4_lane" 9
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
> +        neon_load4_one_lane,neon_load4_one_lane_q"))
> +  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
> +
> +(define_insn_reservation
> +  "tsv110_neon_ld4_reg" 11
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
> +        neon_load4_one_lane,neon_load4_one_lane_q"))
> +  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
> +
> +;; Store Instructions.
> +
> +(define_insn_reservation
> +  "tsv110_neon_store_a" 0
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_store_a"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation
> +  "tsv110_neon_store_b" 0
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_store_b"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +;; These block issue for a number of cycles proportional to the number
> +;; of 64-bit chunks they will store, we don't attempt to model that
> +;; precisely, treat them as blocking execution for two cycles when
> +;; issued.
> +(define_insn_reservation
> +  "tsv110_neon_store_complex" 0
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "tsv110_neon_type" "neon_store_complex"))
> +  "tsv110_block*2")
> +
> +;; Floating-Point Operations.
> +
> +(define_insn_reservation "tsv110_fp_const" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "fconsts,fconstd,fmov"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation "tsv110_fp_add_sub" 5
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "fadds,faddd,fmuls,fmuld"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation "tsv110_fp_mac" 7
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "fmacs,ffmas,fmacd,ffmad"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation "tsv110_fp_cvt" 3
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "f_cvt"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation "tsv110_fp_cvtf2i" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "f_cvtf2i"))
> +  "tsv110_fsu1")
> +
> +(define_insn_reservation "tsv110_fp_cvti2f" 5
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "f_cvti2f"))
> +  "(tsv110_alu1+tsv110_fsu1)|(tsv110_alu1+tsv110_fsu2)")
> +
> +(define_insn_reservation "tsv110_fp_cmp" 4
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "fcmps,fcmpd"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation "tsv110_fp_arith" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "ffariths,ffarithd"))
> +  "tsv110_fsu1|tsv110_fsu2")
> +
> +(define_insn_reservation "tsv110_fp_divs" 12
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "fdivs,neon_fp_div_s,fdivd,neon_fp_div_d,\
> +        neon_fp_div_s_q,neon_fp_div_d_q"))
> +  "tsv110_fsu1")
> +
> +(define_insn_reservation "tsv110_fp_sqrts" 24
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "fsqrts,neon_fp_sqrt_s,fsqrtd,neon_fp_sqrt_d,\
> +        neon_fp_sqrt_s_q,neon_fp_sqrt_d_q"))
> +  "tsv110_fsu2")
> +
> +(define_insn_reservation "tsv110_crypto_aes" 3
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "crypto_aese,crypto_aesmc"))
> +  "tsv110_fsu1")
> +  
> +(define_insn_reservation "tsv110_crypto_sha1_fast" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "crypto_sha1_fast,crypto_sha1_xor"))
> +  "(tsv110_fsu1|tsv110_fsu2)")
> +
> +(define_insn_reservation "tsv110_crypto_sha256_fast" 2
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "crypto_sha256_fast"))
> +  "tsv110_fsu1")
> +
> +(define_insn_reservation "tsv110_crypto_complex" 5
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "crypto_sha1_slow,crypto_sha256_slow"))
> +  "tsv110_fsu1")
> +
> +;; We lie with calls.  They take up all issue slots, but are otherwise
> +;; not harmful.
> +(define_insn_reservation "tsv110_call" 1
> +  (and (eq_attr "tune" "tsv110")
> +       (eq_attr "type" "call"))
> +  "tsv110_alu1_issue+tsv110_alu2_issue+tsv110_alu3_issue+tsv110_fsu1_issue+tsv110_fsu2_issue\
> +    +tsv110_mdu_issue+tsv110_ls1_issue+tsv110_ls2_issue"
> +)
> +
> +;; Simple execution unit bypasses
> +(define_bypass 1 "tsv110_alu"
> +              "tsv110_alu,tsv110_alu_shift")
> +(define_bypass 2 "tsv110_alu_shift"
> +              "tsv110_alu,tsv110_alu_shift")
> +
> +;; An MLA or a MUL can feed a dependent MLA.
> +(define_bypass 3 "tsv110_neon_*mla*,tsv110_neon_*mul*"
> +              "tsv110_neon_*mla*")
> +
> +;; We don't need to care about control hazards, either the branch is
> +;; predicted in which case we pay no penalty, or the branch is
> +;; mispredicted in which case instruction scheduling will be unlikely to
> +;; help.
> +(define_bypass 1 "tsv110_*"
> +              "tsv110_call,tsv110_branch")
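The bypass patterns above, such as "tsv110_neon_*mla*" and "tsv110_*", select reservations by name. As an illustration only, the Python sketch below mimics that selection with shell-style globs, on the assumption that the names given to define_bypass are comma-separated glob patterns, as they appear to be used here. The helper matching_reservations is hypothetical and not part of GCC, and the names tsv110_neon_mla_h and tsv110_neon_fp_mul_s are made-up examples rather than reservations taken from this excerpt.

```python
import fnmatch


def matching_reservations(patterns, names):
    """Return the names matched by any comma-separated glob, in input order.

    `patterns` mirrors the string form used in define_bypass, for example
    "tsv110_neon_*mla*,tsv110_neon_*mul*" (assumed glob semantics).
    """
    globs = [p.strip() for p in patterns.split(",")]
    return [n for n in names if any(fnmatch.fnmatchcase(n, g) for g in globs)]


# Hypothetical reservation names, used only to exercise the matcher.
reservations = [
    "tsv110_neon_mla_h",     # made-up name
    "tsv110_neon_fp_mul_s",  # made-up name
    "tsv110_fp_mac",         # appears in the patch above
    "tsv110_call",           # appears in the patch above
]

producers = matching_reservations(
    "tsv110_neon_*mla*,tsv110_neon_*mul*", reservations
)
consumers = matching_reservations("tsv110_neon_*mla*", reservations)
```

Under these assumptions only the two NEON names are selected as producers and only the MLA name as a consumer, while a pattern as broad as "tsv110_*" would select every name in the list.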
> 
> -----Original Message-----
> From: James Greenhalgh [mailto:james.greenha...@arm.com]
> Sent: March 14, 2019 20:36
> To: wuyuan (E) <wuyu...@huawei.com>
> Cc: Kyrill Tkachov <kyrylo.tkac...@foss.arm.com>; 
> gcc-patches@gcc.gnu.org; Zhangyichao (AB) 
> <zhangyichao.zh...@huawei.com>; Zhanghaijian (A) 
> <z.zhanghaij...@huawei.com>; nd <n...@arm.com>; wufeng (O) 
> <wufe...@huawei.com>; Yangfei (Felix) <felix.y...@huawei.com>
> Subject: Re: Re : add tsv110 pipeline scheduling
> 
> On Sat, Feb 23, 2019 at 01:28:22PM +0000, wuyuan (E) wrote:
> > Hi James,
> > Sorry for not responding to your email sooner; the Chinese New Year 
> > holiday and some urgent work delayed me. The three questions you raised 
> > in your last email were due to my misunderstanding of the pipeline.
> > On the first question: these instructions occupy both the tsv110_ls* and 
> > tsv110_fsu* pipelines at the same time.
> 
> Hi Wuyuan,
> 
> Please accept my apologies for how long it has taken me to revisit your patch 
> and review it.
> 
> I have two questions:
> 
> > +(define_insn_reservation "tsv110_crypto_sha256_fast" 2
> > +  (and (eq_attr "tune" "tsv110")
> > +       (eq_attr "type" "crypto_sha1_fast"))
> > +  "tsv110_fsu1")
> 
> I think you intended to check for type crypto_sha256_fast here.
> 
> > +;; ALU ops with shift
> > +(define_insn_reservation "tsv110_alu_shift" 2
> > +  (and (eq_attr "tune" "tsv110")
> > +       (eq_attr "type" "extend,\
> > +                   alu_shift_imm,alu_shift_reg,\
> > +                   crc,logic_shift_imm,logic_shift_reg,\
> > +                   mov_shift,mvn_shift,\
> > +                   mov_shift_reg,mvn_shift_reg"))
> > +  "tsv110_mdu")
> > +  
> > +(define_insn_reservation "tsv110_alus_shift" 2
> > +  (and (eq_attr "tune" "tsv110")
> > +       (eq_attr "type" "alus_shift_imm,alus_shift_reg,\
> > +                   logics_shift_imm,logics_shift_reg"))
> > +  "tsv110_alu2")
> 
> Is this the correct description? This code says that ALU operations with 
> shift are executed in MDU, but ALU operations with shift that are also flag 
> setting are executed in ALU2?
> 
> Otherwise, this patch is OK for trunk. Thank you for your patience.
> 
> Best Regards,
> James
> 
> > They are rewritten as follows:
> > (define_insn_reservation
> >   "tsv110_neon_ld4_lane" 9
> >   (and (eq_attr "tune" "tsv110")
> >        (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
> >        neon_load4_one_lane,neon_load4_one_lane_q"))
> >   "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
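The four alternatives in the reservation above are the cross product of the two load/store units and the two FSU units: "+" reserves both units in the same cycle and "|" separates the alternatives. A minimal Python sketch of that structure follows. The helper names paired_reservation and alternatives are hypothetical, and the parser only handles this flat "(a + b)|(c + d)" shape, not the full reservation syntax.

```python
from itertools import product


def paired_reservation(load_units, fp_units):
    """Build a reservation that holds one load unit and one FP unit together."""
    return "|".join(
        f"({ls} + {fsu})" for ls, fsu in product(load_units, fp_units)
    )


def alternatives(reservation):
    """Split a flat "(a + b)|(c + d)" reservation into sets of unit names."""
    return [
        frozenset(unit.strip() for unit in alt.strip("() ").split("+"))
        for alt in reservation.split("|")
    ]


ld4 = paired_reservation(
    ["tsv110_ls1", "tsv110_ls2"], ["tsv110_fsu1", "tsv110_fsu2"]
)
ld4_alternatives = alternatives(ld4)
```

Each alternative names exactly one tsv110_ls* unit and one tsv110_fsu* unit, which matches the statement that these loads occupy both kinds of pipeline at once.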
> > 
> > On the second question: these instructions use either the tsv110_fsu1 
> > pipeline or the tsv110_fsu2 pipeline.
> > They are rewritten as follows:
> > (define_insn_reservation  "tsv110_neon_abd_aba" 4
> >   (and (eq_attr "tune" "tsv110")
> >        (eq_attr "type" "neon_abd,neon_arith_acc"))
> >   "tsv110_fsu1|tsv110_fsu2")
> > 
> > On the third question: these instructions use either the tsv110_fsu1 
> > pipeline or the tsv110_fsu2 pipeline.
> > They are rewritten as follows:
> > (define_insn_reservation  "tsv110_neon_abd_aba_q" 4
> >   (and (eq_attr "tune" "tsv110")
> >        (eq_attr "type" "neon_arith_acc_q"))
> >   "tsv110_fsu1|tsv110_fsu2")
> > 
> > In addition to the above changes, I asked hardware engineers and colleagues 
> > to review my patch and correct some of the errors. The detailed patch is 
> > as follows:
> > 
> >       * config/aarch64/aarch64-cores.def (tsv110): Change scheduling model.
> >       * config/aarch64/aarch64.md : Add "tsv110.md"
> >       * config/aarch64/tsv110.md: New file.
> > 
