https://gcc.gnu.org/g:dd682ea04149261d27c871879c0c81a94ece8cd8
commit r17-895-gdd682ea04149261d27c871879c0c81a94ece8cd8
Author: Kewen Lin <[email protected]>
Date:   Thu May 28 11:22:57 2026 +0000

    i386: Refine c86-4g fdiv scheduling model

    Commit r17-258 introduced separate c86-4g fdiv units to avoid the
    automaton explosion caused by modeling the whole divider latency on
    normal FPU pipes.  But the real hardware may keep the associated FPU
    pipe occupied for some cycles at both the beginning and the end of
    an fdiv or sqrt operation.

    Following Alexander's suggestion in [1], this patch still keeps the
    long-latency part on the dedicated fdiv unit but models only a
    bounded part of the FPU pipe occupancy.  It makes the first four
    cycles reserve both the selected FPU pipe and the fdiv unit, then
    keeps only the fdiv unit for the remaining cycles.

    Taking r17-258 as baseline, I tried K = 1,2,3,4 for

      fpu,divider*N -> (fpu+divider)*K, divider*(N-K)

    and measured the time for build/genautomata and the top 100 symbol
    sizes of insn-automata.o (baseline normalized as 100) as below:

    1) without any other changes:

                  time     size
      baseline    100      100
      r17-203     340.0    629.3
      K1          100.3    100
      K2          105.5    112.5
      K3          112.8    129
      K4          119.4    141

    2) Splitting fpu0/fpu2 and fpu1/fpu3 to paired automatons:

                  time     size
      baseline    100      100
      r17-203     340.0    629.3
      KS1         79.6     43.3
      KS2         79.8     43.3
      KS3         79.6     43.3
      KS4         79.4     43.3

    It turns out that if we want to model the FPU occupancy for some
    beginning cycles, separating the involved fpu1/fpu3 from the
    original fpu looks better.  So this patch splits fpu0/fpu2 and
    fpu1/fpu3 into two paired automata and this extra coupling does not
    grow the main FPU automata significantly.

    This patch also corrects some other modeling omissions like:

    - Fix c86_4g_fp_op_idiv_load latency typo by one cycle.
    - Merge the old c86_4g_m7 idiv DI/SI/HI reservations after aligning
      their latency and divider unit occupancy (with updated values),
      while keeping QI separate.
    - Adjust reservation units in templates like
      c86_4g_m7_avx_vpinsr_reg_load and c86_4g_m7_avx512_sseadd_xy etc.
    - Add missing reservation units and unit occupancy in templates like
      c86_4g_m7_avx512_permi2_ymm and c86_4g_m7_sse_sseiadd_hplus_load
      etc.
    - Adjust reservation units and unit occupancy in templates like
      c86_4g_m7_avx512_perm_zmm_imm, c86_4g_m7_avx512_expand and
      c86_4g_m7_avx512_ssemul etc.

    It also introduces some reusable reservation aliases to simplify
    the modeling.

    I tested build time for i686 bootstrapping in a docker container:

    - r17-202: 2437s (before c86-4g support)
    - r17-203: 7291s (c86-4g support)
    - r17-258: 2646s (tweaking for build time)
    - this:    2358s

    It looks like this patch improves build time (even better than
    r17-202, though the trivial gap can be due to some jitter).

    The symbol sizes are improved as below:

      nm -CS -t d --defined-only gcc/insn-automata.o \
        | sed 's/^[0-9]* 0*//' \
        | sort -n | tail -20

    with r17-258:

        20068 r bdver1_fp_transitions
        22354 r c86_4g_m7_ieu_min_issue_delay
        26208 r slm_min_issue_delay
        26580 t internal_min_issue_delay(int, DFA_chip*)
        26869 t internal_state_transition(int, DFA_chip*)
        27244 r bdver1_fp_min_issue_delay
        28518 r glm_check
        28518 r glm_transitions
        33690 r geode_min_issue_delay
        33728 r c86_4g_fp_transitions
        45436 r znver4_fpu_min_issue_delay
        46980 r bdver3_fp_min_issue_delay
        49428 r glm_min_issue_delay
        53730 r btver2_fp_min_issue_delay
        53760 r znver1_fp_transitions
        89414 r c86_4g_m7_ieu_transitions
        93960 r bdver3_fp_transitions
       181744 r znver4_fpu_transitions
       326322 r c86_4g_m7_fpu_min_issue_delay
      1305288 r c86_4g_m7_fpu_transitions

    with this:

        17872 r print_reservation(_IO_FILE*, rtx_insn*)::...
        20068 r bdver1_fp_check
        20068 r bdver1_fp_transitions
        22016 r c86_4g_m7_fpu02_transitions
        22354 r c86_4g_m7_ieu_min_issue_delay
        26208 r slm_min_issue_delay
        27244 r bdver1_fp_min_issue_delay
        28199 t internal_min_issue_delay(int, DFA_chip*)
        28362 t internal_state_transition(int, DFA_chip*)
        28518 r glm_check
        28518 r glm_transitions
        33690 r geode_min_issue_delay
        45436 r znver4_fpu_min_issue_delay
        46980 r bdver3_fp_min_issue_delay
        49428 r glm_min_issue_delay
        53730 r btver2_fp_min_issue_delay
        53760 r znver1_fp_transitions
        89414 r c86_4g_m7_ieu_transitions
        93960 r bdver3_fp_transitions
       181744 r znver4_fpu_transitions

    Based on random sampling of SPEC2017 benchmarks 525.x264_r and
    521.wrf_r, I verified that the new modeling introduces no
    significant compilation overhead.  Testing with a single job on a
    c86-4g-m7 machine revealed no impact on x264 and a tiny increase
    for wrf (~0.3%).

    [1] https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716681.html

    gcc/ChangeLog:

            * config/i386/c86-4g-m7.md (c86_4g_m7_fpu): Remove automaton.
            (c86_4g_m7_fpu02): New automaton.
            (c86_4g_m7_fpu13): Ditto.
            (c86-4g-m7-fpu0): Move to c86_4g_m7_fpu02 automaton.
            (c86-4g-m7-fpu1): Move to c86_4g_m7_fpu13 automaton.
            (c86-4g-m7-fpu2): Move to c86_4g_m7_fpu02 automaton.
            (c86-4g-m7-fpu3): Move to c86_4g_m7_fpu13 automaton.
            (c86-4g-m7-fdiv): Remove cpu unit.
            (c86-4g-m7-fdiv1): New cpu unit.
            (c86-4g-m7-fdiv3): Ditto.
            (c86-4g-m7-fpu_0_3): New reservation.
            (c86-4g-m7-fpu_1_3x2): Ditto.
            (c86-4g-m7-fpu_1_3x3): Ditto.
            (c86-4g-m7-fpu_1_3x6): Ditto.
            (c86-4g-m7-fpux2): Ditto.
            (c86-4g-m7-fpux4): Ditto.
            (c86-4g-m7-fpux6): Ditto.
            (c86-4g-m7-fpux8): Ditto.
            (c86-4g-m7-fpux16): Ditto.
            (c86-4g-m7-fp1fdiv1x4): Ditto.
            (c86-4g-m7-fp3fdiv3x4): Ditto.
            (c86-4g-m7-fdiv13): Ditto.
            (c86-4g-m7-fp13div13): Ditto.
            (c86-4g-m7-fp13div13x4): Ditto.
            (c86-4g-m7-fp1div1_fp3div3_x4x8): Ditto.
            (c86-4g-m7-fp1div1_fp3div3_x4x9): Ditto.
            (c86-4g-m7-fp1div1_fp3div3_x4x11): Ditto.
            (c86-4g-m7-fp1div1_fp3div3_x4x15): Ditto.
            (c86-4g-m7-fp1div1_fp3div3_x4x18): Ditto.
            (c86_4g_m7_idiv): New reservation.
            (c86_4g_m7_idiv_QI): Adjust reservation latency and unit occupancy.
            (c86_4g_m7_idiv_load): New reservation.
            (c86_4g_m7_idiv_QI_load): Adjust reservation latency and unit occupancy.
            (c86_4g_m7_idiv_DI): Remove reservation.
            (c86_4g_m7_idiv_SI): Ditto.
            (c86_4g_m7_idiv_HI): Ditto.
            (c86_4g_m7_idiv_DI_load): Ditto.
            (c86_4g_m7_idiv_SI_load): Ditto.
            (c86_4g_m7_idiv_HI_load): Ditto.
            (c86_4g_m7_sse_insertimm): Adjust reservation units and unit occupancy.
            (c86_4g_m7_sse_insert): Ditto.
            (c86_4g_m7_fp_sqrt): Adjust reservation.
            (c86_4g_m7_fp_div): Ditto.
            (c86_4g_m7_fp_div_load): Ditto.
            (c86_4g_m7_fp_idiv_load): Ditto.
            (c86_4g_m7_sse_pinsr_reg): Adjust reservation units and unit occupancy.
            (c86_4g_m7_sse_pinsr_reg_load): Ditto.
            (c86_4g_m7_avx_vpinsr_reg): Ditto.
            (c86_4g_m7_avx_vpinsr_reg_load): Ditto.
            (c86_4g_m7_avx512_perm_xmm): Delete the prefix condition.
            (c86_4g_m7_avx512_perm_xmm_opload): Ditto.
            (c86_4g_m7_avx512_permi2_ymm): Adjust reservation units and unit occupancy.
            (c86_4g_m7_avx512_permi2_zmm): Ditto.
            (c86_4g_m7_avx512_permi2_ymm_load): Ditto.
            (c86_4g_m7_avx512_permi2_zmm_load): Ditto.
            (c86_4g_m7_avx512_perm_zmm_imm): Ditto.
            (c86_4g_m7_avx512_perm_zmm_imm_load): Ditto.
            (c86_4g_m7_avx512_perm_zmm_noimm): Ditto.
            (c86_4g_m7_sse_perm_zmm_noimm_load): Ditto.
            (c86_4g_m7_avx_perm_ymm): Remove.
            (c86_4g_m7_avx_perm_ymem): Ditto.
            (c86_4g_m7_avx512_shuf_zmm): Adjust reservation units and unit occupancy.
            (c86_4g_m7_avx512_shuf_zmem): Ditto.
            (c86_4g_m7_avx512_cmpestr): Ditto.
            (c86_4g_m7_avx512_cmpestr_load): Ditto.
            (c86_4g_m7_avx512_vdbpsadbw_zmm): Ditto.
            (c86_4g_m7_avx512_vdbpsadbw_zmem): Ditto.
            (c86_4g_m7_avx_ssecomi_comi): Ditto.
            (c86_4g_m7_avx_ssecomi_comi_load): Ditto.
            (c86_4g_m7_avx512_expand): Ditto.
            (c86_4g_m7_avx512_expand_load): Ditto.
            (c86_4g_m7_avx512_expand_z): Ditto.
            (c86_4g_m7_avx512_expand_z_load): Ditto.
            (c86_4g_m7_sse_movnt_xy): Rename to c86_4g_m7_sse_movnt.
            (c86_4g_m7_avx512_sseadd_xy): Adjust reservation units.
            (c86_4g_m7_avx512_sseadd_xy_load): Ditto.
            (c86_4g_m7_sse_sseiadd_hplus): Adjust reservation units and unit occupancy.
            (c86_4g_m7_sse_sseiadd_hplus_load): Ditto.
            (c86_4g_m7_avx512_ssemul): Adjust reservation units.
            (c86_4g_m7_avx512_ssemul_load): Ditto.
            (c86_4g_m7_avx512_ssediv): Remove.
            (c86_4g_m7_avx512_ssediv_mem): Remove.
            (c86_4g_m7_avx512_ssediv_x): New.
            (c86_4g_m7_avx512_ssediv_xmem): New.
            (c86_4g_m7_avx512_ssediv_y): New.
            (c86_4g_m7_avx512_ssediv_ymem): New.
            (c86_4g_m7_avx512_ssediv_z): Adjust reservation units.
            (c86_4g_m7_avx512_ssediv_zmem): Ditto.
            (c86_4g_m7_avx512_ssecmp_z): Add reservation units and unit occupancy.
            (c86_4g_m7_avx512_ssecmp_z_load): Ditto.
            (c86_4g_m7_avx512_ssecmp_vp_z): New reservation.
            (c86_4g_m7_avx512_ssecmp_vp_z_load): Ditto.
            (c86_4g_m7_avx512_ssecmp_test_z): Remove reservation.
            (c86_4g_m7_avx512_ssecmp_test_z_load): Ditto.
            (c86_4g_m7_avx512_muladd): Broaden matching condition.
            (c86_4g_m7_avx512_muladd_load): Ditto.
            (c86_4g_m7_fma_muladd): Remove reservation.
            (c86_4g_m7_fma_muladd_load): Ditto.
            (c86_4g_m7_avx512_sse_conflict_x): Add reservation units and unit occupancy.
            (c86_4g_m7_avx512_sse_conflict_x_load): Ditto.
            (c86_4g_m7_avx512_sse_conflict_y): Ditto.
            (c86_4g_m7_avx512_sse_conflict_y_load): Ditto.
            (c86_4g_m7_avx512_sse_conflict_z): Ditto.
            (c86_4g_m7_avx512_sse_conflict_z_load): Ditto.
            (c86_4g_m7_avx512_sse_class_z): Add reservation units and unit occupancy.
            (c86_4g_m7_avx512_sse_class_z_load): Ditto.
            (c86_4g_m7_avx512_sse_sqrt): Remove.
            (c86_4g_m7_avx512_sse_sqrt_load): Remove.
            (c86_4g_m7_avx512_sse_sqrt_sf_x): New.
            (c86_4g_m7_avx512_sse_sqrt_sf_xload): New.
            (c86_4g_m7_avx512_sse_sqrt_sf_y): New.
            (c86_4g_m7_avx512_sse_sqrt_sf_yload): New.
            (c86_4g_m7_avx512_sse_sqrt_sf_z): New.
            (c86_4g_m7_avx512_sse_sqrt_sf_zload): New.
            (c86_4g_m7_avx512_sse_sqrt_df_x): New.
            (c86_4g_m7_avx512_sse_sqrt_df_xload): New.
            (c86_4g_m7_avx512_sse_sqrt_df_y): New.
            (c86_4g_m7_avx512_sse_sqrt_df_yload): New.
            (c86_4g_m7_avx512_sse_sqrt_df_z): New.
            (c86_4g_m7_avx512_sse_sqrt_df_zload): New.
            (c86_4g_m7_avx512_msklog_vector): Add reservation units and unit occupancy.
            (c86_4g_m7_avx512_mskmov_z_k): Ditto.
            (c86_4g_m7_avx512_mskmov_k_reg): Ditto.
            * config/i386/c86-4g.md (c86_4g_fp): Remove automaton.
            (c86_4g_fp024): New automaton.
            (c86_4g_fp1): Ditto.
            (c86-4g-fp0): Move to c86_4g_fp024 automaton.
            (c86-4g-fp1): Move to c86_4g_fp1 automaton.
            (c86-4g-fp2): Move to c86_4g_fp024 automaton.
            (c86-4g-fp3): Ditto.
            (c86-4g-fp1fdivx4): New reservation.
            (c86_4g_fp_sqrt): Adjust reservation.
            (c86_4g_sse_sqrt_sf): Ditto.
            (c86_4g_sse_sqrt_sf_mem): Ditto.
            (c86_4g_sse_sqrt_df): Ditto.
            (c86_4g_sse_sqrt_df_mem): Ditto.
            (c86_4g_fp_op_div): Ditto.
            (c86_4g_fp_op_div_load): Ditto.
            (c86_4g_fp_op_idiv_load): Adjust reservation latency.
            (c86_4g_ssediv_ss_ps): Adjust reservation.
            (c86_4g_ssediv_ss_ps_load): Ditto.
            (c86_4g_ssediv_sd_pd): Ditto.
            (c86_4g_ssediv_sd_pd_load): Ditto.
            (c86_4g_ssediv_avx256_ps): Ditto.
            (c86_4g_ssediv_avx256_ps_load): Ditto.
            (c86_4g_ssediv_avx256_pd): Ditto.
            (c86_4g_ssediv_avx256_pd_load): Ditto.

    Co-authored-by: Xin Liu <[email protected]>
    Signed-off-by: Xin Liu <[email protected]>
    Signed-off-by: Kewen Lin <[email protected]>

Diff:
---
 gcc/config/i386/c86-4g-m7.md | 412 +++++++++++++++++++++++++------------------
 gcc/config/i386/c86-4g.md    |  61 ++++---
 2 files changed, 270 insertions(+), 203 deletions(-)

diff --git a/gcc/config/i386/c86-4g-m7.md b/gcc/config/i386/c86-4g-m7.md
index 54a850db3be8..96bd322a2883 100644
--- a/gcc/config/i386/c86-4g-m7.md
+++ b/gcc/config/i386/c86-4g-m7.md
@@ -20,8 +20,10 @@
 ;; HYGON c86-4g-m7 Scheduling
 ;; Modeling automatons for decoders, integer execution pipes,
 ;; AGU pipes, branch, floating point execution, fp store units,
-;; integer and floating point dividers.
-(define_automaton "c86_4g_m7, c86_4g_m7_ieu, c86_4g_m7_agu, c86_4g_m7_fpu, c86_4g_m7_idiv, c86_4g_m7_fdiv") +;; integer and floating point dividers. Split fpu1 and fpu3 +;; into their own automata to keep these units independent +;; without increasing the main c86_4g_m7_fpu state space. +(define_automaton "c86_4g_m7, c86_4g_m7_ieu, c86_4g_m7_agu, c86_4g_m7_fpu02, c86_4g_m7_fpu13, c86_4g_m7_idiv, c86_4g_m7_fdiv") ;; Decoders unit has 4 decoders and all of them can decode fast path ;; and vector type instructions. @@ -30,10 +32,6 @@ (define_cpu_unit "c86-4g-m7-decode2" "c86_4g_m7") (define_cpu_unit "c86-4g-m7-decode3" "c86_4g_m7") -;; Two separated dividers for int and fp. -(define_cpu_unit "c86-4g-m7-idiv" "c86_4g_m7_idiv") -(define_cpu_unit "c86-4g-m7-fdiv" "c86_4g_m7_fdiv") - ;; Currently blocking all decoders for vector path instructions as ;; they are dispatched separetely as microcode sequence. (define_reservation "c86-4g-m7-vector" "c86-4g-m7-decode0+c86-4g-m7-decode1+c86-4g-m7-decode2+c86-4g-m7-decode3") @@ -50,6 +48,9 @@ (define_cpu_unit "c86-4g-m7-ieu2" "c86_4g_m7_ieu") (define_cpu_unit "c86-4g-m7-ieu3" "c86_4g_m7_ieu") +;; One separated integer divider. +(define_cpu_unit "c86-4g-m7-idiv" "c86_4g_m7_idiv") + ;; c86-4g-m7 has an additional branch unit. (define_cpu_unit "c86-4g-m7-bru0" "c86_4g_m7_ieu") (define_reservation "c86-4g-m7-ieu" "c86-4g-m7-ieu0|c86-4g-m7-ieu1|c86-4g-m7-ieu2|c86-4g-m7-ieu3") @@ -67,23 +68,48 @@ ;; vectorpath (microcoded) instructions are single issue instructions. ;; So, they occupy all the integer units. (define_reservation "c86-4g-m7-ivector" "c86-4g-m7-ieu0+c86-4g-m7-ieu1 - +c86-4g-m7-ieu2+c86-4g-m7-ieu3+c86-4g-m7-bru0 - +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2") + +c86-4g-m7-ieu2+c86-4g-m7-ieu3+c86-4g-m7-bru0 + +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2") ;; Floating point unit 4 FP pipes. 
-(define_cpu_unit "c86-4g-m7-fpu0" "c86_4g_m7_fpu") -(define_cpu_unit "c86-4g-m7-fpu1" "c86_4g_m7_fpu") -(define_cpu_unit "c86-4g-m7-fpu2" "c86_4g_m7_fpu") -(define_cpu_unit "c86-4g-m7-fpu3" "c86_4g_m7_fpu") +(define_cpu_unit "c86-4g-m7-fpu0" "c86_4g_m7_fpu02") +(define_cpu_unit "c86-4g-m7-fpu1" "c86_4g_m7_fpu13") +(define_cpu_unit "c86-4g-m7-fpu2" "c86_4g_m7_fpu02") +(define_cpu_unit "c86-4g-m7-fpu3" "c86_4g_m7_fpu13") + (define_reservation "c86-4g-m7-fpu" "c86-4g-m7-fpu0|c86-4g-m7-fpu1|c86-4g-m7-fpu2|c86-4g-m7-fpu3") -(define_reservation "c86-4g-m7-fpu_0_2" "c86-4g-m7-fpu0|c86-4g-m7-fpu2") -(define_reservation "c86-4g-m7-fpu_1_3" "c86-4g-m7-fpu1|c86-4g-m7-fpu3") (define_reservation "c86-4g-m7-fpu_0_1" "c86-4g-m7-fpu0|c86-4g-m7-fpu1") +(define_reservation "c86-4g-m7-fpu_0_2" "c86-4g-m7-fpu0|c86-4g-m7-fpu2") (define_reservation "c86-4g-m7-fpu_0_2x2" "c86-4g-m7-fpu0*2|c86-4g-m7-fpu2*2") (define_reservation "c86-4g-m7-fpu_0_2x4" "c86-4g-m7-fpu0*4|c86-4g-m7-fpu2*4") +(define_reservation "c86-4g-m7-fpu_0_3" "c86-4g-m7-fpu0|c86-4g-m7-fpu3") +(define_reservation "c86-4g-m7-fpu_1_3" "c86-4g-m7-fpu1|c86-4g-m7-fpu3") +(define_reservation "c86-4g-m7-fpu_1_3x2" "c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") +(define_reservation "c86-4g-m7-fpu_1_3x3" "c86-4g-m7-fpu1*3|c86-4g-m7-fpu3*3") +(define_reservation "c86-4g-m7-fpu_1_3x6" "c86-4g-m7-fpu1*6|c86-4g-m7-fpu3*6") +(define_reservation "c86-4g-m7-fpux2" "c86-4g-m7-fpu0*2|c86-4g-m7-fpu1*2|c86-4g-m7-fpu2*2|c86-4g-m7-fpu3*2") +(define_reservation "c86-4g-m7-fpux4" "c86-4g-m7-fpu0*4|c86-4g-m7-fpu1*4|c86-4g-m7-fpu2*4|c86-4g-m7-fpu3*4") +(define_reservation "c86-4g-m7-fpux8" "c86-4g-m7-fpu0*8|c86-4g-m7-fpu1*8|c86-4g-m7-fpu2*8|c86-4g-m7-fpu3*8") +(define_reservation "c86-4g-m7-fpux6" "c86-4g-m7-fpu0*6|c86-4g-m7-fpu1*6|c86-4g-m7-fpu2*6|c86-4g-m7-fpu3*6") +(define_reservation "c86-4g-m7-fpux16" "c86-4g-m7-fpu0*16|c86-4g-m7-fpu1*16|c86-4g-m7-fpu2*16|c86-4g-m7-fpu3*16") (define_reservation "c86-4g-m7-fvector" "c86-4g-m7-fpu0+c86-4g-m7-fpu1 - 
+c86-4g-m7-fpu2+c86-4g-m7-fpu3 - +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2") + +c86-4g-m7-fpu2+c86-4g-m7-fpu3 + +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2") + +;; Two FP dividers. +(define_cpu_unit "c86-4g-m7-fdiv1" "c86_4g_m7_fdiv") +(define_cpu_unit "c86-4g-m7-fdiv3" "c86_4g_m7_fdiv") + +(define_reservation "c86-4g-m7-fp1fdiv1x4" "(c86-4g-m7-fpu1+c86-4g-m7-fdiv1)*4") +(define_reservation "c86-4g-m7-fp3fdiv3x4" "(c86-4g-m7-fpu3+c86-4g-m7-fdiv3)*4") +(define_reservation "c86-4g-m7-fdiv13" "(c86-4g-m7-fdiv1+c86-4g-m7-fdiv3)") +(define_reservation "c86-4g-m7-fp13div13" "(c86-4g-m7-fpu1+c86-4g-m7-fpu3+c86-4g-m7-fdiv1+c86-4g-m7-fdiv3)") +(define_reservation "c86-4g-m7-fp13div13x4" "c86-4g-m7-fp13div13*4") +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x8" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*8)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*8)") +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x9" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*9)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*9)") +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x11" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*11)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*11)") +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x15" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*15)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*15)") +(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x18" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*18)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*18)") ;; IMOV/IMOVX (define_insn_reservation "c86_4g_m7_imov_xchg" 1 @@ -168,61 +194,33 @@ "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-ieu1") ;; IDIV -(define_insn_reservation "c86_4g_m7_idiv_DI" 41 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" "idiv") - (and (eq_attr "mode" "DI") - (eq_attr "memory" "none")))) - "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*41") - -(define_insn_reservation "c86_4g_m7_idiv_SI" 25 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" "idiv") - (and (eq_attr "mode" "SI") - (eq_attr "memory" "none")))) - 
"c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*25") - -(define_insn_reservation "c86_4g_m7_idiv_HI" 17 +(define_insn_reservation "c86_4g_m7_idiv" 7 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "idiv") - (and (eq_attr "mode" "HI") + (and (eq_attr "mode" "!QI") (eq_attr "memory" "none")))) - "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*17") + "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*7") -(define_insn_reservation "c86_4g_m7_idiv_QI" 15 +(define_insn_reservation "c86_4g_m7_idiv_QI" 6 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "idiv") (and (eq_attr "mode" "QI") (eq_attr "memory" "none")))) - "c86-4g-m7-direct,c86-4g-m7-ieu3,c86-4g-m7-idiv*15") - -(define_insn_reservation "c86_4g_m7_idiv_DI_load" 45 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" "idiv") - (and (eq_attr "mode" "DI") - (eq_attr "memory" "load")))) - "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*41") - -(define_insn_reservation "c86_4g_m7_idiv_SI_load" 29 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" "idiv") - (and (eq_attr "mode" "SI") - (eq_attr "memory" "load")))) - "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*25") + "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*6") -(define_insn_reservation "c86_4g_m7_idiv_HI_load" 21 +(define_insn_reservation "c86_4g_m7_idiv_load" 11 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "idiv") - (and (eq_attr "mode" "HI") + (and (eq_attr "mode" "!QI") (eq_attr "memory" "load")))) - "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*17") + "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*7") -(define_insn_reservation "c86_4g_m7_idiv_QI_load" 19 +(define_insn_reservation "c86_4g_m7_idiv_QI_load" 10 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "idiv") (and (eq_attr "mode" "QI") (eq_attr "memory" "load")))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*15") + "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*6") ;; 
Integer/genaral Instructions (define_insn_reservation "c86_4g_m7_insn" 1 @@ -385,14 +383,14 @@ (and (eq_attr "type" "sseins") (and (eq_attr "memory" "none") (eq_attr "length_immediate" "2")))) - "c86-4g-m7-double,c86-4g-m7-fpu0|c86-4g-m7-fpu3,c86-4g-m7-fpu1") + "c86-4g-m7-double,c86-4g-m7-fpu_0_3,c86-4g-m7-fpu1") (define_insn_reservation "c86_4g_m7_sse_insert" 3 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "sseins") (and (eq_attr "memory" "none") (eq_attr "length_immediate" "!2")))) - "c86-4g-m7-direct,c86-4g-m7-fpu1") + "c86-4g-m7-direct,c86-4g-m7-fpu1*2") ;; FCMOV (define_insn_reservation "c86_4g_m7_fp_cmov" 4 @@ -444,7 +442,7 @@ (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "fpspc") (eq_attr "c86_attr" "sqrt"))) - "c86-4g-m7-direct,c86-4g-m7-fpu1,c86-4g-m7-fdiv*22") + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x18") ;; FPSPC (define_insn_reservation "c86_4g_m7_fp_spc_direct" 5 @@ -487,21 +485,21 @@ (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "fdiv") (eq_attr "memory" "none"))) - "c86-4g-m7-direct,c86-4g-m7-fpu1,c86-4g-m7-fdiv*15") + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x11") (define_insn_reservation "c86_4g_m7_fp_div_load" 22 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "fdiv") (and (eq_attr "fp_int_src" "false") (eq_attr "memory" "!none")))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu1,c86-4g-m7-fdiv*15") + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x11") (define_insn_reservation "c86_4g_m7_fp_idiv_load" 26 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "fdiv") (and (eq_attr "fp_int_src" "true") (eq_attr "memory" "!none")))) - "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu1,c86-4g-m7-fdiv*15") + "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu1*4,c86-4g-m7-fp1div1_fp3div3_x4x11") (define_insn_reservation "c86_4g_m7_fp_fsgn" 1 (and (eq_attr "cpu" "c86_4g_m7") @@ -634,7 +632,7 @@ (and (eq_attr "c86_attr" "insr") (and (eq_attr "prefix" "orig") (eq_attr "memory" "none"))))) - 
"c86-4g-m7-double,c86-4g-m7-ieu2,c86-4g-m7-fpu_0_1") + "c86-4g-m7-double,c86-4g-m7-ieu2,c86-4g-m7-fpu") (define_insn_reservation "c86_4g_m7_sse_pinsr_reg_load" 3 (and (eq_attr "cpu" "c86_4g_m7") @@ -642,7 +640,7 @@ (and (eq_attr "c86_attr" "insr") (and (eq_attr "prefix" "orig") (eq_attr "memory" "load"))))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_1") + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu") (define_insn_reservation "c86_4g_m7_avx_vpinsr_reg" 2 (and (eq_attr "cpu" "c86_4g_m7") @@ -650,7 +648,7 @@ (and (eq_attr "c86_attr" "insr") (and (eq_attr "prefix" "!orig") (eq_attr "memory" "none"))))) - "c86-4g-m7-double,c86-4g-m7-fpu2*2") + "c86-4g-m7-double,c86-4g-m7-fpu_1_3x2") (define_insn_reservation "c86_4g_m7_avx_vpinsr_reg_load" 8 (and (eq_attr "cpu" "c86_4g_m7") @@ -658,7 +656,7 @@ (and (eq_attr "c86_attr" "insr") (and (eq_attr "prefix" "!orig") (eq_attr "memory" "load"))))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu1|c86-4g-m7-fpu2|c86-4g-m7-fpu3") + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_1_3") ;; PERM (define_insn_reservation "c86_4g_m7_avx512_perm_xmm" 3 @@ -668,8 +666,7 @@ (eq_attr "mode" "V4SF,V2DF,TI")) (and (eq_attr "c86_attr" "perm") (eq_attr "mode" "V8SF,V4DF,TI,OI"))) - (and (eq_attr "prefix" "evex") - (eq_attr "memory" "none"))))) + (eq_attr "memory" "none")))) "c86-4g-m7-direct,c86-4g-m7-fpu_0_2x2") (define_insn_reservation "c86_4g_m7_avx512_perm_xmm_opload" 10 @@ -679,8 +676,7 @@ (eq_attr "mode" "V4SF,V2DF,TI")) (and (eq_attr "c86_attr" "perm") (eq_attr "mode" "V8SF,V4DF,TI,OI"))) - (and (eq_attr "prefix" "evex") - (eq_attr "memory" "load"))))) + (eq_attr "memory" "load")))) "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2x2") (define_insn_reservation "c86_4g_m7_avx512_permi2_ymm" 4 @@ -689,7 +685,7 @@ (and (eq_attr "c86_attr" "perm2") (and (eq_attr "mode" "V8SF,V4DF,OI") (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpux4") (define_insn_reservation "c86_4g_m7_avx512_permi2_zmm" 16 
(and (eq_attr "cpu" "c86_4g_m7") @@ -697,7 +693,7 @@ (and (eq_attr "c86_attr" "perm2") (and (eq_attr "mode" "V16SF,V8DF,XI") (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpux16") (define_insn_reservation "c86_4g_m7_avx512_permi2_ymm_load" 11 (and (eq_attr "cpu" "c86_4g_m7") @@ -705,7 +701,7 @@ (and (eq_attr "c86_attr" "perm2") (and (eq_attr "mode" "V8SF,V4DF,OI") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux4") (define_insn_reservation "c86_4g_m7_avx512_permi2_zmm_load" 23 (and (eq_attr "cpu" "c86_4g_m7") @@ -713,7 +709,7 @@ (and (eq_attr "c86_attr" "perm2") (and (eq_attr "mode" "V16SF,V8DF,XI") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux16") (define_insn_reservation "c86_4g_m7_avx512_perm_zmm_imm" 4 (and (eq_attr "cpu" "c86_4g_m7") @@ -722,7 +718,7 @@ (and (eq_attr "mode" "V16SF,V8DF,XI") (and (match_operand 2 "immediate_operand") (eq_attr "memory" "none")))))) - "c86-4g-m7-direct,c86-4g-m7-fpu_0_2x4") + "c86-4g-m7-direct,c86-4g-m7-fpux4") (define_insn_reservation "c86_4g_m7_avx512_perm_zmm_imm_load" 11 (and (eq_attr "cpu" "c86_4g_m7") @@ -731,7 +727,7 @@ (and (eq_attr "mode" "V16SF,V8DF,XI") (and (match_operand 2 "immediate_operand") (eq_attr "memory" "load")))))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2x4") + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpux4") (define_insn_reservation "c86_4g_m7_avx512_perm_zmm_noimm" 8 (and (eq_attr "cpu" "c86_4g_m7") @@ -740,7 +736,7 @@ (and (eq_attr "mode" "V16SF,V8DF,XI") (and (match_operand 2 "nonimmediate_operand") (eq_attr "memory" "none")))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpux8") (define_insn_reservation "c86_4g_m7_sse_perm_zmm_noimm_load" 15 (and (eq_attr "cpu" "c86_4g_m7") @@ -749,23 +745,7 @@ (and (eq_attr "mode" "V16SF,V8DF,XI") (and (match_operand 2 "nonimmediate_operand") (eq_attr "memory" "load")))))) - 
"c86-4g-m7-vector,c86-4g-m7-load") - -(define_insn_reservation "c86_4g_m7_avx_perm_ymm" 3 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" "sselog") - (and (eq_attr "c86_attr" "perm") - (and (eq_attr "prefix" "!evex") - (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") - -(define_insn_reservation "c86_4g_m7_avx_perm_ymem" 10 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" "sselog") - (and (eq_attr "c86_attr" "perm") - (and (eq_attr "prefix" "!evex") - (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux8") ;; VINSERT (define_insn_reservation "c86_4g_m7_avx512_insertx_ymm" 3 @@ -853,7 +833,7 @@ (and (eq_attr "c86_attr" "shufx") (and (eq_attr "mode" "V8DF,V16SF,XI") (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpu_0_2x4") (define_insn_reservation "c86_4g_m7_avx512_shuf_xymem" 10 (and (eq_attr "cpu" "c86_4g_m7") @@ -869,7 +849,7 @@ (and (eq_attr "c86_attr" "shufx") (and (eq_attr "mode" "V8DF,V16SF,XI") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_0_2x4") ;; SSELOGIC (define_insn_reservation "c86_4g_m7_sselogic_xymm" 1 @@ -892,14 +872,14 @@ (and (eq_attr "type" "sselog") (and (eq_attr "c86_attr" "cmpestr") (eq_attr "memory" "none")))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpux6") (define_insn_reservation "c86_4g_m7_avx512_cmpestr_load" 13 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "sselog") (and (eq_attr "c86_attr" "cmpestr") (eq_attr "memory" "load")))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux6") ;; SSELOG (define_insn_reservation "c86_4g_m7_avx512_log" 1 @@ -940,7 +920,7 @@ (and (eq_attr "c86_attr" "sadbw") (and (eq_attr "mode" "XI") (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3x2") (define_insn_reservation "c86_4g_m7_avx512_vdbpsadbw_zmem" 11 
(and (eq_attr "cpu" "c86_4g_m7") @@ -948,7 +928,7 @@ (and (eq_attr "c86_attr" "sadbw") (and (eq_attr "mode" "XI") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3x2") ;; ABS (define_insn_reservation "c86_4g_m7_avx512_abs" 1 @@ -1052,14 +1032,14 @@ (and (eq_attr "type" "ssecomi") (and (eq_attr "prefix_extra" "0") (eq_attr "memory" "none")))) - "c86-4g-m7-double,c86-4g-m7-fpu2|c86-4g-m7-fpu3") + "c86-4g-m7-double,c86-4g-m7-fpu") (define_insn_reservation "c86_4g_m7_avx_ssecomi_comi_load" 8 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssecomi") (and (eq_attr "prefix_extra" "0") (eq_attr "memory" "load")))) - "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu2|c86-4g-m7-fpu3") + "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu") (define_insn_reservation "c86_4g_m7_avx_ssecomi_test" 1 (and (eq_attr "cpu" "c86_4g_m7") @@ -1201,7 +1181,7 @@ (and (eq_attr "c86_attr" "expand,compress") (and (not (eq_attr "mode" "XI,V16SF,V8DF")) (eq_attr "memory" "none"))))) - "c86-4g-m7-direct,c86-4g-m7-fpu3*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") + "c86-4g-m7-direct,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3") (define_insn_reservation "c86_4g_m7_avx512_expand_load" 10 (and (eq_attr "cpu" "c86_4g_m7") @@ -1209,7 +1189,7 @@ (and (eq_attr "c86_attr" "expand,compress") (and (not (eq_attr "mode" "XI,V16SF,V8DF")) (eq_attr "memory" "load"))))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3") (define_insn_reservation "c86_4g_m7_avx512_expand_z" 10 (and (eq_attr "cpu" "c86_4g_m7") @@ -1217,7 +1197,7 @@ (and (eq_attr "c86_attr" "expand,compress") (and (eq_attr "mode" "XI,V16SF,V8DF") (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3") (define_insn_reservation "c86_4g_m7_avx512_expand_z_load" 17 (and (eq_attr "cpu" "c86_4g_m7") @@ -1225,7 +1205,7 @@ (and 
(eq_attr "c86_attr" "expand,compress") (and (eq_attr "mode" "XI,V16SF,V8DF") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3") ;; MOVNT (define_insn_reservation "c86_4g_m7_avx512_movnt_load" 8 @@ -1252,7 +1232,7 @@ (eq_attr "memory" "!none"))))) "c86-4g-m7-direct,c86-4g-m7-store,c86-4g-m7-fpu1") -(define_insn_reservation "c86_4g_m7_sse_movnt_xy" 4 +(define_insn_reservation "c86_4g_m7_sse_movnt" 4 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssemov") (and (eq_attr "c86_attr" "movnt") @@ -1377,14 +1357,14 @@ (and (eq_attr "type" "sseadd") (and (eq_attr "c86_attr" "other") (eq_attr "memory" "none")))) - "c86-4g-m7-direct,c86-4g-m7-fpu3") + "c86-4g-m7-direct,c86-4g-m7-fpu_1_3") (define_insn_reservation "c86_4g_m7_avx512_sseadd_xy_load" 10 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "sseadd") (and (eq_attr "c86_attr" "other") (eq_attr "memory" "load")))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3") + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_1_3") ;; HADD/HSUB (define_insn_reservation "c86_4g_m7_avx_sseadd_hplus" 7 @@ -1507,7 +1487,7 @@ (and (eq_attr "c86_attr" "hplus") (and (eq_attr "prefix" "orig") (eq_attr "memory" "none"))))) - "c86-4g-m7-vector,c86-4g-m7-fpu0*2") + "c86-4g-m7-vector,c86-4g-m7-fpux2") (define_insn_reservation "c86_4g_m7_sse_sseiadd_hplus_load" 10 (and (eq_attr "cpu" "c86_4g_m7") @@ -1515,49 +1495,63 @@ (and (eq_attr "c86_attr" "hplus") (and (eq_attr "prefix" "orig") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu0*2") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux2") ;; SSEMUL (define_insn_reservation "c86_4g_m7_avx512_ssemul" 3 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssemul") (eq_attr "memory" "none"))) - "c86-4g-m7-direct,c86-4g-m7-fpu0") + "c86-4g-m7-direct,c86-4g-m7-fpu_0_2") (define_insn_reservation "c86_4g_m7_avx512_ssemul_load" 10 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr 
"type" "ssemul") (eq_attr "memory" "load"))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu0") + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2") ;; SSEDIV -(define_insn_reservation "c86_4g_m7_avx512_ssediv" 13 +(define_insn_reservation "c86_4g_m7_avx512_ssediv_x" 13 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "SF,DF,V4SF,V2DF") + (eq_attr "memory" "none")))) + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x8") + +(define_insn_reservation "c86_4g_m7_avx512_ssediv_xmem" 20 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "ssediv") + (and (eq_attr "mode" "SF,DF,V4SF,V2DF") + (eq_attr "memory" "load")))) + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x8") + +(define_insn_reservation "c86_4g_m7_avx512_ssediv_y" 13 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssediv") - (and (not (eq_attr "mode" "V16SF,V8DF")) + (and (eq_attr "mode" "V8SF,V4DF") (eq_attr "memory" "none")))) - "c86-4g-m7-direct,c86-4g-m7-fpu3,c86-4g-m7-fdiv*13") + "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*8") -(define_insn_reservation "c86_4g_m7_avx512_ssediv_mem" 20 +(define_insn_reservation "c86_4g_m7_avx512_ssediv_ymem" 20 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssediv") - (and (not (eq_attr "mode" "V16SF,V8DF")) + (and (eq_attr "mode" "V8SF,V4DF") (eq_attr "memory" "load")))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fdiv*13") + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*8") (define_insn_reservation "c86_4g_m7_avx512_ssediv_z" 24 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssediv") (and (eq_attr "mode" "V16SF,V8DF") (eq_attr "memory" "none")))) - "c86-4g-m7-double,c86-4g-m7-fpu3,c86-4g-m7-fdiv*24") + "c86-4g-m7-double,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*20") (define_insn_reservation "c86_4g_m7_avx512_ssediv_zmem" 31 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssediv") (and (eq_attr "mode" "V16SF,V8DF") (eq_attr 
"memory" "load")))) - "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fdiv*24") + "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*20") ;; SSECMP (define_insn_reservation "c86_4g_m7_avx512_ssecmp" 5 @@ -1582,7 +1576,7 @@ (and (eq_attr "mode" "V16SF,V8DF,XI") (and (eq_attr "c86_attr" "other") (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3") (define_insn_reservation "c86_4g_m7_avx512_ssecmp_z_load" 12 (and (eq_attr "cpu" "c86_4g_m7") @@ -1590,7 +1584,7 @@ (and (eq_attr "mode" "V16SF,V8DF,XI") (and (eq_attr "c86_attr" "other") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3x2") (define_insn_reservation "c86_4g_m7_avx512_ssecmp_vp" 5 (and (eq_attr "cpu" "c86_4g_m7") @@ -1610,6 +1604,24 @@ (eq_attr "memory" "load")))))) "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu,c86-4g-m7-fpu_1_3") +(define_insn_reservation "c86_4g_m7_avx512_ssecmp_vp_z" 5 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "ssecmp") + (and (eq_attr "prefix" "evex") + (and (eq_attr "mode" "XI") + (and (eq_attr "c86_attr" "other,ptest") + (eq_attr "memory" "none")))))) + "c86-4g-m7-double,c86-4g-m7-fpu,c86-4g-m7-fpu_1_3") + +(define_insn_reservation "c86_4g_m7_avx512_ssecmp_vp_z_load" 12 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "ssecmp") + (and (eq_attr "prefix" "evex") + (and (eq_attr "mode" "XI") + (and (eq_attr "c86_attr" "other,ptest") + (eq_attr "memory" "load")))))) + "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu,c86-4g-m7-fpu_1_3x2") + (define_insn_reservation "c86_4g_m7_avx_ssecmp_vp" 1 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssecmp") @@ -1641,22 +1653,6 @@ (eq_attr "memory" "load"))))) "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu1,c86-4g-m7-fpu_1_3") -(define_insn_reservation "c86_4g_m7_avx512_ssecmp_test_z" 4 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" 
"ssecmp") - (and (eq_attr "mode" "XI") - (and (eq_attr "c86_attr" "ptest") - (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") - -(define_insn_reservation "c86_4g_m7_avx512_ssecmp_test_z_load" 11 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" "ssecmp") - (and (eq_attr "mode" "XI") - (and (eq_attr "c86_attr" "ptest") - (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") - ;; SSECVT (define_insn_reservation "c86_4g_m7_avx512_ssecvt_xy" 4 (and (eq_attr "cpu" "c86_4g_m7") @@ -1768,17 +1764,14 @@ (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssemuladd") (and (eq_attr "c86_attr" "other") - (and (not (eq_attr "isa" "fma,fma4")) - (eq_attr "mode" "V32HF,V16SF,V8DF,XI") - (eq_attr "memory" "none"))))) + (eq_attr "memory" "none")))) "c86-4g-m7-direct,c86-4g-m7-fpu_0_2") (define_insn_reservation "c86_4g_m7_avx512_muladd_load" 11 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "ssemuladd") (and (eq_attr "c86_attr" "other") - (and (not (eq_attr "isa" "fma,fma4")) - (eq_attr "memory" "load"))))) + (eq_attr "memory" "load")))) "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2") (define_insn_reservation "c86_4g_m7_avx512_muladd_madd" 4 @@ -1797,20 +1790,6 @@ (eq_attr "memory" "load"))))) "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2") -(define_insn_reservation "c86_4g_m7_fma_muladd" 4 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" "ssemuladd") - (and (eq_attr "isa" "fma,fma4") - (eq_attr "memory" "none")))) - "c86-4g-m7-direct,c86-4g-m7-fpu_0_1") - -(define_insn_reservation "c86_4g_m7_fma_muladd_load" 11 - (and (eq_attr "cpu" "c86_4g_m7") - (and (eq_attr "type" "ssemuladd") - (and (eq_attr "isa" "fma,fma4") - (eq_attr "memory" "load")))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_1") - ;; SSE (define_insn_reservation "c86_4g_m7_avx512_sse_range" 1 (and (eq_attr "cpu" "c86_4g_m7") @@ -1838,7 +1817,7 @@ (and (eq_attr "c86_decode" "vector") (and (eq_attr "mode" "TI") (eq_attr "memory" "none"))))) - 
"c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpu_1_3x2") (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_x_load" 9 (and (eq_attr "cpu" "c86_4g_m7") @@ -1846,7 +1825,7 @@ (and (eq_attr "c86_decode" "vector") (and (eq_attr "mode" "TI") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3x2") (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_y" 5 (and (eq_attr "cpu" "c86_4g_m7") @@ -1854,7 +1833,7 @@ (and (eq_attr "c86_decode" "vector") (and (eq_attr "mode" "OI") (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpu_1_3x3") (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_y_load" 12 (and (eq_attr "cpu" "c86_4g_m7") @@ -1862,7 +1841,7 @@ (and (eq_attr "c86_decode" "vector") (and (eq_attr "mode" "OI") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3x3") (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_z" 8 (and (eq_attr "cpu" "c86_4g_m7") @@ -1870,7 +1849,7 @@ (and (eq_attr "c86_decode" "vector") (and (eq_attr "mode" "XI") (eq_attr "memory" "none"))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpu_1_3x6") (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_z_load" 15 (and (eq_attr "cpu" "c86_4g_m7") @@ -1878,7 +1857,7 @@ (and (eq_attr "c86_decode" "vector") (and (eq_attr "mode" "XI") (eq_attr "memory" "load"))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3x6") (define_insn_reservation "c86_4g_m7_avx512_sse_class" 4 (and (eq_attr "cpu" "c86_4g_m7") @@ -1905,7 +1884,7 @@ (and (eq_attr "length_immediate" "1") (and (eq_attr "mode" "V32HF,V16SF,V8DF") (eq_attr "memory" "none")))))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpu_1_3,c86-4g-m7-fpu_1_3") (define_insn_reservation "c86_4g_m7_avx512_sse_class_z_load" 11 (and (eq_attr "cpu" "c86_4g_m7") @@ -1914,7 +1893,7 @@ (and (eq_attr "length_immediate" "1") 
(and (eq_attr "mode" "V32HF,V16SF,V8DF") (eq_attr "memory" "load")))))) - "c86-4g-m7-vector,c86-4g-m7-load") + "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3,c86-4g-m7-fpu_1_3") (define_insn_reservation "c86_4g_m7_avx_sse" 5 (and (eq_attr "cpu" "c86_4g_m7") @@ -1932,19 +1911,102 @@ (eq_attr "memory" "load"))))) "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_1") -(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt" 16 +;; SSE SQRT +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_x" 14 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "sse") - (and (eq_attr "c86_attr" "sqrt") - (eq_attr "memory" "none")))) - "c86-4g-m7-direct,c86-4g-m7-fpu1|c86-4g-m7-fpu3,c86-4g-m7-fdiv*16") + (and (eq_attr "mode" "SF,V4SF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "none"))))) + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x9") -(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_load" 23 +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_xload" 21 (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "sse") - (and (eq_attr "c86_attr" "sqrt") - (eq_attr "memory" "load")))) - "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu1|c86-4g-m7-fpu3,c86-4g-m7-fdiv*16") + (and (eq_attr "mode" "SF,V4SF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "load"))))) + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x9") + +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_y" 14 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "V8SF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "none"))))) + "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*9") + +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_yload" 21 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "V8SF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "load"))))) + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*9") + +(define_insn_reservation 
"c86_4g_m7_avx512_sse_sqrt_sf_z" 26 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "V16SF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "none"))))) + "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*22") + +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_zload" 33 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "V16SF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "load"))))) + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*22") + +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_x" 20 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "DF,V2DF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "none"))))) + "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x15") + +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_xload" 27 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "DF,V2DF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "load"))))) + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x15") + +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_y" 20 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "V4DF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "none"))))) + "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*15") + +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_yload" 27 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "V4DF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "load"))))) + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*15") + +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_z" 38 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "V8DF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "none"))))) + 
"c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*34") + +(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_zload" 45 + (and (eq_attr "cpu" "c86_4g_m7") + (and (eq_attr "type" "sse") + (and (eq_attr "mode" "V8DF") + (and (eq_attr "c86_attr" "sqrt") + (eq_attr "memory" "load"))))) + "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*34") ;; MSKLOG/MSKMOV (define_insn_reservation "c86_4g_m7_avx512_msklog" 1 @@ -1957,7 +2019,7 @@ (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "msklog") (eq_attr "c86_decode" "vector"))) - "c86-4g-m7-vector") + "c86-4g-m7-vector,c86-4g-m7-fpu_1_3") (define_insn_reservation "c86_4g_m7_avx512_mskmov_reg_k" 1 (and (eq_attr "cpu" "c86_4g_m7") @@ -1977,7 +2039,7 @@ (and (eq_attr "cpu" "c86_4g_m7") (and (eq_attr "type" "mskmov") (match_operand:V8DI 0 "register_operand" "v"))) - "c86-4g-m7-vector,c86-4g-m7-fpu3*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") + "c86-4g-m7-vector,c86-4g-m7-fpu3,c86-4g-m7-fpu_1_3") (define_insn_reservation "c86_4g_m7_avx512_mskmov_k_k" 1 (and (eq_attr "cpu" "c86_4g_m7") @@ -1991,7 +2053,7 @@ (and (eq_attr "type" "mskmov") (and (match_operand 0 "register_operand" "k") (match_operand 1 "register_operand" "r")))) - "c86-4g-m7-double,c86-4g-m7-fpu1*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2") + "c86-4g-m7-double,c86-4g-m7-fpu1,c86-4g-m7-fpu_1_3") (define_insn_reservation "c86_4g_m7_avx512_mskmov_k_m" 8 (and (eq_attr "cpu" "c86_4g_m7") diff --git a/gcc/config/i386/c86-4g.md b/gcc/config/i386/c86-4g.md index 49a46a8aa19e..8b81fcaabb28 100644 --- a/gcc/config/i386/c86-4g.md +++ b/gcc/config/i386/c86-4g.md @@ -30,8 +30,10 @@ ;; HYGON Scheduling ;; Modeling automatons for decoders, integer execution pipes, ;; AGU pipes, floating point execution units, integer and -;; floating point dividers. -(define_automaton "c86_4g, c86_4g_ieu, c86_4g_fp, c86_4g_agu, c86_4g_idiv, c86_4g_fdiv") +;; floating point dividers. 
Split fp1 into its own automaton +;; to keep this unit independent without increasing the main +;; c86_4g_fp state space. +(define_automaton "c86_4g, c86_4g_ieu, c86_4g_fp024, c86_4g_fp1, c86_4g_agu, c86_4g_idiv, c86_4g_fdiv") ;; Decoders unit has 4 decoders and all of them can decode fast path ;; and vector type instructions. @@ -40,10 +42,6 @@ (define_cpu_unit "c86-4g-decode2" "c86_4g") (define_cpu_unit "c86-4g-decode3" "c86_4g") -;; Two separated dividers for int and fp. -(define_cpu_unit "c86-4g-idiv" "c86_4g_idiv") -(define_cpu_unit "c86-4g-fdiv" "c86_4g_fdiv") - ;; Currently blocking all decoders for vector path instructions as ;; they are dispatched separetely as microcode sequence. ;; Fix me: Need to revisit this. @@ -55,7 +53,6 @@ ;; Fix me: Need to revisit this later to simulate fast path double behavior. (define_reservation "c86-4g-double" "c86-4g-direct") - ;; Integer unit 4 ALU pipes. (define_cpu_unit "c86-4g-ieu0" "c86_4g_ieu") (define_cpu_unit "c86-4g-ieu1" "c86_4g_ieu") @@ -63,6 +60,9 @@ (define_cpu_unit "c86-4g-ieu3" "c86_4g_ieu") (define_reservation "c86-4g-ieu" "c86-4g-ieu0|c86-4g-ieu1|c86-4g-ieu2|c86-4g-ieu3") +;; One separated integer divider. +(define_cpu_unit "c86-4g-idiv" "c86_4g_idiv") + ;; 2 AGU pipes in c86_4g ;; According to CPU diagram last AGU unit is used only for stores. (define_cpu_unit "c86-4g-agu0" "c86_4g_agu") @@ -81,10 +81,10 @@ +c86-4g-agu0+c86-4g-agu1") ;; Floating point unit 4 FP pipes. -(define_cpu_unit "c86-4g-fp0" "c86_4g_fp") -(define_cpu_unit "c86-4g-fp1" "c86_4g_fp") -(define_cpu_unit "c86-4g-fp2" "c86_4g_fp") -(define_cpu_unit "c86-4g-fp3" "c86_4g_fp") +(define_cpu_unit "c86-4g-fp0" "c86_4g_fp024") +(define_cpu_unit "c86-4g-fp1" "c86_4g_fp1") +(define_cpu_unit "c86-4g-fp2" "c86_4g_fp024") +(define_cpu_unit "c86-4g-fp3" "c86_4g_fp024") (define_reservation "c86-4g-fpu" "c86-4g-fp0|c86-4g-fp1|c86-4g-fp2|c86-4g-fp3") @@ -92,6 +92,11 @@ +c86-4g-fp2+c86-4g-fp3 +c86-4g-agu0+c86-4g-agu1") +;; One separated FP divider. 
+(define_cpu_unit "c86-4g-fdiv" "c86_4g_fdiv") + +(define_reservation "c86-4g-fp1fdivx4" "(c86-4g-fp1+c86-4g-fdiv)*4") + ;; Call instruction (define_insn_reservation "c86_4g_call" 1 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") @@ -387,7 +392,7 @@ (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (and (eq_attr "type" "fpspc") (eq_attr "c86_attr" "sqrt"))) - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*22") + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*18") (define_insn_reservation "c86_4g_sse_sqrt_sf" 14 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") @@ -395,7 +400,7 @@ (and (eq_attr "memory" "none,unknown") (and (eq_attr "c86_attr" "sqrt") (eq_attr "type" "sse"))))) - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*14") + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*10") (define_insn_reservation "c86_4g_sse_sqrt_sf_mem" 21 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") @@ -403,7 +408,7 @@ (and (eq_attr "memory" "load") (and (eq_attr "c86_attr" "sqrt") (eq_attr "type" "sse"))))) - "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*14") + "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*10") (define_insn_reservation "c86_4g_sse_sqrt_df" 20 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") @@ -411,7 +416,7 @@ (and (eq_attr "memory" "none,unknown") (and (eq_attr "c86_attr" "sqrt") (eq_attr "type" "sse"))))) - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*20") + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*16") (define_insn_reservation "c86_4g_sse_sqrt_df_mem" 27 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") @@ -419,7 +424,7 @@ (and (eq_attr "memory" "load") (and (eq_attr "c86_attr" "sqrt") (eq_attr "type" "sse"))))) - "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*20") + "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*16") ;; RCP (define_insn_reservation "c86_4g_sse_rcp" 5 @@ -492,20 +497,20 @@ (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (and (eq_attr "type" "fdiv") (eq_attr "memory" "none"))) - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*15") + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*11") 
(define_insn_reservation "c86_4g_fp_op_div_load" 22 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (and (eq_attr "type" "fdiv") (eq_attr "memory" "load"))) - "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*15") + "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*11") -(define_insn_reservation "c86_4g_fp_op_idiv_load" 27 +(define_insn_reservation "c86_4g_fp_op_idiv_load" 26 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (and (eq_attr "type" "fdiv") (and (eq_attr "fp_int_src" "true") (eq_attr "memory" "load")))) - "c86-4g-double,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*19") + "c86-4g-double,c86-4g-load,c86-4g-fp1*4,c86-4g-fp1fdivx4,c86-4g-fdiv*11") ;; MMX, SSE, SSEn.n, AVX, AVX2 instructions (define_insn_reservation "c86_4g_fp_insn" 1 @@ -1024,28 +1029,28 @@ (eq_attr "mode" "V4SF,SF")) (and (eq_attr "type" "ssediv") (eq_attr "memory" "none"))) - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*10") + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*6") (define_insn_reservation "c86_4g_ssediv_ss_ps_load" 17 (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (eq_attr "mode" "V4SF,SF")) (and (eq_attr "type" "ssediv") (eq_attr "memory" "load"))) - "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*10") + "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*6") (define_insn_reservation "c86_4g_ssediv_sd_pd" 13 (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (eq_attr "mode" "V2DF,DF")) (and (eq_attr "type" "ssediv") (eq_attr "memory" "none"))) - "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*13") + "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*9") (define_insn_reservation "c86_4g_ssediv_sd_pd_load" 20 (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (eq_attr "mode" "V2DF,DF")) (and (eq_attr "type" "ssediv") (eq_attr "memory" "load"))) - "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*13") + "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*9") (define_insn_reservation "c86_4g_ssediv_avx256_ps" 10 @@ -1053,28 +1058,28 @@ (and (eq_attr "mode" "V8SF") (and (eq_attr "memory" "none") (eq_attr "type" 
"ssediv")))) - "c86-4g-double,c86-4g-fp1,c86-4g-fdiv*10") + "c86-4g-double,c86-4g-fp1fdivx4,c86-4g-fdiv*6") (define_insn_reservation "c86_4g_ssediv_avx256_ps_load" 17 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (and (eq_attr "mode" "V8SF") (and (eq_attr "type" "ssediv") (eq_attr "memory" "load")))) - "c86-4g-double,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*10") + "c86-4g-double,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*6") (define_insn_reservation "c86_4g_ssediv_avx256_pd" 13 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (and (eq_attr "mode" "V4DF") (and (eq_attr "type" "ssediv") (eq_attr "memory" "none")))) - "c86-4g-double,c86-4g-fp1,c86-4g-fdiv*13") + "c86-4g-double,c86-4g-fp1fdivx4,c86-4g-fdiv*9") (define_insn_reservation "c86_4g_ssediv_avx256_pd_load" 20 (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6") (and (eq_attr "mode" "V4DF") (and (eq_attr "type" "ssediv") (eq_attr "memory" "load")))) - "c86-4g-double,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*13") + "c86-4g-double,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*9") ;; SSE MUL (define_insn_reservation "c86_4g_ssemul_ss_ps" 3 (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
