On Mon, 21 Nov 2022, Joshi, Tejas Sanjay wrote:
> I have addressed all your comments in the patch attached here. I have also > used znver4-direct for avx512 insns. Thanks. > * This patch increased the insn-automata.cc size from 201502 to 214902. Assuming it's the number of lines of code, I have 102847, perhaps you're measuring without my patches? You can use 'size -A gcc/insn-automata.o' to measure binary size growth. > * Compile time and binary size on my machine remains same. > * Make check and bootstrap build have no issues. > * Spec cpu2017 also don't have any issues with this patch. > > Is this ok for trunk? I cannot approve or reject your patch, this is up to Honza who I believe was investigating if combining this with older Zen models makes sense. In the meantime, I see a few more issues that can be easily corrected, please see below. > --- /dev/null > +++ b/gcc/config/i386/znver4.md > +;; FSQRT > +(define_insn_reservation "znver4_fsqrt" 22 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "fpspc") > + (and (eq_attr "mode" "XF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*20") This should be znver4-fdiv*10 (not *20) according to Agner Fog's measurements. > +;; FDIV > +(define_insn_reservation "znver4_fp_div" 15 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "fdiv") > + (eq_attr "memory" "none"))) > + "znver4-direct,znver4-fdiv*15") znver4-fdiv*6 instead of *15 here and in two patterns following this one. > +(define_insn_reservation "znver4_sse_div_pd" 13 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V4DF,V2DF,V1DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*7") Agner Fog's measurements indicate fdiv*5 here. > + > +(define_insn_reservation "znver4_sse_div_ps" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*5") Agner Fog's measurements indicate fdiv*3 here. > + > +(define_insn_reservation "znver4_sse_div_pd_load" 20 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V4DF,V2DF,V1DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*7") fdiv*5? > + > +(define_insn_reservation "znver4_sse_div_ps_load" 17 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8SF,V4SF,V2SF,SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*5") fdiv*3? > +(define_insn_reservation "znver4_sse_div_pd_evex" 13 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8DF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*7") This should be twice as much as the corresponding SSE/AVX instruction (fdiv*14 or fdiv*10; Agner Fog measured 9 cycles as reciprocal throughput). > + > +(define_insn_reservation "znver4_sse_div_ps_evex" 10 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V16SF") > + (eq_attr "memory" "none")))) > + "znver4-direct,znver4-fdiv*5") Likewise (fdiv*6). > +(define_insn_reservation "znver4_sse_div_pd_evex_load" 20 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V8DF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*7") Likewise. > +(define_insn_reservation "znver4_sse_div_ps_evex_load" 17 > + (and (eq_attr "cpu" "znver4") > + (and (eq_attr "type" "ssediv") > + (and (eq_attr "mode" "V16SF") > + (eq_attr "memory" "load")))) > + "znver4-direct,znver4-load,znver4-fdiv*5") Likewise. Alexander