On Thu, Jan 12, 2017 at 03:43:52AM +0000, Hurugalawadi, Naveen wrote: > Hi James, > > The scheduling patch for vulcan was posted at the following link:- > https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01205.html > > We are working on the patch and addressed the comments for thunderx2t99.
Great, thanks. > > >> I tried lowering the repeat expressions as so: > Done. > > >>split off the AdvSIMD/FP model from the main pipeline > Done. > > >> A change like wiring the vulcan_f0 and vulcan_f1 reservations > >> to be cpu_units of a new define_automaton "vulcan_advsimd" > Done. Perfect, the automaton size is much more palatable now. Automaton `thunderx2t99' 184 NDFA states, 838 NDFA arcs 184 DFA states, 838 DFA arcs 184 minimal DFA states, 838 minimal DFA arcs 370 all insns 8 insn equivalence classes 0 locked states 1016 transition comb vector els, 1472 trans table els: use simple vect 1472 min delay table els, compression factor 4 Automaton `thunderx2t99_advsimd' 12231 NDFA states, 85833 NDFA arcs 12231 DFA states, 85833 DFA arcs 9246 minimal DFA states, 66554 minimal DFA arcs 370 all insns 11 insn equivalence classes 0 locked states 84074 transition comb vector els, 101706 trans table els: use simple vect 101706 min delay table els, compression factor 2 Automaton `thunderx2t99_ldst' 49 NDFA states, 193 NDFA arcs 49 DFA states, 193 DFA arcs 16 minimal DFA states, 94 minimal DFA arcs 370 all insns 13 insn equivalence classes 0 locked states 91 transition comb vector els, 208 trans table els: use simple vect 208 min delay table els, compression factor 2 Automaton `thunderx2t99_mult' 2 NDFA states, 5 NDFA arcs 2 DFA states, 5 DFA arcs 2 minimal DFA states, 5 minimal DFA arcs 370 all insns 3 insn equivalence classes 0 locked states 6 transition comb vector els, 6 trans table els: use simple vect 6 min delay table els, compression factor 8 > > >> simplifying some of the remaining large expressions > >> (vulcan_asimd_load*_mult, vulcan_asimd_load*_elts) can bring the size down > Did not understand much about this comment. > Can you please let me know about the simplification? I wonder whether the current modeling of: (define_insn_reservation "thunderx2t99_asimd_load4_elts" 6 as: "thunderx2t99_ls_both*2,(thunderx2t99_ls0d1+thunderx2t99_ls1d1),\ (thunderx2t99_ls0d2+thunderx2t99_ls1d2),\ (thunderx2t99_ls0d3+thunderx2t99_ls1d3),thunderx2t99_f01") Actually benefits the schedule in a meaningful way, or if it just increases the size of the automaton. My guess is that given how many approximations scheduling makes as to the actual working of the machine (e.g. always perfect alignment, no cache misses, no other reason to restart loads) there is only a very small benefit to accurately describing the flow of an instruction through the pipelines in this way. The size of the automaton is more reasonable now, so I won't insist on changing it, but even with your changes the thunderx2t99 model is still the largest automaton we're building. > Please find attached the modified patch as per your suggestions and comments. > Please review the patch and let us know if its okay? I'd like for the size to come down again, or at least for you to comment on whether the modelling you do for thunderx2t99_asimd_load*_mult and thunderx2t99_asimd_load*_elts can be justified by an improved schedule. > diff --git a/gcc/config/aarch64/aarch64-cores.def > b/gcc/config/aarch64/aarch64-cores.def > index a7a4b33..4d39673 100644 > --- a/gcc/config/aarch64/aarch64-cores.def > +++ b/gcc/config/aarch64/aarch64-cores.def > @@ -75,7 +75,7 @@ AARCH64_CORE("xgene1", xgene1, xgene1, 8A, > AARCH64_FL_FOR_ARCH8, xge > > /* Broadcom ('B') cores. */ > AARCH64_CORE("thunderx2t99", thunderx2t99, cortexa57, 8_1A, > AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1) You'll want to update this to use your new scheduling model :-). > -AARCH64_CORE("vulcan", vulcan, cortexa57, 8_1A, AARCH64_FL_FOR_ARCH8_1 | > AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1) > +AARCH64_CORE("vulcan", vulcan, vulcan, 8_1A, AARCH64_FL_FOR_ARCH8_1 | > AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1) This line is incorrect, you should be changing vulcan to use the new thunderx2t99 model. The tune attribute "vulcan" won't match your new model, so with this change you'd get "generic" (i.e. no) scheduling for -mcpu=vulcan . Otherwise, I don't have any concerns with this patch but it must be retested as the new scheduling model can never be used with this patch version. Thanks, James