On Wed, Oct 28, 2015 at 6:36 PM, James Greenhalgh <james.greenha...@arm.com> wrote:
> On Tue, Oct 27, 2015 at 06:12:48PM -0500, Evandro Menezes wrote:
>> This patch adds the scheduling and cost models for Exynos M1.
>>
>> Though it's a rather large patch, much of it is the DFA model for the
>> pipeline.  Still, I'd appreciate any feedback.
>>
>> Please commit if it's alright.
>
> Hi Evandro,
>
> Thanks for the patch, I have some comments.
>
> To ease review, could I ask you to turn this into a patch series? Roughly
> structured like so:
>
>   1/4: Add the Exynos M1 cost models.
>   2/4: Add the Exynos M1 scheduling model.
>   3/4: Add the infrastructure for TARGET_CASE_VALUES_THRESHOLD.
>   4/4: Add the extra tuning heuristics.
>
> Your support is missing a critical hunk for AArch64; there should be an
>
>   (include "../arm/exynos-m1.md")
>
> in aarch64.md to get this working.
>
> This is a fairly large pipeline description (add (automata_option "stats")
> to the .md file to see these numbers):
>
>   Automaton `exynos_m1'
>     62320 NDFA states, 489094 NDFA arcs
>
> From experience, you get little benefit from such a complex model, but you
> do slow down bootstrap times.  It isn't for me to say where the model can be
> trimmed (I don't have access to documentation for the Exynos M1), but you
> may find it useful to split out the SIMD/FP automaton, and to look at whether
> your modelling of long-latency instructions is entirely necessary.  Have a
> look at the Cortex-A57 and Cortex-A53 for some examples of what I mean.
>
> For comparison, here are the stats for Cortex-A53 and Cortex-A57:
>
>   Automaton `cortex_a53'
>     281 NDFA states, 1158 NDFA arcs
>   Automaton `cortex_a53_advsimd'
>     9072 NDFA states, 49572 NDFA arcs
>   Automaton `cortex_a57'
>     764 NDFA states, 3600 NDFA arcs
>   Automaton `cortex_a57_cx'
>     204 NDFA states, 864 NDFA arcs
>
>> @@ -7672,6 +7737,22 @@ aarch64_override_options_internal (struct gcc_options *opts)
>>                            opts->x_param_values,
>>                            global_options_set.x_param_values);
>>
>> +  /* Adjust the heuristics for Exynos M1.  */
>> +  if (selected_cpu->sched_core == exynosm1)
>
> I think it would be preferable to pull these tuning parameters into
> the target structures somehow, rather than guarding them off by specific
> CPUs.
>
>> +    {
>> +      /* Increase the maximum peeling limit.  */
>> +      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
>> +                             400,
>> +                             opts->x_param_values,
>> +                             global_options_set.x_param_values);
>> +
>> +      /* Set the L1 cache line size.  */
>> +      maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
>> +                             64,
>> +                             opts->x_param_values,
>> +                             global_options_set.x_param_values);
>> +    }
>> +
>>    aarch64_override_options_after_change_1 (opts);
>>  }
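To make James's suggestion concrete, here is a minimal sketch of what
pulling these values into the per-core tuning structures could look
like.  The struct and field names below are invented for illustration
and do not match the actual aarch64 tune_params layout; a negative
value is used to mean "leave the compiler default alone".

  /* Hypothetical per-core tuning knobs (illustrative names only).  */
  struct tune_params_sketch
  {
    int max_completely_peeled_insns; /* --param max-completely-peeled-insns.  */
    int l1_cache_line_size;          /* --param l1-cache-line-size, in bytes.  */
  };

  static const struct tune_params_sketch exynosm1_sketch_tunings =
  {
    400, /* max_completely_peeled_insns.  */
    64   /* l1_cache_line_size.  */
  };

  /* aarch64_override_options_internal would then do a generic lookup
     on the selected tuning record instead of testing sched_core:

       if (tune->max_completely_peeled_insns >= 0)
         maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
                                tune->max_completely_peeled_insns,
                                opts->x_param_values,
                                global_options_set.x_param_values);  */

With that shape, enabling the heuristics for the next core becomes a
data change rather than another CPU-specific guard in the options code.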
I have a patch for the L1 cache size (prefetch) infrastructure which
sets it via tuning parameters, but I have not had time to submit it
yet.  Also, the peeling parameter change helps ThunderX too.

Thanks,
Andrew

>>
>
>
>> @@ -13382,6 +13463,20 @@ aarch64_promoted_type (const_tree t)
>>      return float_type_node;
>>    return NULL_TREE;
>>  }
>> +
>> +/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
>> +
>> +static unsigned int
>> +aarch64_case_values_threshold (void)
>> +{
>> +  /* For Exynos M1, raise the bar for using jump tables.  */
>> +  if (selected_cpu->sched_core == exynosm1
>> +      && optimize > 2)
>> +    return 48;
>
> Likewise, I think this should end up in the per-core tuning structures
> rather than masked off by selected_cpu->sched_core == exynosm1.
>
>> +  else
>> +    return default_case_values_threshold ();
>> +}
>> +
>>  #undef TARGET_ADDRESS_COST
>>  #define TARGET_ADDRESS_COST aarch64_address_cost
>>
>
> Thanks,
> James
>
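The same refactoring applies to the hook quoted above.  As an equally
hypothetical sketch, assuming the per-core tunings grew a
max_case_values field (0 meaning "use the generic default") reachable
through the selected tuning record:

  /* Implement TARGET_CASE_VALUES_THRESHOLD from per-core tunings.
     Sketch only: the max_case_values field is assumed here, not
     existing code.  */
  static unsigned int
  aarch64_case_values_threshold (void)
  {
    /* When optimizing aggressively and the selected core supplies a
       threshold, raise the bar for using jump tables accordingly.  */
    if (optimize > 2 && aarch64_tune_params->max_case_values != 0)
      return aarch64_tune_params->max_case_values;

    return default_case_values_threshold ();
  }

This keeps the hook CPU-agnostic: the Exynos M1 tuning entry would set
max_case_values to 48, and other cores would leave it at 0.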