On Tue, Oct 27, 2015 at 06:12:48PM -0500, Evandro Menezes wrote:
> This patch adds the scheduling and cost models for Exynos M1.
>
> Though it's a rather large patch, much of it is the DFA model for the
> pipeline.  Still, I'd appreciate any feedback.
>
> Please, commit if it's alright.
Hi Evandro,
Thanks for the patch; I have some comments.
To ease review, could I ask you to turn this into a patch series? Roughly
structured as follows:
1/4: Add the Exynos-M1 cost models.
2/4: Add the Exynos M1 scheduling model.
3/4: Add the infrastructure for TARGET_CASE_VALUES_THRESHOLD.
4/4: Add the extra tuning heuristics.
The Exynos-M1 support is missing a critical hunk for AArch64: there should be an
(include "../arm/exynos-m1.md")
in aarch64.md to get this working.
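For reference, the include would sit alongside the existing scheduling model
includes; a rough sketch only, the exact neighbouring lines in aarch64.md may
differ:

  ;; Scheduling models (sketch -- surrounding includes may differ).
  (include "../arm/cortex-a53.md")
  (include "../arm/cortex-a57.md")
  (include "../arm/exynos-m1.md")
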
This is a fairly large pipeline description, as you can see by adding
(automata_option "stats") to the .md file:
Automaton `exynos_m1'
62320 NDFA states, 489094 NDFA arcs
From experience, you get little benefit from such a complex model, but you do
pay for it in bootstrap times. It isn't for me to say where the model can be
trimmed (I don't have access to documentation for the Exynos-M1), but you may
find it useful to split out the SIMD/FP units into their own automaton, and to
look at whether your modelling of long-latency instructions is entirely
necessary. Have a look at the Cortex-A57 and Cortex-A53 models for some
examples of what I mean.
For comparison, here are the stats for Cortex-A53 and Cortex-A57:
Automaton `cortex_a53'
281 NDFA states, 1158 NDFA arcs
Automaton `cortex_a53_advsimd'
9072 NDFA states, 49572 NDFA arcs
Automaton `cortex_a57'
764 NDFA states, 3600 NDFA arcs
Automaton `cortex_a57_cx'
204 NDFA states, 864 NDFA arcs
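To make the suggestion concrete, here is a minimal sketch of splitting the
SIMD/FP units into their own automaton, in the style of cortex-a57.md. The
unit names, reservation name, latency and type attributes below are invented
for illustration and are not taken from your patch:

  (define_automaton "exynos_m1, exynos_m1_simd")

  ;; Integer units live in the main automaton...
  (define_cpu_unit "em1_alu0, em1_alu1" "exynos_m1")

  ;; ...while the FP/ASIMD pipes get their own automaton, so the generator
  ;; does not build the cross product of the two sets of states.
  (define_cpu_unit "em1_fp0, em1_fp1" "exynos_m1_simd")

  (define_insn_reservation "em1_fp_arith" 3
    (and (eq_attr "tune" "exynosm1")
         (eq_attr "type" "fadds,faddd,fmov"))
    "em1_fp0 | em1_fp1")

On the long-latency side: reserving a unit for the full duration of, say, a
divide (something like "em1_fp0*20") multiplies the state count quickly, and
the scheduler usually gets most of the benefit from the latency in the
define_insn_reservation alone, with only a short unit reservation.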
> @@ -7672,6 +7737,22 @@ aarch64_override_options_internal (struct gcc_options
> *opts)
> opts->x_param_values,
> global_options_set.x_param_values);
>
> + /* Adjust the heuristics for Exynos M1. */
> + if (selected_cpu->sched_core == exynosm1)
I think it would be preferable to pull these tuning parameters into the
target's per-core tuning structures somehow, rather than gating them on
specific CPUs.
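Purely as a sketch of the shape I mean (the struct, field and function names
below are invented, not existing aarch64 interfaces):

  /* Hypothetical extra per-core tuning fields; 0 means "leave the
     generic default alone".  */
  struct tune_extra_params
  {
    unsigned int max_completely_peeled_insns;
    unsigned int l1_cache_line_size;
  };

  /* Called from aarch64_override_options_internal with the tuning
     entry selected for the current -mcpu/-mtune.  */
  static void
  aarch64_apply_extra_tuning (const struct tune_extra_params *tune,
                              struct gcc_options *opts)
  {
    if (tune->max_completely_peeled_insns != 0)
      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
                             tune->max_completely_peeled_insns,
                             opts->x_param_values,
                             global_options_set.x_param_values);
    if (tune->l1_cache_line_size != 0)
      maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
                             tune->l1_cache_line_size,
                             opts->x_param_values,
                             global_options_set.x_param_values);
  }

That way the Exynos-M1 entry just fills in the two fields, and no
CPU-specific conditionals are needed in the options override path.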
> + {
> + /* Increase the maximum peeling limit. */
> + maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
> + 400,
> + opts->x_param_values,
> + global_options_set.x_param_values);
> +
> + /* Set the L1 cache line size. */
> + maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
> + 64,
> + opts->x_param_values,
> + global_options_set.x_param_values);
> + }
> +
> aarch64_override_options_after_change_1 (opts);
> }
>
> @@ -13382,6 +13463,20 @@ aarch64_promoted_type (const_tree t)
> return float_type_node;
> return NULL_TREE;
> }
> +
> +/* Implement TARGET_CASE_VALUES_THRESHOLD. */
> +
> +static unsigned int
> +aarch64_case_values_threshold (void)
> +{
> + /* For Exynos M1, raise the bar for using jump tables. */
> + if (selected_cpu->sched_core == exynosm1
> + && optimize > 2)
> + return 48;
Likewise, I think this should end up in the per-core tuning structures
rather than masked off by selected_cpu->sched_core == exynosm1.
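Using the same hypothetical per-core fields as above ('aarch64_tune_extras'
and 'max_case_values' are invented names, not existing members of the tuning
structures), the hook could then look something like:

  static unsigned int
  aarch64_case_values_threshold (void)
  {
    /* A core that does not care leaves max_case_values at 0 and
       falls through to the generic default.  */
    if (optimize > 2 && aarch64_tune_extras->max_case_values != 0)
      return aarch64_tune_extras->max_case_values;
    return default_case_values_threshold ();
  }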
> + else
> + return default_case_values_threshold ();
> +}
> +
> #undef TARGET_ADDRESS_COST
> #define TARGET_ADDRESS_COST aarch64_address_cost
>
Thanks,
James