On Mon, May 16, 2016 at 11:38:04AM +0100, Wilco Dijkstra wrote:
> GCC expands switch statements in a very simplistic way and tries to use a table
> expansion even when it is a bad idea for performance or code size.
> GCC typically emits extremely sparse tables that contain mostly default entries
> (something which currently cannot be tuned by backends).  Additionally, the
> computation of the minimum/maximum label offsets is too simplistic, so the tables
> are often twice as large as necessary.
> 
> The cost of a table switch is significant due to the setup overhead, the table
> lookup (which, being sparse and large, adds unnecessary cache misses)
> and the hard-to-predict indirect jump.  Therefore it is best to avoid using a table
> unless there are many real case labels.
> 
> This patch fixes that by setting the default aarch64_case_values_threshold to
> 16 when the per-CPU tuning is not set.  On SPEC2006 this improves the switch-heavy
> benchmarks GCC and perlbench both in performance (1-2%) and in size
> (0.5-1% smaller).
> 
> OK for trunk?

I have a trivial request to change the comment on the function. Otherwise,
this is now OK for trunk.

> ChangeLog:
> 2016-04-22  Wilco Dijkstra  <wdijk...@arm.com>
> 
>     gcc/
>         * config/aarch64/aarch64.c (aarch64_case_values_threshold):
>         Return a better case_values_threshold when optimizing.
> 
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0620f1e..a240635 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3546,7 +3546,12 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>    return aarch64_tls_referenced_p (x);
>  }
> 
> -/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
> +/* Implement TARGET_CASE_VALUES_THRESHOLD.
> +   The expansion for a table switch is quite expensive due to the number
> +   of instructions, the table lookup and the hard-to-predict indirect jump.
> +   When optimizing for speed, with -O3 use the per-core tuning if set,
> +   otherwise use tables for > 16 cases as a tradeoff between size and
> +   performance.  */

This comment doesn't cover the "optimize_size" case.

Thanks,
James
