On Mon, May 16, 2016 at 11:38:04AM +0100, Wilco Dijkstra wrote:
> GCC expands switch statements in a very simplistic way and tries to use a
> table expansion even when it is a bad idea for performance or codesize.
> GCC typically emits extremely sparse tables that contain mostly default
> entries (something which currently cannot be tuned by backends).
> Additionally the computation of the minimum/maximum label offsets is too
> simplistic so the tables are often twice as large as necessary.
>
> The cost of a table switch is significant due to the setup overhead, the
> table lookup (which due to being sparse and large adds unnecessary
> cachemisses) and hard to predict indirect jump. Therefore it is best to
> avoid using a table unless there are many real case labels.
>
> This patch fixes that by setting the default aarch64_case_values_threshold
> to 16 when the per-CPU tuning is not set. On SPEC2006 this improves the
> switch heavy benchmarks GCC and perlbench both in performance (1-2%) as
> well as size (0.5-1% smaller).
>
> OK for trunk?

I have a trivial request to change the comment on the function. Otherwise,
this is now OK for trunk.

> ChangeLog:
> 2016-04-22  Wilco Dijkstra  <wdijk...@arm.com>
>
> gcc/
>     * config/aarch64/aarch64.c (aarch64_case_values_threshold):
>     Return a better case_values_threshold when optimizing.
>
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0620f1e..a240635 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3546,7 +3546,12 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>    return aarch64_tls_referenced_p (x);
>  }
>
> -/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
> +/* Implement TARGET_CASE_VALUES_THRESHOLD.
> +   The expansion for a table switch is quite expensive due to the number
> +   of instructions, the table lookup and hard to predict indirect jump.
> +   When optimizing for speed, with -O3 use the per-core tuning if set,
> +   otherwise use tables for > 16 cases as a tradeoff between size and
> +   performance.  */

This comment doesn't cover the "optimize_size" case.

Thanks,
James
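
For illustration only, here is a minimal standalone sketch of the selection logic that the proposed comment describes, extended with the size case raised above. It is not the code from the patch: the function name, its parameters and the constant used when optimizing for size are hypothetical. The value 17 assumes that the hook returns the smallest number of case values for which a table is used, so "tables for > 16 cases" becomes a threshold of 17.

```c
/* Hypothetical constants, not taken from the patch.  */
enum
{
  SIZE_THRESHOLD_SKETCH = 5,   /* assumed generic default used for size */
  SPEED_THRESHOLD_SKETCH = 17  /* tables only for > 16 case values */
};

/* Sketch of the threshold choice.
   optimize_level: numeric -O level (3 for -O3).
   optimize_for_size: nonzero when optimizing for size.
   per_core_max_case_values: per-core tuning value, 0 when not set.  */
static unsigned int
case_values_threshold_sketch (int optimize_level, int optimize_for_size,
                              unsigned int per_core_max_case_values)
{
  /* At -O3, prefer the per-core tuning when one is provided.  */
  if (optimize_level > 2 && per_core_max_case_values != 0)
    return per_core_max_case_values;

  /* Otherwise keep a small threshold for size (an assumption here)
     and a larger one for speed.  */
  return optimize_for_size ? SIZE_THRESHOLD_SKETCH : SPEED_THRESHOLD_SKETCH;
}
```

With these assumptions, a call such as case_values_threshold_sketch (3, 0, 8) yields the per-core value 8, while the same call with the tuning value left at 0 yields 17. A comment that covers all three branches would address the request above.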