PING. "200" currently looks optimal for x86. Let's commit the following:

2014-11-21  Evgeny Stupachenko  <evstu...@gmail.com>

        * config/i386/i386.c (ix86_option_override_internal): Increase
        PARAM_MAX_COMPLETELY_PEELED_INSNS.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..5ac10eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);
 
+  /* Extend full peel max insns parameter for x86.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                        200,
+                        opts->x_param_values,
+                        opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch

On Wed, Nov 12, 2014 at 5:02 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> Code size for spec2000 is almost unchanged (many benchmarks have the
> same binaries).
> For those that are changed we have the following numbers (200 vs 100,
> both dynamic build -Ofast -funroll-loops -flto):
> 183.equake +10%
> 164.gzip, 173.applu +3,5%
> 187.facerec, 191.fma3d +2,5%
> 200.sixstrack +2%
> 177.mesa, 178.galgel +1%
>
>
> On Wed, Nov 12, 2014 at 2:51 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> > 150 and 200 make Silvermont performance better on 173.applu (+8%) and
>>> > 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
>>> > Higher value of 300 leave the performance of mentioned tests
>>> > unchanged, but add some regressions on other benchmarks.
>>> >
>>> > So I like 200 as well as 120 and 150, but can confirm performance
>>> > gains only for x86.
>>>
>>> IMO it's either 150 or 200. We chose 200 for our 4.9-based compiler because
>>> this gave the performance boost without affecting the code size (on x86-64)
>>> and because this was previously 400, but it's your call.
>>
>> Both 150 or 200 globally work for me if there is not too much of code size
>> bloat (did not see code size mentioned here).
>>
>> What I did before decreasing the bounds was strengthening the loop iteration
>> count bounds and adding logic that predicts constant propagation enabled by
>> unrolling. For this reason 400 became too large as we did a lot more complete
>> unrolling than before. Also 400 in older compilers is not really 400 in
>> newer.
>>
>> Because I saw performance to drop only with values below 50, I went for 100.
>> It would be very interesting to actually analyze what happens for those two
>> benchmarks (that should not be too hard with perf).
>>
>> Honza
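
For anyone who wants to look at the effect of the limit on a small case, here is a minimal sketch of the kind of loop the parameter governs. The function name, the trip count and the file name below are made up for illustration; none of this comes from SPEC or from the patch, and whether the loop actually gets completely peeled depends on the compiler version and on its size estimate for the unrolled body.

```c
#include <stddef.h>

/* Hypothetical example, not taken from SPEC or from the patch.  */
#define N 16

/* The trip count N is a compile-time constant, so this loop is a
   candidate for complete peeling.  Whether GCC really peels it depends
   on how the estimated size of the fully unrolled body compares with
   max-completely-peeled-insns.  */
int
sum_squares (const int *v)
{
  int sum = 0;
  for (size_t i = 0; i < N; i++)
    sum += v[i] * v[i];
  return sum;
}
```

Assuming the file is saved as peel.c, the two limits can be compared without rebuilding the compiler by passing the parameter by hand, e.g. "gcc -O3 -S --param max-completely-peeled-insns=100 peel.c" versus the same command with 200, and the decision itself should show up in the -fdump-tree-cunroll-details dump.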