> > 150 and 200 make Silvermont performance better on 173.applu (+8%) and
> > 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
> > A higher value of 300 leaves the performance of the mentioned tests
> > unchanged, but adds some regressions on other benchmarks.
> > 
> > So I like 200 as well as 120 and 150, but can confirm performance
> > gains only for x86.
> 
> IMO it's either 150 or 200.  We chose 200 for our 4.9-based compiler because 
> this gave the performance boost without affecting the code size (on x86-64) 
> and because this was previously 400, but it's your call.

Either 150 or 200 globally works for me, as long as there is not too much
code size bloat (I did not see code size mentioned here).

What I did before decreasing the bounds was strengthening the loop iteration
count bounds and adding logic that predicts constant propagation enabled by
unrolling. For this reason 400 became too large, as we did a lot more complete
unrolling than before. Also, 400 in older compilers is not really equivalent
to 400 in newer ones.
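
To make "constant propagation enabled by unrolling" concrete, here is a
minimal hand-written sketch. It is a made-up example, not code from the
benchmarks discussed, and the names weights, weighted_sum and
weighted_sum_unrolled are hypothetical. The second function only shows what
the first could reduce to once the loop is completely unrolled and the table
loads are folded; whether a given compiler version actually does this depends
on its limits and heuristics.

```c
/* Hypothetical example: a constant table and a loop with a known
   trip count of 4, which makes it a candidate for complete unrolling. */
static const int weights[4] = {1, 2, 4, 8};

/* Original form: each iteration loads weights[i] and multiplies. */
int weighted_sum(const int *x)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += weights[i] * x[i];
    return sum;
}

/* Hand-written equivalent of the loop after complete unrolling:
   i is a constant in every copy of the body, so each weights[i]
   load folds to a literal and the multiply by 1 disappears. */
int weighted_sum_unrolled(const int *x)
{
    return x[0] + 2 * x[1] + 4 * x[2] + 8 * x[3];
}
```

The unrolled body can end up smaller than four copies of the original body,
which is the kind of saving such a prediction would have to account for.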

Because I saw performance drop only with values below 50, I went for 100.
It would be very interesting to actually analyze what happens for those two
benchmarks (that should not be too hard with perf).

Honza
