tejohn...@google.com (Teresa Johnson) writes:

> This patch adds heuristics to limit unrolling in loops with branches that
> may increase branch mispredictions. It affects loops that are not
> frequently iterated, and that are nested within a hot region of code that
> already contains many branch instructions.
>
> Performance tested with both internal benchmarks and with SPEC 2000/2006
> on a variety of Intel systems (Core2, Corei7, SandyBridge) and a couple of
> different AMD Opteron systems. This improves performance of an internal
> search indexing benchmark by close to 2% on all the tested Intel
> platforms.  It also consistently improves 445.gobmk (with FDO feedback
> where unrolling kicks in) by close to 1% on AMD Opteron. Other performance
> effects are neutral.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?

One problem with any unrolling heuristic is that gcc currently has both
a tree-level and an RTL-level unroller. The tree one is even on at
-O3.  So if you tweak anything for one you have to change both, otherwise the
other may still do the wrong thing(tm).

For some other tweaks I looked into a shared cost model some time ago.
It may still be needed.
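
To make the idea concrete, here is a rough sketch of a single decision
point that both unrollers could consult. It is purely illustrative: the
struct, the function name, and the thresholds are all made up and are not
existing gcc code or the heuristics in the patch.

```c
#include <stdbool.h>

/* Hypothetical summary of a loop; not a real gcc data structure. */
struct loop_summary {
    unsigned branch_count;        /* branches inside the loop body */
    unsigned expected_iterations; /* estimated trip count, e.g. from a profile */
    unsigned hot_region_branches; /* branches in the enclosing hot region */
};

/* Made-up thresholds, chosen only for illustration. */
enum { FEW_ITERATIONS = 8, MANY_BRANCHES = 32 };

/* One shared query that a tree-level and an RTL-level unroller could both
   call, so that a tweak made here applies to both passes.  Returns the
   unroll factor to use; 1 means "do not unroll".  */
static unsigned
shared_unroll_factor (const struct loop_summary *loop, unsigned requested)
{
    bool has_branches = loop->branch_count > 0;
    bool rarely_iterated = loop->expected_iterations < FEW_ITERATIONS;
    bool branchy_context = loop->hot_region_branches > MANY_BRANCHES;

    /* Branchy, rarely iterated loop inside an already branch-heavy hot
       region: unrolling would only add more branches, so refuse.  */
    if (has_branches && rarely_iterated && branchy_context)
        return 1;

    /* Otherwise honor the caller's request, treating 0 as "no unrolling".  */
    return requested == 0 ? 1 : requested;
}
```

With something like this, each unroller would pass in its own summary of
the loop and take the returned factor instead of applying private limits.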

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
