More FDO-related performance numbers:

Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance
by 5% geomean.
Experiment 2: our internal gcc compiler (4.4.3-based with many local
patches) O2 + FDO vs O2 (trunk gcc): FDO improves performance by 6.6%
geomean.
Experiment 3: our internal gcc (4.4.3 with local patches) O2 + LIPO vs
O2 (trunk gcc): LIPO improves performance by 12% geomean.
Experiment 4: trunk gcc O2 + LTO + -fwhole-program + FDO vs O2: LTO +
FDO improves performance by 10.8% geomean.

1. Trunk gcc FDO vs O2 (5%)

             O2        O2+FDO      change
164.gzip     1324      1302        -1.64%
175.vpr      1694      1725        1.84%
176.gcc      2293      2387        4.07%
181.mcf      1772      1756        -0.88%
186.crafty   2320      2280        -1.75%
197.parser   1166      1556        33.42%
252.eon      2443      2552        4.45%
253.perlbmk  2410      2586        7.28%
254.gap      1987      2021        1.71%
255.vortex   2392      2720        13.71%
256.bzip2    1719      1717        -0.12%
300.twolf    2288      2331        1.86%

2. 4.4.3 gcc with local patches, FDO vs trunk O2 (6.6%)

             trunk O2  4.4.3 FDO   change
164.gzip     1324      1317        -0.48%
175.vpr      1694      1758        3.76%
176.gcc      2293      2472        7.79%
181.mcf      1772      1730        -2.35%
186.crafty   2320      2353        1.40%
197.parser   1166      1652        41.70%
252.eon      2443      2610        6.82%
253.perlbmk  2410      2561        6.23%
254.gap      1987      1987        -0.04%
255.vortex   2392      2801        17.09%
256.bzip2    1719      1748        1.68%
300.twolf    2288      2335        2.04%

3. LIPO vs trunk O2 (12%)

             trunk O2  LIPO        change
164.gzip     1324      1350        1.99%
175.vpr      1694      1758        3.77%
176.gcc      2293      2519        9.83%
181.mcf      1772      1766        -0.33%
186.crafty   2320      2394        3.16%
197.parser   1166      1683        44.32%
252.eon      2443      2879        17.80%
253.perlbmk  2410      2556        6.04%
254.gap      1987      2139        7.61%
255.vortex   2392      3669        53.40%
256.bzip2    1719      1824        6.09%
300.twolf    2288      2345        2.49%

4. LTO + -fwhole-program + O2 + FDO vs O2 (10.8%)

             O2        LTO+FDO     change
164.gzip     1324      1340        1.25%
175.vpr      1694      1709        0.87%
176.gcc      2293      2411        5.13%
181.mcf      1772      1757        -0.80%
186.crafty   2320      2566        10.59%
197.parser   1166      1614        38.44%
252.eon      2443      2785        13.98%
253.perlbmk  2410      2618        8.61%
254.gap      1987      2063        3.81%
255.vortex   2392      3294        37.69%
256.bzip2    1719      1956        13.77%
300.twolf    2288      2404        5.07%
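
For anyone who wants to re-derive the geomean figures from the per-benchmark scores, here is a minimal Python sketch. It assumes the geomean is the geometric mean of the per-benchmark score ratios (new / baseline) and that a higher score is better; the function name and table variable are made up for illustration. The scores are the rounded ones from experiment 1, so the result should land close to the quoted 5% rather than match it digit for digit.

```python
import math

# (benchmark, baseline O2 score, O2 + FDO score), copied from experiment 1.
# Scores are the rounded values from the table, so small deviations from the
# quoted percentages are expected.
TABLE_1 = [
    ("164.gzip", 1324, 1302),
    ("175.vpr", 1694, 1725),
    ("176.gcc", 2293, 2387),
    ("181.mcf", 1772, 1756),
    ("186.crafty", 2320, 2280),
    ("197.parser", 1166, 1556),
    ("252.eon", 2443, 2552),
    ("253.perlbmk", 2410, 2586),
    ("254.gap", 1987, 2021),
    ("255.vortex", 2392, 2720),
    ("256.bzip2", 1719, 1717),
    ("300.twolf", 2288, 2331),
]


def geomean_improvement(rows):
    """Return the geometric mean of new/baseline ratios, minus 1, in percent.

    Hypothetical helper: assumes every row is (name, baseline, new) with
    positive scores where a higher score means better performance.
    """
    log_sum = sum(math.log(new / base) for _, base, new in rows)
    return (math.exp(log_sum / len(rows)) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"experiment 1 geomean improvement: {geomean_improvement(TABLE_1):.2f}%")
```

The same function can be fed the rows of the other tables to check their headline numbers.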
David
On Mon, Nov 15, 2010 at 6:18 PM, Xinliang David Li <[email protected]> wrote:
> More performance data:
>
> -O2 -funroll-all-loops vs O2: +1.1% geomean
>
>              O2        O2 unroll-all-loops  change
> 164.gzip     1324      1336                 0.94%
> 175.vpr      1694      1670                 -1.44%
> 176.gcc      2293      2353                 2.60%
> 181.mcf      1772      1793                 1.20%
> 186.crafty   2320      2300                 -0.86%
> 197.parser   1166      1171                 0.39%
> 252.eon      2443      2515                 2.93%
> 253.perlbmk  2410      2250                 -6.66%
> 254.gap      1987      2041                 2.68%
> 255.vortex   2392      2411                 0.78%
> 256.bzip2    1719      1806                 5.08%
> 300.twolf    2288      2436                 6.44%
>
>
> -O3 -flto -fwhole-program vs -O2: geomean +6% (-fwhole-program adds ~1%)
>
> 164.gzip 1324 1318 -0.45%
> 175.vpr 1694 1717 1.34%
> 176.gcc 2293 2359 2.88%
> 181.mcf 1772 1772 0.02%
> 186.crafty 2320 2526 8.86%
> 197.parser 1166 1248 7.04%
> 252.eon 2443 2898 18.59%
> 253.perlbmk 2410 2323 -3.62%
> 254.gap 1987 2039 2.58%
> 255.vortex 2392 2918 21.99%
> 256.bzip2 1719 1946 13.19%
> 300.twolf 2288 2342 2.34%
>
>
> -O2 -flto -fwhole-program vs -O2: geomean +3.4%, mainly from three
> programs: vortex, eon and bzip2.
>
> 164.gzip 1324 1313 -0.82%
> 175.vpr 1694 1659 -2.05%
> 176.gcc 2293 2300 0.30%
> 181.mcf 1772 1781 0.52%
> 186.crafty 2320 2327 0.30%
> 197.parser 1166 1188 1.92%
> 252.eon 2443 2664 9.00%
> 253.perlbmk 2410 2470 2.47%
> 254.gap 1987 1987 -0.02%
> 255.vortex 2392 2883 20.53%
> 256.bzip2 1719 1839 7.00%
> 300.twolf 2288 2365 3.34%
>
>
> Thanks,
>
> David
>
>
> On Mon, Nov 15, 2010 at 5:50 PM, Jan Hubicka <[email protected]> wrote:
>>> On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka <[email protected]> wrote:
>>> >> > Fortunately the linker plugin solves the problem here, and this is
>>> >> > why I want to have it by default. GCC can then effectively do
>>> >> > -fwhole-program for binaries (since the linker knows what will be
>>> >> > bound elsewhere) and take advantage of visibility((hidden)) hints
>>> >> > for shared libraries in the same way. Most important shared
>>> >> > libraries get visibility((hidden)) right.
>>> >> >
>>> >> > It is sad that LTO w/o the linker plugin doesn't give that much
>>> >> > benefit. Ideas are welcome here.
>>> >>
>>> >> Linker feedback will be limited here -- it mostly helps with global
>>> >> variable aliasing (as I remember only 2/3 SPEC programs benefit from
>>> >> it). You don't get whole program points-to, whole program mod-ref
>>> >> (with context sensitivity), whole program structure layout. The latter
>>> >> are the real kickers (in terms of SPEC performance), but promoting LTO
>>> >> with those numbers can be misleading as many programs won't get it.
>>> >
>>> > Well, I am speaking of our linker plugin here. What it does is pass
>>> > GCC resolution information so it knows what symbols are bound
>>> > externally. Since typically you link LTO alone or with a small non-LTO
>>> > part, most symbols are not bound and thus you effectively get
>>> > -fwhole-program (-fwhole-program just declares everything static
>>> > except for main ()).
>>> >
>>> > We don't really do whole program points-to or structure layout.
>>>
>>> gcc will eventually, right?
>>
>> Sure hope so ;)
>> We really need to solve scalability with our IPA points-to and make it
>> compatible with WHOPR.
>>>
>>> > Mod-ref is just
>>> > simple ipa-reference code. How do you get context sensitivity on mod/ref?
>>>
>>> mod-ref relies on points-to. With context sensitive points-to, you can
>>> also get CS mod-ref -- basically mod-ref info per callsite.
>>
>> Ah sure, I was too focused on our current "mod/ref" :)
>>
>> Honza
>>
>