Hello,
I have also redone most of my Firefox testing, similar to the tests I published at
http://hubicka.blogspot.cz/2014/04/linktime-optimization-in-gcc-2-firefox.html
(thanks to Martin Liska, who got LTO builds working again).
I am attaching statistics on binary sizes. Interestingly, for Firefox LTO is
quite a good size optimization (about 16% on text), and FDO similarly reduces
text size and combines well with LTO. This is a bit different from Martin's
GCC stats. I have looked into this only very briefly, and one issue seems to be
the way we determine the hot/cold threshold.
binary                    text  relocations      data        EH    rest
gcc6 -O3              90448658     12887358  13720073  13035704  257839
gcc6 -O3 -flto        75810786     12145211  12390185   8422776  240002
gcc6 -O3 + FDO        67087824     13008294  13655305  13719944  259585
gcc6 -O3 -flto + FDO  60206898     12169803  12334113   9083088  240050
gcc7 -O3              93233440     12928831  13780313  13578224  257408
gcc7 -O3 -flto        76764274     12128031  12405369   8420448  240662
gcc7 -O3 + FDO        67500688     12994279  13650185  13661760  263400
gcc7 -O3 -flto + FDO  59776994     12151360  12325217   8971344  239501
gcc8 -O2              80311120     12939568  13763033  12948752  258711
gcc8 -O2 -flto        69156752     12109236  12475801   8501152  240163
gcc8 -O3              89913648     12924468  13790393  13374328  256867
gcc8 -O3 -flto        75971122     12138528  12426649   8593024  239861
gcc8 -O3 + FDO        67047632     12996890  13707017  13146232  263413
gcc8 -O3 -flto + FDO  58951410     12146008  12377161   8634152  241765
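As a quick cross-check of the percentages quoted above, the reductions can be
recomputed from the text column of the table. This is a minimal Python sketch;
the TEXT dictionary and the reduction()/summarize() helpers are illustrative
names only, and just the -O3 rows are included:

```python
# Text-section sizes copied from the gcc table above (same units as the table).
TEXT = {
    "gcc6": {"-O3": 90448658, "-O3 -flto": 75810786,
             "-O3 + FDO": 67087824, "-O3 -flto + FDO": 60206898},
    "gcc7": {"-O3": 93233440, "-O3 -flto": 76764274,
             "-O3 + FDO": 67500688, "-O3 -flto + FDO": 59776994},
    "gcc8": {"-O3": 89913648, "-O3 -flto": 75971122,
             "-O3 + FDO": 67047632, "-O3 -flto + FDO": 58951410},
}


def reduction(base: int, new: int) -> float:
    """Percentage by which `new` is smaller than `base`."""
    return 100.0 * (base - new) / base


def summarize(text: dict) -> dict:
    """Text reduction of each -O3 variant relative to plain -O3, per compiler."""
    return {
        compiler: {cfg: round(reduction(sizes["-O3"], size), 1)
                   for cfg, size in sizes.items() if cfg != "-O3"}
        for compiler, sizes in text.items()
    }


if __name__ == "__main__":
    for compiler, row in summarize(TEXT).items():
        print(compiler, row)
```

For gcc6 this gives about 16.2% for -flto alone and about 33.4% for
-flto + FDO, both relative to plain -O3.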
I also did some builds with clang. The observation is that clang's -O3 binary
is smaller than ours, while our LTO/FDO builds are smaller than clang's (the
LTO+FDO build quite substantially).
Our EH is bigger than clang's, which is probably something to look into. One
problem I am aware of is that our nothrow pass is not type-sensitive, and thus
it won't figure out when a program throws an exception of a specific type and
catches it later.
binary                           text  relocations      data        EH    rest
clang6 -O3                   84754848     13032018  13597433  10791528  371429
clang6 -O3 -flto             90757024     12273574  12258521   6841424  350585
clang6 -O3 -flto=thin        92940576     12376724  12479233   7974856  353171
clang6 -O3 + FDO             81776880     13136428  13574489  11501344  385123
clang6 -O3 -flto=thin + FDO  88374432     12405075  12434297   9574416  356508
clang6 -O3 -flto + FDO       90637168     12288433  12244265   9023304  349078
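The clang comparison can likewise be put in relative terms. A minimal Python
sketch, assuming gcc8 is the "ours" in the comparison above; SIZES and
relative() are illustrative names, and only four rows are copied from the
tables:

```python
# (text, EH) sizes copied from the gcc and clang tables above.
SIZES = {
    "gcc8 -O3": (89913648, 13374328),
    "gcc8 -O3 -flto + FDO": (58951410, 8634152),
    "clang6 -O3": (84754848, 10791528),
    "clang6 -O3 -flto + FDO": (90637168, 9023304),
}
TEXT, EH = 0, 1  # column indices into the tuples above


def relative(a: str, b: str, column: int) -> float:
    """Percent by which configuration `a` is larger (+) or smaller (-) than `b`."""
    base = SIZES[b][column]
    return 100.0 * (SIZES[a][column] - base) / base


if __name__ == "__main__":
    comparisons = [
        ("text", "gcc8 -O3", "clang6 -O3", TEXT),
        ("text", "gcc8 -O3 -flto + FDO", "clang6 -O3 -flto + FDO", TEXT),
        ("EH", "gcc8 -O3", "clang6 -O3", EH),
    ]
    for label, a, b, col in comparisons:
        print(f"{label}: {a} vs {b}: {relative(a, b, col):+.1f}%")
```

With these numbers gcc8's -O3 text comes out about 6% larger than clang6's,
its -O3 -flto + FDO text about 35% smaller, and its -O3 EH about 24% larger.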
I also did some benchmarking and found at least one issue: with -flto -O3 we
hit --param inline-unit-growth a bit too early, so we do not get much benefit
(clang does, but it also does not reduce binary size). -O3 -flto + FDO or
-O2 -flto seem to work well. I will summarize the results later.
Firefox developer Tom Ritter has tested LTO with and without FDO; the results
are linked below. It is a rather nice interface: I like that one can click on
the graph and see the results in the context of other recent tests. This was
done with gcc6.
Tracking bug:
https://bugzilla.mozilla.org/show_bug.cgi?format=default&id=521435
non-FDO build:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=12ce14a5bcac9975b41a1f901bfc3a8dcb2d791b&framework=1&showOnlyImportant=1&selectedTimeRange=172800
FDO build:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=7e5bd52e36fcc1703ced01fe87e831a716677295&framework=1&showOnlyImportant=1&selectedTimeRange=172800
Honza