On Fri, Sep 23, 2016 at 3:29 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf
> <mar...@trippelsdorf.de> wrote:
>> On 2016.09.22 at 15:42 +0200, Markus Trippelsdorf wrote:
>>> On 2016.09.22 at 15:36 +0200, Richard Biener wrote:
>>> > On Thu, Sep 22, 2016 at 3:13 PM, Wilco Dijkstra
>>> > <wilco.dijks...@arm.com> wrote:
>>> > > Increase the lto-min-partition size to 50000 to reduce the number
>>> > > of partitions. See e.g.
>>> > > https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00235.html for a
>>> > > concise explanation of why 10000 is too small for modern CPU/memory
>>> > > sizes. Additionally, larger values increase optimization
>>> > > opportunities and reduce bad decisions in the layout of global
>>> > > variables across partitions (anchors do not work well with LTO).
>>> > > Looking at SPEC2000, 8 more benchmarks now use a single LTO
>>> > > partition, which is the most optimal. Build time with LTO increases
>>> > > only slightly, e.g. SPEC2006 now takes 2% more time on an 8-core
>>> > > ARM server.
>>> >
>>> > Ok. Markus, how many partitions do we get with libreoffice/firefox
>>> > currently (I suppose they all hit lto-max-partition now?)
>>>
>>> Yes. Even tramp3d currently gets 30 partitions. With this patch it gets
>>> reduced to 20.
>>> And I guess bigger projects like Firefox are unchanged at 32.
>>
>> Sorry, I've reported wrong numbers above.
>>
>> lto-min-partition was already increased from 1000 to 10000 on trunk by
>> Prathamesh in April.
>
> Ah, I forgot about this. 10000 is equal to large-unit-insns btw and about
> four times large-function-insns.
>
>> And tramp3d only uses ten partitions (lto-min-partition=10000).
>> With lto-min-partition=50000 (current patch) this decreases to only two
>> partitions. As a result we lose the possible speedup on many-core
>> machines (-flto=n).
>>
>> E.g.
>> on my 4-core machine I get the following tramp3d compile times with
>> -flto=4:
>>
>> lto-min-partition=50000: 20.146 total
>> lto-min-partition=10000: 16.299 total
>> lto-min-partition=1000 : 16.093 total
>>
>> So 50000 looks too big to me.
>
> I think the issue is that the default number of partitions is too high
> (32), which pessimizes 4-core machines if the units are too small.
>
> Maybe we can tune the triplet lto-partitions, lto-min-partition and
> lto-max-partition in a way that it roughly scales the number of
> partitions produced with program size rather than quickly rising
> to 32 and then hovering there until the first unit hits lto-max-partition?

Which would imply lto-max-partition being on the order of
lto-partitions * lto-min-partition or simply only having a single
lto-partition-size param.

I suppose making all this runtime dependent on # cores isn't something
we can do as this will lead to code-generation changes.

Richard.

>
>> Also the "increased optimization opportunities" with fewer partitions
>> were unmeasurable in the past. If I recall correctly Honza once said
>> that there should be no difference between single vs. many partitions.
>
> Well, it definitely makes a difference for late IPA passes (that's mainly
> IPA PTA).
>
> Richard.
>
>> --
>> Markus
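
For illustration only, the "rises to 32 and then hovers" behaviour discussed
above can be sketched with a toy model. It assumes the target partition size
is simply program size divided by lto-partitions, clamped between
lto-min-partition and lto-max-partition. That is a simplification and not
GCC's actual partitioning code. The function name estimated_partitions and
the sample program sizes are hypothetical, and the lto-max-partition default
of 1000000 is an assumption (only 32 and 10000 appear in this thread).

```python
def estimated_partitions(total_insns, lto_partitions=32,
                         min_partition=10_000, max_partition=1_000_000):
    """Toy estimate of the LTO partition count for a given program size.

    Assumption: the target partition size is total_insns / lto_partitions
    (rounded up), clamped to [min_partition, max_partition]. This is a
    simplified model for illustration, not GCC's real algorithm.
    """
    # Ceiling division so the unclamped count never exceeds lto_partitions.
    target = -(-total_insns // lto_partitions)
    target = max(target, min_partition)
    target = min(target, max_partition)
    # Ceiling division again; even a tiny program gets one partition.
    return max(1, -(-total_insns // target))


if __name__ == "__main__":
    # Hypothetical program sizes, in estimated instructions.
    for size in (100_000, 320_000, 10_000_000, 64_000_000):
        print(size,
              estimated_partitions(size),                         # min = 10000
              estimated_partitions(size, min_partition=50_000))   # min = 50000
```

In this model the count grows linearly up to lto-partitions * lto-min-partition
instructions, stays at 32 until lto-partitions * lto-max-partition, and only
then grows again. Raising lto-min-partition moves the first knee to the right,
which cuts the partition count for small and medium units.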