On Fri, Sep 23, 2016 at 3:29 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf
> <mar...@trippelsdorf.de> wrote:
>> On 2016.09.22 at 15:42 +0200, Markus Trippelsdorf wrote:
>>> On 2016.09.22 at 15:36 +0200, Richard Biener wrote:
>>> > On Thu, Sep 22, 2016 at 3:13 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>>> > > Increase the lto-min-partition size to 50000 to reduce the number of
>>> > > partitions.  See e.g. https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00235.html
>>> > > for a concise explanation of why 10000 is too small for modern CPU/memory
>>> > > sizes.  Additionally, larger values increase optimization opportunities and
>>> > > reduce bad decisions in the layout of global variables across partitions
>>> > > (anchors do not work well with LTO).
>>> > > Looking at SPEC2000, 8 more benchmarks now use a single LTO partition, which
>>> > > is optimal.  Build time with LTO increases only slightly, e.g. SPEC2006
>>> > > now takes 2% more time on an 8-core ARM server.
>>> >
>>> > Ok.  Markus, how many partitions do we currently get with
>>> > libreoffice/firefox (I suppose they all hit lto-max-partition now?)
>>>
>>> Yes. Even tramp3d currently gets 30 partitions. With this patch it gets
>>> reduced to 20.
>>> And I guess bigger projects like Firefox are unchanged at 32.
>>
>> Sorry, I reported the wrong numbers above.
>>
>> lto-min-partition was already increased from 1000 to 10000 on trunk by
>> Prathamesh in April.
>
> Ah, I forgot about this.  10000 is equal to large-unit-insns btw, and about
> four times large-function-insns.
>
>> And tramp3d only uses ten partitions (lto-min-partition=10000).
>> With lto-min-partition=50000 (current patch) this decreases to only two
>> partitions.  As a result we lose the possible speedup on many-core
>> machines (-flto=n).
>>
>> E.g. on my 4-core machine I get the following tramp3d compile times with
>> -flto=4:
>>
>> lto-min-partition=50000: 20.146 total
>> lto-min-partition=10000: 16.299 total
>> lto-min-partition=1000 : 16.093 total
>>
>> So 50000 looks too big to me.
>
> I think the issue is that the default number of partitions is too high
> (32), which pessimizes 4-core machines if the units are too small.
>
> Maybe we can tune the triplet lto-partitions, lto-min-partition and
> lto-max-partition so that the number of partitions produced roughly
> scales with program size, rather than quickly rising to 32 and then
> hovering there until the first unit hits lto-max-partition?

That would imply lto-max-partition being on the order of
lto-partitions * lto-min-partition,
or simply having only a single lto-partition-size param.
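To make the plateau concrete, here is a simplified model of how the three params bound the partition count. It is a sketch under stated assumptions, not GCC's lto-partition.c: it assumes the target partition size is total/lto-partitions, raised to lto-min-partition and capped at lto-max-partition. The default values (32, 10000, 1000000) are assumptions to check against params.def, and estimate_partitions / estimate_single_param are hypothetical helpers.

```python
# Simplified model (an assumption, not GCC's lto-partition.c) of how the
# lto-partitions / lto-min-partition / lto-max-partition triplet bounds
# the number of LTO partitions for a unit of a given size in insns.

# Assumed defaults; verify against params.def for the GCC version in use.
LTO_PARTITIONS = 32
LTO_MIN_PARTITION = 10_000
LTO_MAX_PARTITION = 1_000_000


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive b."""
    return -(-a // b)


def estimate_partitions(total_insns: int,
                        n_partitions: int = LTO_PARTITIONS,
                        min_size: int = LTO_MIN_PARTITION,
                        max_size: int = LTO_MAX_PARTITION) -> int:
    """Hypothetical helper: aim for total/n_partitions insns per partition,
    raise that target to min_size, cap it at max_size, then count how many
    partitions of the target size are needed to cover the unit."""
    target = ceil_div(total_insns, n_partitions)
    target = min(max(target, min_size), max_size)
    return max(1, ceil_div(total_insns, target))


def estimate_single_param(total_insns: int, partition_size: int) -> int:
    """Hypothetical single lto-partition-size knob: the partition count
    grows linearly with program size, with no plateau at 32."""
    return max(1, ceil_div(total_insns, partition_size))


if __name__ == "__main__":
    for total in (50_000, 100_000, 320_000, 1_000_000, 10_000_000, 64_000_000):
        print(f"{total:>10} insns: triplet={estimate_partitions(total):>3}, "
              f"single={estimate_single_param(total, LTO_MIN_PARTITION):>5}")
```

Under this model, a hypothetical 100000-insn unit gets 32, 10 and 2 partitions for lto-min-partition values of 1000, 10000 and 50000. With the assumed defaults the count reaches 32 at 320000 insns and stays there until the unit exceeds 32 * lto-max-partition, which is the "rise quickly, then hover" shape; the single-param variant instead grows linearly.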

I suppose making all this depend at runtime on the number of cores isn't
something we can do, as it would lead to code-generation changes.

Richard.

>
>> Also the "increased optimization opportunities" with fewer partitions
>> were unmeasurable in the past.  If I recall correctly, Honza once said
>> that there should be no difference between a single partition and many.
>
> Well, it definitely makes a difference for late IPA passes (that's mainly
> IPA PTA).
>
> Richard.
>
>> --
>> Markus
