On 23 September 2016 at 19:49, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> Richard Biener wrote:
>>On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf <mar...@trippelsdorf.de> wrote:
>> > And tramp3d only uses ten partitions (lto-min-partition=10000).
>> > With lto-min-partition=50000 (current patch) this decreases to only two
>> > partitions. As a result we lose the possible speedup on many-core
>> > machines (-flto=n).
>
> Only if the size is close to the lto-min-partition. For larger applications
> there is little difference.
>
>> > E.g. on my 4-core machine I get the following tramp3d compile times with
>> > -flto=4:
>> >
>> > lto-min-partition=50000: 20.146 total
>> > lto-min-partition=10000: 16.299 total
>> > lto-min-partition=1000 : 16.093 total
>> >
>> > So 50000 looks too big to me.
>
> That's only 16 seconds? Seems like it's small so ideally it should have
> used a single partition...
>
>> I think the issue is that the default number of partitions is too high
>> (32) which pessimizes 4-core machines if the units are too small.
>
> Yes, 8 might be a better value as 32 core machines are rare.
>
>> Maybe we can tune the triplet lto-partitions, lto-min-partition and
>> lto-max-partition in a way that it roughly scales the number of
>> partitions produced with program size rather than quickly raising
>> to 32 and then hovering there until the first unit hits lto-max-partition?
>
> Or use a single partition size rather than have the maximum size
> a hundred times the minimum size (which doesn't make sense at all).
>
>> > Also the "increased optimization opportunities" with fewer partitions
>> > were unmeasurable in the past. If I recall correctly Honza once said
>> > that there should be no difference between single vs. many partitions.
>>
>> Well, it definitely makes a difference for late IPA passes (that's mainly
>> IPA PTA).
>
> Also anchors don't work with multiple partitions. I get around 1% gain
> from using a single partition.
Hi Wilco,
I am working on LTO varpool partitioning to improve performance for
section anchors.
I posted a preliminary patch at:
https://gcc.gnu.org/ml/gcc/2016-07/msg00033.html
Unfortunately I haven't been able to benchmark it on ARM yet.
I am planning to resume work on it soon.

Building with a single partition is not scalable. An LTO build of
chromium, cross-compiled from x86 to arm with a single partition,
results in a "branch out of range" assembler error. I added
lto-max-partition primarily to work around that limitation.
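
As a rough illustration of how lto-partitions, lto-min-partition and
lto-max-partition interact, here is a simplified model in Python. It is
only a sketch, not GCC's actual partitioning code: the function name, the
clamping of the target size to the max, the 1000000 default for the max
(taken from the "hundred times the minimum" remark above) and the total
size of 100000 (picked so that the result matches the partition counts
Markus reported for tramp3d) are all assumptions.

```python
def estimate_partitions(total_size, lto_partitions=32,
                        min_partition=10000, max_partition=1000000):
    """Estimate the partition count under a simplified balanced model.

    Hypothetical helper: aim for total_size / lto_partitions per
    partition, but never below min_partition and (by assumption)
    never above max_partition.
    """
    if total_size <= 0:
        return 1
    target = total_size // lto_partitions
    # Clamp the target partition size to [min_partition, max_partition].
    target = max(target, min_partition, 1)
    target = min(target, max_partition)
    # Ceiling division: a partial partition still counts as one.
    return -(-total_size // target)


# Assumed tramp3d-like total size of 100000 units.
print(estimate_partitions(100000, min_partition=1000))   # 32
print(estimate_partitions(100000, min_partition=10000))  # 10
print(estimate_partitions(100000, min_partition=50000))  # 2
```

In this model the count reaches 32 as soon as the program is 32 times
the minimum size and then stays there until the max size takes effect,
which is the behaviour Richard describes.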

Thanks,
Prathamesh
>
> Wilco
>
