On 23 September 2016 at 19:49, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> Richard Biener wrote:
>> On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf <mar...@trippelsdorf.de> wrote:
>> > And tramp3d only uses ten partitions (lto-min-partition=10000).
>> > With lto-min-partition=50000 (current patch) this decrease to only two
>> > partitions. As a result we loose the possible speedup on many core
>> > machines (-flto=n).
>
> Only if the size is close to the lto-min-partition. For larger applications
> there is little difference.
>
>> > E.g. on my 4-core machine I get the following tramp3d compile times with
>> > -flto=4:
>> >
>> > lto-min-partition=50000: 20.146 total
>> > lto-min-partition=10000: 16.299 total
>> > lto-min-partition=1000 : 16.093 total
>> >
>> > So 50000 looks too big to me.
>
> That's only 16 seconds? Seems like it's small so ideally it should have
> used a single partition...
>
>> I think the issue is that the default number of partitions is too high
>> (32) which pessimizes 4-core machines if the units are too small.
>
> Yes, 8 might be a better value as 32 core machines are rare.
>
>> Maybe we can tune the triplet lto-partitions, lto-min-partition and
>> lto-max-partition in a way that it roughly scales the number of
>> partitions produced with program size rather than quickly raising
>> to 32 and then hovering there until the first unit hits lto-max-partition?
>
> Or use a single partition size rather than have the maximum size
> a hundred times the minimum size (which doesn't make sense at all).
>
>> > Also the "increased optimization opportunities" with fewer partitions
>> > were unmeasurable in the past. If I recall correctly Honza once said
>> > that there should be no difference between single vs. many partitions.
>>
>> Well, it definitely makes a difference for late IPA passes (that's mainly
>> IPA PTA).
>
> Also anchors don't work with multiple partitions. I get around 1% gain
> from using a single partition.

Hi Wilco,
I am working on LTO varpool partitioning to improve performance for
section anchors. I posted a preliminary patch at:
https://gcc.gnu.org/ml/gcc/2016-07/msg00033.html
Unfortunately I haven't been able to benchmark it on ARM yet.
I am planning to resume work on it soon.

Building with a single partition is not scalable. An LTO build of chromium
with an x86->arm cross compiler and a single partition results in a
"branch out of range" assembler error. I added lto-max-partition primarily
to work around that limitation.

Thanks,
Prathamesh
>
> Wilco
>
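
For reference, the partition counts quoted above (ten with lto-min-partition=10000, two with 50000) fit a very simple model, sketched below in Python. This is only a rough illustration and not the actual partitioner. The function name, the assumed program size of 100000 units (inferred from the tramp3d numbers, not measured) and the decision to ignore lto-max-partition are all assumptions made for the example.

```python
import math

def estimate_partitions(total_size, lto_partitions, min_partition):
    """Rough partition count under a simplified model (hypothetical helper).

    Assumption: the target partition size is total_size / lto_partitions,
    but never smaller than min_partition. lto-max-partition is ignored.
    """
    # Split evenly across the requested partitions, clamped to the minimum size.
    target = max(total_size // lto_partitions, min_partition)
    # Partitions needed to cover the whole program at that target size.
    return max(1, math.ceil(total_size / target))

# Assumed size, chosen so the model matches the counts quoted in the thread.
TOTAL = 100_000
for min_partition in (1000, 10000, 50000):
    count = estimate_partitions(TOTAL, 32, min_partition)
    print(f"lto-min-partition={min_partition}: {count} partitions")
```

Under these assumptions the count stays at the configured maximum of 32 until the minimum size starts to dominate, which is the "quickly raising to 32 and then hovering there" behaviour Richard describes.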