https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Pat Haugen from comment #3)
> (In reply to Jan Hubicka from comment #1)
> > Pat, can you try to figure out what value of min-speedup is needed to recover
> > from this regression?
>
> Using r257582, either of the following options restores the behavior of not
> inlining the mainGtU call and eliminates the performance regression.
>
> --param inline-min-speedup=18
>
> --param max-inline-insns-auto=24

Note that both are odd - they are changes in exactly the same direction as
the revision that caused the regression, which did 8->15 for inline-min-speedup
and 40->30 for max-inline-insns-auto. That means inlining even less will
restore performance?

So somehow the issue is inlining the whole of mainGtU but only after
inlining of the split part? And that only happens when other inlining
is changed.

Do you observe the same slowdown if you restore either of the params to the
value before the r257582 change?

Also it looks like we are inlining mainGtU.part back into mainGtU and
inlining the result (the inline clone!) into mainSimpleSort. That's against
the intent of function splitting, I think, which would always first inline the
header and then eventually the tail?

That is:

IPA function summary for mainGtU.part.0/48 inlinable
  global time:     240.000000
  self size:       243

IPA function summary for mainGtU/37 inlinable
  global time:     29.125000
  self size:       36

 Inlined mainGtU.part.0 into mainGtU which now has time 56.750000 and size 267, net change of -12.

Considering mainGtU/37 with 267 size    <===========
 to be inlined into mainSimpleSort/39 in blocksort.c:561
 Estimated badness is -0.000001, frequency 7718.74.
    Badness calculation for mainSimpleSort/39 -> mainGtU/37
      size growth 256, time 54.750000 unspec 56.750000 big_speedup
      -0.000001: guessed profile. frequency 7718.740543, count -1 caller count -1 time w/o inlining 827687.848633, time with inlining 681031.777832 overall growth 501 (current) 39 (original) 1521 (compensated)
      Adjusted by hints -0.000001
                Accounting size:215.00, time:307784.78 on predicate exec:(true)
                Accounting size:12.00, time:31839.80 on predicate exec:(true)
                Accounting size:12.00, time:31839.80 on predicate exec:(true)
                Accounting size:12.00, time:25085.91 on predicate exec:(true)
                Accounting size:12.00, time:25085.91 on predicate exec:(true)
                Accounting size:1.00, time:964.84 on predicate exec:(true)
Processing frequency mainGtU
  Called by mainSimpleSort that is normal or hot
Processing frequency mainGtU.part.0
  Called by mainGtU that is normal or hot
                not inlinable: mainSimpleSort/39 -> mainGtU/37, --param max-inline-insns-auto limit reached
                not inlinable: mainSimpleSort/39 -> mainGtU/37, --param max-inline-insns-auto limit reached
 Inlined mainGtU into mainSimpleSort which now has time 10651.946587 and size 113, net change of +256.

I'm not sure why the estimates work out the way they do, but maybe sth bogus
happens with the predicates when inlining back the split part into the header.

Btw, we didn't tune max-inline-insns-single for a long time; it is now way
larger than the -auto limit (400!). Bumping that down to 250 would solve the
regression as well, I guess. OTOH in the past I argued this limit should not
exist.

But what needs to be investigated is why we assume such a big speedup here for
one call but not the others (827687.848633 to 681031.777832). In the prev.
revision the mainGtU header provided a similar speedup, 756418.501953 to
617717.608398.

The interesting thing is that the inlined-back tail has a much lower time than
the tail itself, which looks inconsistent.

Honza?
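
For reference, the estimates quoted from the dump can be turned into relative figures with a few lines of arithmetic. This is only a sketch: it assumes the speedup is the relative reduction from "time w/o inlining" to "time with inlining", expressed as a percentage. That is an assumption about how the numbers compare against inline-min-speedup, not something taken from the GCC sources, and the helper name `relative_speedup_percent` is made up for illustration.

```python
def relative_speedup_percent(time_without: float, time_with: float) -> float:
    """Relative time reduction from inlining, in percent (assumed definition)."""
    return (time_without - time_with) / time_without * 100.0


# Time estimates copied from the dump in this comment.
current = relative_speedup_percent(827687.848633, 681031.777832)
# Estimates reported for the previous revision's mainGtU header.
previous = relative_speedup_percent(756418.501953, 617717.608398)

# Size bookkeeping for inlining mainGtU.part.0 (self size 243)
# back into mainGtU (self size 36), which ends up with size 267.
net_change = 267 - (243 + 36)

print(f"current revision:  {current:.2f}%")
print(f"previous revision: {previous:.2f}%")
print(f"net size change:   {net_change:+d}")
```

Under that assumption the two revisions come out at roughly 17.7% and 18.3%, and the size arithmetic reproduces the "net change of -12" line. A value of about 17.7% sits between 15 and 18, which would be consistent with comment #3's observation that inline-min-speedup=18 suppresses the inline, but it does not explain why the time estimates themselves come out this way.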