[Bug ipa/85103] [8/9 Regression] Performance regressions on SPEC with r257582

rguenth at gcc dot gnu.org Thu, 15 Nov 2018 05:35:58 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103


--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Pat Haugen from comment #3)
> (In reply to Jan Hubicka from comment #1)
> > Pat, can you try to figure out what value of min-speedup is needed to recover
> > from this regression?
> 
> Using r257582, either of the following options restores the behavior of not
> inlining the mainGtU call and eliminates the performance regression.
> 
> --param inline-min-speedup=18
> 
> --param max-inline-insns-auto=24

Note that both are odd: they change the params in exactly the same direction
as the revision that caused the regression, which did 8->15 for
inline-min-speedup and 40->30 for max-inline-insns-auto.

Does that mean inlining even less will restore performance?

So somehow the issue is inlining the whole of mainGtU but only after
inlining of the split part?  And that only happens when other inlining
is changed.

Do you observe the same slowdown if you restore either of the params to
the value before the r257582 change?

Also it looks like we are inlining mainGtU.part back into mainGtU and
inlining the result (the inline clone!) into mainSimpleSort.  I think that's
against the intent of function splitting, which would always inline the
header first and then eventually the tail?

That is:

IPA function summary for mainGtU.part.0/48 inlinable
  global time:     240.000000
  self size:       243

IPA function summary for mainGtU/37 inlinable
  global time:     29.125000
  self size:       36

 Inlined mainGtU.part.0 into mainGtU which now has time 56.750000 and size 267,
net change of -12.

Considering mainGtU/37 with 267 size            <===========
 to be inlined into mainSimpleSort/39 in blocksort.c:561
 Estimated badness is -0.000001, frequency 7718.74.
    Badness calculation for mainSimpleSort/39 -> mainGtU/37
      size growth 256, time 54.750000 unspec 56.750000  big_speedup
      -0.000001: guessed profile. frequency 7718.740543, count -1 caller count
-1 time w/o inlining 827687.848633, time with inlining 681031.777832 overall
growth 501 (current) 39 (original) 1521 (compensated)
      Adjusted by hints -0.000001
                Accounting size:215.00, time:307784.78 on predicate exec:(true)
                Accounting size:12.00, time:31839.80 on predicate exec:(true)
                Accounting size:12.00, time:31839.80 on predicate exec:(true)
                Accounting size:12.00, time:25085.91 on predicate exec:(true)
                Accounting size:12.00, time:25085.91 on predicate exec:(true)
                Accounting size:1.00, time:964.84 on predicate exec:(true)
Processing frequency mainGtU
  Called by mainSimpleSort that is normal or hot
Processing frequency mainGtU.part.0
  Called by mainGtU that is normal or hot
  not inlinable: mainSimpleSort/39 -> mainGtU/37, --param max-inline-insns-auto
limit reached
  not inlinable: mainSimpleSort/39 -> mainGtU/37, --param max-inline-insns-auto
limit reached
 Inlined mainGtU into mainSimpleSort which now has time 10651.946587 and size
113, net change of +256.

I'm not sure why the estimates work out the way they do, but maybe something
bogus happens with the predicates when inlining the split part back into
the header.

Btw, we haven't tuned max-inline-insns-single for a long time; at 400 (!) it
is now way larger than the -auto limit.  Lowering it to 250 would solve the
regression as well, I guess.  OTOH in the past I argued this limit should
not exist.
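
To make the 400 vs. 250 remark concrete, here is a small sketch using the
"size growth 256" reported in the dump above.  It assumes, purely for
illustration, that an edge is rejected once its growth reaches the limit;
the real heuristics have more conditions, and exceeds_limit is a made-up
name, not a GCC function.

```python
# Hypothetical helper, not part of GCC.
GROWTH = 256  # "size growth 256" from the badness dump


def exceeds_limit(growth, limit):
    # Assumption: an edge is rejected once its growth reaches the limit.
    return growth >= limit


print("limit 400:", exceeds_limit(GROWTH, 400))
print("limit 250:", exceeds_limit(GROWTH, 250))
```

Under that assumption a growth of 256 stays below 400 but reaches 250.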

But what needs to be investigated is why we assume such a big speedup
here for one call but not for the others (827687.848633 to 681031.777832).

In the previous revision the mainGtU header provided a similar speedup,
756418.501953 to 617717.608398.
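
For reference, the relative speedups implied by those two pairs of estimates
can be worked out as below.  This is only a sketch: it assumes the
big_speedup hint compares the relative time reduction, in percent, against
inline-min-speedup.  The helper names are made up for illustration and are
not GCC functions.

```python
# Hypothetical helpers, not part of GCC.  Times are taken from the dumps.


def relative_speedup_percent(time_without, time_with):
    """Relative time reduction from inlining, in percent."""
    return (time_without - time_with) * 100.0 / time_without


def is_big_speedup(time_without, time_with, min_speedup):
    # Assumption: the hint is set when the reduction exceeds min_speedup.
    return relative_speedup_percent(time_without, time_with) > min_speedup


r257582 = (827687.848633, 681031.777832)
previous = (756418.501953, 617717.608398)

for name, (without, with_inlining) in (("r257582", r257582),
                                       ("previous", previous)):
    print(name, round(relative_speedup_percent(without, with_inlining), 2))

# Old default (8), new default (15), and the value Pat reported (18).
for threshold in (8, 15, 18):
    print(threshold, is_big_speedup(*r257582, threshold))
```

Under that assumption the r257582 estimate is a reduction of about 17.7%,
which clears 15 but not 18, and the previous revision's estimate is about
18.3%.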

The interesting thing is that the inlined-back tail has a much lower time
than the tail itself, which looks inconsistent.  Honza?

[Bug ipa/85103] [8/9 Regression] Performance regressions on SPEC with r257582
