https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125607

            Bug ID: 125607
           Summary: Overaggressive cloning for insufficient perf vs code
                    size benefit
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Created attachment 64628
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64628&action=edit
Testcase demonstrating excessive cloning

The cppcheck benchmark in SPEC2026 compiled with GCC at -Ofast -flto is slower
than with LLVM by ~13%. Part of that seems to be GCC aggressively bloating code
through clones that don't offer much benefit. I've tried to craft a
representative testcase, attached here with comments.

freq_sum from caller_statistics is the summed frequency of all call sites that
would be redirected to the clone.
Because it scales with how many / how hot the callers are, a function called
from many hot sites clears the threshold even with an arbitrarily small
per-call benefit, so IPA-CP duplicates a large body even though knowing the
propagated constant barely simplifies it. The evaluation in
good_cloning_opportunity_p doesn't model the fact that when the "specialised"
parameter is only forwarded as a runtime argument into a callee that is itself
not inlined/cloned, the modelled saving never materialises.
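
A minimal sketch of the forwarding pattern described above. This is not the
attached testcase; all names (leaf, big_worker, hot_caller_a, hot_caller_b)
are hypothetical. The noinline/noclone attributes are only there to model a
callee that ends up neither inlined nor cloned. Whether GCC actually clones
big_worker for an example this small depends on the version and the IPA-CP
params.

```cpp
#include <cstddef>

// Hypothetical callee that stays out of line and unspecialised, so `mode`
// arrives here as an ordinary runtime argument.
__attribute__((noinline, noclone))
int leaf(int mode, int x) {
    return (mode & 1) ? x * 3 : x + 7;
}

// Stand-in for a large function body. `mode` is never tested here, only
// forwarded to leaf(), so knowing its value does not simplify this body.
int big_worker(int mode, const int *data, std::size_t n) {
    int acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += leaf(mode, data[i]);
    }
    return acc;
}

// Several hot call sites passing the same constant: each one adds to the
// summed call-site frequency, although a clone of big_worker specialised
// for mode == 1 would still pass 1 to leaf() at run time.
int hot_caller_a(const int *d, std::size_t n) { return big_worker(1, d, n); }
int hot_caller_b(const int *d, std::size_t n) { return big_worker(1, d, n) + 1; }
```

A clone of big_worker for mode == 1 would be almost identical to the
original, which is the kind of size cost without a matching speedup that
this report is about.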

Adding a cutoff to good_cloning_opportunity_p to allow the clone only if the
calculated benefit is > N * size_cost, where N is around 3%, seems to fix this
and gives up to 10% improvement on cppcheck by avoiding a large number of
non-beneficial clones.
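
The cutoff amounts to the inequality below. This is only an illustration of
the check, not a patch against ipa-cp.cc: passes_benefit_cutoff and min_ratio
are hypothetical names, and the types and units of the benefit and size cost
inside good_cloning_opportunity_p are assumed here rather than taken from the
GCC sources.

```cpp
// Hypothetical helper: accept a clone only if the estimated benefit exceeds
// a fixed fraction (about 3% by default) of the estimated size cost.
bool passes_benefit_cutoff(double benefit, int size_cost,
                           double min_ratio = 0.03) {
    return benefit > min_ratio * static_cast<double>(size_cost);
}
```

With a size cost of 100, a benefit of 4.0 would pass and a benefit of 2.0
would be rejected, no matter how large the summed call-site frequency is.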
