https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125607
Bug ID: 125607
Summary: Overaggressive cloning for insufficient perf vs code size benefit
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Created attachment 64628
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64628&action=edit
Testcase demonstrating excessive cloning
The cppcheck benchmark in SPEC2026 compiled with GCC at -Ofast -flto is slower
than with LLVM by ~13%. Part of that seems to be GCC aggressively bloating code
through clones that don't offer much benefit. I've tried to craft a
representative testcase, attached here with comments.
freq_sum from caller_statistics is the summed frequency of all call sites that
would be redirected to the clone.
Because it scales with the number and hotness of the callers, a function called
from many hot sites clears the threshold even with an arbitrarily small
per-call benefit, so IPA-CP duplicates a large body even though knowing the
propagated constant barely simplifies it. The evaluation in
good_cloning_opportunity_p doesn't model the fact that when the "specialised"
parameter is only forwarded as a runtime argument into a callee that is itself
not inlined/cloned, the modelled saving never materialises.
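For illustration, the forwarding pattern described above can be sketched as
follows. This is not the attached testcase: the names process, lookup,
caller_a and caller_b are hypothetical, and the noinline/noclone attributes are
used only to force the "callee is not inlined/cloned" situation that the
heuristics would otherwise decide on their own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee. The attributes force it to stay out of line and
   uncloned, so `mode` always arrives here as a runtime value. */
__attribute__((noinline, noclone))
int lookup(const int *table, size_t n, int mode)
{
    assert(table != NULL);
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += table[i] ^ mode;
    return acc;
}

/* Stand-in for a large function body. `mode` is never tested or folded
   here; it is only forwarded to lookup(). A clone specialised for
   mode == 1 would duplicate this body without simplifying it. */
__attribute__((noinline))
int process(const int *table, size_t n, int mode)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += table[i] * 3;
    return acc + lookup(table, n, mode);
}

/* Several call sites passing the same constant. Their frequencies are
   what would add up in freq_sum for the mode == 1 clone. */
int caller_a(const int *t, size_t n) { return process(t, n, 1); }
int caller_b(const int *t, size_t n) { return process(t, n, 1); }
```

In this sketch the only place the constant could pay off is inside lookup, which
never sees it as a constant, so the per-call saving of a process clone is close
to zero regardless of how many hot callers pass mode == 1.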
Adding a cutoff to good_cloning_opportunity_p that allows the clone only if the
calculated benefit is > N * size_cost, where N is around 3%, seems to fix this
and gives up to a 10% improvement on cppcheck by avoiding a large number of
non-beneficial clones.
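A minimal sketch of the idea, under stated assumptions: the struct, the
function names and the simplified score below are hypothetical and are not
GCC's actual code or formula. The sketch also assumes that the cutoff is applied
to the per-call benefit before frequency scaling, which is only one possible
reading of "calculated benefit".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one cloning candidate. */
struct clone_candidate {
    double time_benefit; /* modelled saving per call */
    double freq_sum;     /* summed frequency of redirected call sites */
    int size_cost;       /* modelled code-size growth of the clone */
};

/* Simplified frequency-scaled score (an assumption, not GCC's formula):
   it grows without bound as freq_sum grows. */
double scaled_evaluation(const struct clone_candidate *c)
{
    assert(c->size_cost > 0);
    return c->time_benefit * c->freq_sum / c->size_cost;
}

/* Proposed extra gate: the benefit must exceed ratio * size_cost,
   independently of how hot the callers are. */
bool passes_cutoff(const struct clone_candidate *c, double ratio)
{
    return c->time_benefit > ratio * c->size_cost;
}

/* Accept a clone only if it clears both the scaled threshold and the cutoff. */
bool good_clone_sketch(const struct clone_candidate *c,
                       double threshold, double ratio)
{
    return scaled_evaluation(c) >= threshold && passes_cutoff(c, ratio);
}
```

With made-up numbers, a candidate with time_benefit 2, freq_sum 400 and
size_cost 500 scores 1.6 and clears a scaled threshold of 1.0, but fails a 3%
cutoff because 2 is not greater than 0.03 * 500 = 15. A candidate with
time_benefit 40 and freq_sum 20 has the same score of 1.6 and passes both
checks.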