https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115144
Bug ID: 115144
Summary: [15 Regression] 2% performance regression for some
codes with r15-518-g99b1daae18c095
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hp at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Target Milestone: ---
Target: cris-elf
...and it also regresses gcc.target/cris/pr93372-47.c. The actual
purpose of that test-case is to serve as a regression test for a fixed
delay-slot-filling bug, but it also guards against code-quality
regressions. Following up as per the comment in pr93372-47.c about
what to investigate in case it regressed, I see quite a large
regression:
The commit r15-518-g99b1daae18c095 "tree-optimization/114589 - remove
profile based sink heuristics" caused an almost 2% performance
regression for certain codes, as measured by simulator output when
executing gcc.c-torture/execute/arith-rand-ll.c compiled for cris-elf
with -O2 -march=v10.
r15-0517:
Basic clock cycles, total @: 13025734
r15-0518:
Basic clock cycles, total @: 13279004
Also, I inspected simulator output and the bulk is indeed in
random_bitstring (i.e. not in div and mod library functions).
Perhaps ivopts matters here?
The same measurement, adding -fno-ivopts:
r15-0517:
Basic clock cycles, total @: 13008338
r15-0518:
Basic clock cycles, total @: 13330520
...so the regression is then even larger; almost 2.5%.
It may be argued that arith-rand-ll.c is not a reliable performance
test, so I also ran coremark with r15-0517 and r15-0518, which paints
a different picture:
r15-0517:
Basic clock cycles, total @: 5022704
r15-0518:
Basic clock cycles, total @: 5021785
So there, it's a win in performance, if only a small one (~0.02%).
Same, with -fno-ivopts:
r15-0517:
Basic clock cycles, total @: 5641650
r15-0518:
Basic clock cycles, total @: 5640721
Still a win in performance, only slightly smaller in relative terms
(still ~0.02%).
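For anyone wanting to re-check the percentages quoted above, they can be
recomputed from the cycle counts with a short script. This is only a
sketch in Python: the helper name percent_change and the labels in the
dictionary are made up for illustration and are not part of any GCC or
simulator tooling. The cycle counts are the ones quoted in this report.

```python
# Cycle counts copied from the simulator output quoted in this report.
# Each entry maps an illustrative label to (r15-0517, r15-0518) totals.
measurements = {
    "arith-rand-ll.c": (13025734, 13279004),
    "arith-rand-ll.c -fno-ivopts": (13008338, 13330520),
    "coremark": (5022704, 5021785),
    "coremark -fno-ivopts": (5641650, 5640721),
}


def percent_change(before, after):
    """Relative change in percent; positive means more cycles (slower)."""
    return (after - before) / before * 100.0


for label, (before, after) in measurements.items():
    delta = after - before
    print(f"{label}: {delta:+d} cycles, {percent_change(before, after):+.3f}%")
```

A positive value is a regression in cycle count and a negative value is
a win, so the two arith-rand-ll.c rows come out positive and the two
coremark rows come out slightly negative.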
Judging from coremark, there's no general conclusion regarding
performance of r15-518, but I know from other performance
investigations that "double register"-heavy code such as
arith-rand-ll.c for CRIS has different characteristics than other
test-code, here coremark.
Maybe something can be done to improve on r15-518 for this type of code,
or maybe it exposed problems for other ports, so I'm not going to
close this as WONTFIX myself right away. I'll also be using this PR
as an anchor when dealing with (likely xfailing) the regression for
gcc.target/cris/pr93372-47.c.