[Bug tree-optimization/115144] New: [15 Regression] 2% performance regression for some codes with r15-518-g99b1daae18c095

hp at gcc dot gnu.org via Gcc-bugs Fri, 17 May 2024 20:07:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115144


            Bug ID: 115144
           Summary: [15 Regression] 2% performance regression for some
                    codes with r15-518-g99b1daae18c095
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hp at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: cris-elf

...and it also regresses gcc.target/cris/pr93372-47.c.  The actual
purpose of that test-case is to serve as a regression test for a fixed
bug in delay-slot filling, but it also guards against code-quality
regressions.  Following up, as per the comment in pr93372-47.c about
what to investigate if it regresses, I see quite a large
regression:

The commit r15-518-g99b1daae18c095 "tree-optimization/114589 - remove
profile based sink heuristics" caused an almost 2% performance
regression for certain codes, as measured by simulator output when
executing gcc.c-torture/execute/arith-rand-ll.c compiled for cris-elf
with -O2 -march=v10.

r15-0517:
Basic clock cycles, total @: 13025734

r15-0518:
Basic clock cycles, total @: 13279004

Also, I inspected the simulator output, and the bulk of the cycles is
indeed spent in random_bitstring (i.e. not in the div and mod library
functions).

Perhaps you'd say that ivopts matters here?

The same, with -fno-ivopts added:

r15-0517:
Basic clock cycles, total @: 13008338

r15-0518:
Basic clock cycles, total @: 13330520

...so the regression is then even larger; almost 2.5%.

It may be argued that arith-rand-ll.c is not a reliable performance
test, so I also ran r15-0517 and r15-0518 on coremark, which paints
a different picture:

r15-0517:
Basic clock cycles, total @: 5022704

r15-0518:
Basic clock cycles, total @: 5021785

So there, it's a performance win, if only a small one (~0.02%).
Same, with -fno-ivopts:

r15-0517:
Basic clock cycles, total @: 5641650

r15-0518:
Basic clock cycles, total @: 5640721

Still a performance win, just slightly smaller (still ~0.02%).
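As a quick cross-check of the percentages quoted above, here is a minimal Python sketch. It recomputes the relative change in total simulator cycle count from r15-0517 to r15-0518 for each configuration, using only the numbers in this report. The names RUNS and pct_change are illustrative and are not part of any GCC or simulator tooling.

```python
# Total "Basic clock cycles" reported by the cris-elf simulator, as
# (r15-0517, r15-0518) pairs, copied from the runs quoted in this report.
RUNS = {
    "arith-rand-ll.c -O2 -march=v10": (13025734, 13279004),
    "arith-rand-ll.c, adding -fno-ivopts": (13008338, 13330520),
    "coremark": (5022704, 5021785),
    "coremark, adding -fno-ivopts": (5641650, 5640721),
}


def pct_change(before: int, after: int) -> float:
    """Relative change in cycle count, in percent; positive means slower."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # One line per configuration, r15-0517 -> r15-0518.
    for label, (r517, r518) in RUNS.items():
        print(f"{label:40s} {pct_change(r517, r518):+.3f}%")
```

A positive value means r15-0518 needs more cycles than r15-0517. The two arith-rand-ll.c runs come out just under 2% and 2.5%, and the two coremark runs are small negative values of about 0.02%.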

Judging from coremark, there's no general conclusion regarding the
performance impact of r15-518, but I know from other performance
investigations that "double register"-heavy code such as
arith-rand-ll.c has different characteristics on CRIS from other
test code, here coremark.

Maybe something can be done to improve on r15-518 for this type of
code, or maybe it exposed problems for other ports as well, so I'm not
going to close this as WONTFIX myself right away.  I'll also use this
PR as an anchor when dealing with the regression for
gcc.target/cris/pr93372-47.c (likely by xfailing it).
