https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

--- Comment #15 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
With SRA improvements r:aae723d360ca26cd9fd0b039fb0a616bd0eae363 we finally get
good performance at -O2. Improvements to the push_back implementation also help
a bit.

Mainline with default flags (-O2):
    Input: JPEG - Quality: 90:
        19.76
        19.75
        19.68
Mainline with -O2 -march=native:
    Input: JPEG - Quality: 90:
        20.01
        20
        19.98
Mainline with -O2 -march=native -flto:
    Input: JPEG - Quality: 90:
        19.95
        19.98
        19.81
Mainline with -O2 -march=native -flto --param max-inline-insns-auto=80 (this
makes push_back inlined)
    Input: JPEG - Quality: 90:
        19.98
        20.05
        20.03
Mainline with -O2 -flto  -march=native -I/usr/include/c++/v1 -nostdinc++ -lc++
(so clang's libc++)
        21.38
        21.37
        21.32
Mainline with -O2 -flto  -march=native run manually since a build machinery
patch is needed
        23.03
        22.85
        23.04
Clang 17 with -O2 -march=native -flto and also -fno-tree-vectorize
-fno-tree-slp-vectorize added by cmake. This is with the system libstdc++ from
GCC 13, so before the push_back improvements.
        21.16
        20.95
        21.06
Clang 17 with -O2 -march=native -flto and also -fno-tree-vectorize
-fno-tree-slp-vectorize added by cmake. This is with trunk libstdc++, so with
the push_back improvements.
        21.2
        20.93
        20.98
Clang 17 with -O2 -march=native -flto -stdlib=libc++ and also
-fno-tree-vectorize -fno-tree-slp-vectorize added by cmake. This is with
clang's libc++
    Input: JPEG - Quality: 90:
        22.08
        21.88
        21.78
Clang 17 with -O3 -march=native -flto
        23.08
        22.90
        22.84


libc++ declares push_back always_inline and splits out the slow copying path. I
think the inlined part is still a bit too large for inlining at -O2.
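
To illustrate the fast/slow split described above, here is a minimal sketch.
It is not the actual libc++ or libstdc++ code: the class SmallVec and the
helper push_back_slow are hypothetical names made up for this example, the
element type is fixed to int, and growth uses plain malloc. It assumes GCC or
Clang, since it relies on their attribute and builtin syntax.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical minimal vector of int, for illustration only.
class SmallVec {
    int *begin_ = nullptr;
    int *end_ = nullptr;
    int *cap_ = nullptr;

    // Slow path: grow the storage, copy the old elements, then append.
    // Kept out of line so that inlining push_back does not pull the
    // reallocation code into every caller.
    __attribute__((noinline)) void push_back_slow(int v) {
        std::size_t n = static_cast<std::size_t>(end_ - begin_);
        std::size_t new_cap = n ? 2 * n : 4;
        int *p = static_cast<int *>(std::malloc(new_cap * sizeof(int)));
        if (!p)
            std::abort();
        if (n)
            std::memcpy(p, begin_, n * sizeof(int));
        std::free(begin_);
        begin_ = p;
        end_ = p + n;
        cap_ = p + new_cap;
        *end_++ = v;
    }

public:
    SmallVec() = default;
    SmallVec(const SmallVec &) = delete;
    SmallVec &operator=(const SmallVec &) = delete;
    ~SmallVec() { std::free(begin_); }

    // Fast path: one compare, one store, one pointer increment.
    // This is the only part forced into the caller.
    __attribute__((always_inline)) inline void push_back(int v) {
        if (__builtin_expect(end_ != cap_, 1))
            *end_++ = v;
        else
            push_back_slow(v);
    }

    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    std::size_t capacity() const { return static_cast<std::size_t>(cap_ - begin_); }
    int operator[](std::size_t i) const { return begin_[i]; }
};
```

With this shape the caller only pays for the capacity check and the store in
the common case. How large the inlined part ends up in a real implementation
depends on the element type and allocator, which this sketch ignores.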

We could still try to get the remaining approx. 10% without increasing code
size at -O2. However, the major part of the problem is solved.
