https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122393

            Bug ID: 122393
           Summary: GCC -O3 fails to optimize loop over constexpr array of
                    enum implementing contains() to a bitmask AND
           Product: gcc
           Version: 15.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marc.mutz at hotmail dot com
  Target Milestone: ---

Created attachment 62620
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62620&action=edit
GCC preprocessed source

In Qt, we have code like this:

                constexpr QUnicodeTables::LineBreakClass lb15b[] = {
                        QUnicodeTables::LineBreak_SP,    QUnicodeTables::LineBreak_GL,
                        QUnicodeTables::LineBreak_WJ,    QUnicodeTables::LineBreak_CL,
                        QUnicodeTables::LineBreak_QU,    QUnicodeTables::LineBreak_QU_Pi,
                        QUnicodeTables::LineBreak_QU_Pf, QUnicodeTables::LineBreak_CP,
                        QUnicodeTables::LineBreak_EX,    QUnicodeTables::LineBreak_IS,
                        QUnicodeTables::LineBreak_SY,    QUnicodeTables::LineBreak_BK,
                        QUnicodeTables::LineBreak_CR,    QUnicodeTables::LineBreak_LF,
                        QUnicodeTables::LineBreak_ZW};
                if (std::any_of(std::begin(lb15b), std::end(lb15b),
                                [nncls](auto x) { return x == nncls; })) {
                    ncls = QUnicodeTables::LineBreak_QU_Pf;
                }

Clang 21 -O3 turns this into a 64-bit immediate load and a bit-test instruction:

        cmpl    $49, %ecx
        ja      .LBB1_278
        movl    $5, %ebp
        movabsq $1055531498213054, %rax         # imm = 0x3C00014000EBE
        btq     %rcx, %rax
        jb      .LBB1_281

(the actual value may not be accurate, as I'm comparing two different loops)
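
As a reading aid, the Clang sequence above corresponds roughly to the C++
below. This is only a sketch: the names in_set and kMask are made up for
illustration, and the constant is copied verbatim from the movabsq immediate in
the listing. The movl $5, %ebp in between is not part of the test itself and is
left out here.

```cpp
#include <cstdint>

// Immediate copied from the movabsq instruction in the Clang listing.
constexpr std::uint64_t kMask = 0x3C00014000EBEull;

// Hypothetical helper: a range guard followed by a single bit test.
inline bool in_set(unsigned cls) {
    if (cls > 49)                    // cmpl $49, %ecx ; ja  -> value cannot be in the set
        return false;
    return ((kMask >> cls) & 1u) != 0;  // btq %rcx, %rax ; jb -> taken when the bit is set
}
```

The range guard is needed because a shift count of 64 or more would be
undefined in C++; in the assembly it also keeps out-of-range values from
aliasing onto low bits.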

GCC 15, on the other hand, duly optimizes the lookup by unrolling and using
vector instructions (I guess; I'm an assembler noob), but fails to turn it into
the simple bitvector lookup:

        movl    36(%rsp), %ecx
        movl    40(%rsp), %r8d
        movl    $26, 184(%rsp)
        movdqa  .LC3(%rip), %xmm0
        movq    .LC6(%rip), %rax
        leaq    188(%rsp), %rsi
        cmpl    $46, %ecx
        movl    112(%rsp), %r10d
        movq    120(%rsp), %r9
        movaps  %xmm0, 128(%rsp)
        movdqa  .LC4(%rip), %xmm0
        movq    %rax, 176(%rsp)
        leaq    128(%rsp), %rax
        movaps  %xmm0, 144(%rsp)
        movdqa  .LC5(%rip), %xmm0
        movaps  %xmm0, 160(%rsp)
        je      .L356
        leaq    132(%rsp), %rax
        cmpl    %ecx, 132(%rsp)
        je      .L356
        leaq    136(%rsp), %rax
        cmpl    %ecx, 136(%rsp)
        je      .L356
        leaq    140(%rsp), %rax
.L357:
        cmpl    %ecx, (%rax)
        je      .L356
        leaq    4(%rax), %rdx
        movq    %rdx, %rax
        cmpl    %ecx, (%rdx)
        je      .L356
        addq    $4, %rax
        cmpl    %ecx, (%rax)
        je      .L356
        leaq    8(%rdx), %rax
        cmpl    %ecx, 8(%rdx)
        je      .L356
        leaq    12(%rdx), %rax
        cmpq    %rsi, %rax
        jne     .L357
        movq    %r9, 112(%rsp)
        movl    %r10d, 40(%rsp)
        movl    %r8d, 36(%rsp)

.LC3:
        .long   46
        .long   7
        .long   28
        .long   1
        .align 16
.LC4:
        .long   3
        .long   4
        .long   5
        .long   2
        .align 16
.LC5:
        .long   9
        .long   11
        .long   10
        .long   49
        .section        .rodata.cst8,"aM",@progbits,8
        .align 8
.LC6:
        .long   47
        .long   48
        .align 8
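
For what it's worth, the 15 values GCC materializes on the stack (.LC3 through
.LC6, plus the immediate 26) can be folded into a bitmask by hand, and the
result appears to be exactly the immediate in the Clang output. The sketch
below assumes those constants are the numeric enum values of lb15b, which
seems likely but is not confirmed by the report; kValues, make_mask,
contains_loop and contains_mask are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Values in the order GCC stores them: .LC3, .LC4, .LC5, .LC6, then $26.
// Assumption: these are the numeric values of the lb15b enumerators.
constexpr unsigned kValues[] = {46, 7, 28, 1, 3, 4, 5, 2,
                                9, 11, 10, 49, 47, 48, 26};

// Fold the set into one 64-bit word: bit v is set for every value v.
constexpr std::uint64_t make_mask() {
    std::uint64_t m = 0;
    for (unsigned v : kValues)
        m |= std::uint64_t{1} << v;
    return m;
}

// The folded mask equals the movabsq immediate from the Clang listing.
static_assert(make_mask() == 0x3C00014000EBEull,
              "mask built from GCC's constants differs from Clang's immediate");

// Linear search, shaped like the std::any_of call in the report.
inline bool contains_loop(unsigned x) {
    return std::any_of(std::begin(kValues), std::end(kValues),
                       [x](unsigned v) { return v == x; });
}

// Bitmask version: one range check and one bit test.
inline bool contains_mask(unsigned x) {
    return x < 64 && ((make_mask() >> x) & 1u) != 0;
}
```

Both functions agree for every input, which is the equivalence the requested
optimization would rely on. It only works here because all values fit into a
64-bit word.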

The respective command lines are as follows:

g++ -DBACKTRACE_HEADER=\"execinfo.h\" -DCore_EXPORTS
-DELF_INTERPRETER=\"/lib64/ld-linux-x86-64.so.2\" -DQT_ASCII_CAST_WARNINGS
-DQT_BUILDING_QT -DQT_BUILD_CORE_LIB -DQT_DEPRECATED_WARNINGS
-DQT_EXPLICIT_QFILE_CONSTRUCTION_FROM_PATH -DQT_LEAN_HEADERS=1 -DQT_MOC_COMPAT
-DQT_NO_CAST_TO_ASCII -DQT_NO_CONTEXTLESS_CONNECT -DQT_NO_DEBUG -DQT_NO_FOREACH
-DQT_NO_JAVA_STYLE_ITERATORS -DQT_NO_NARROWING_CONVERSIONS_IN_CONNECT
-DQT_NO_QASCONST -DQT_NO_QEXCHANGE -DQT_NO_QPAIR -DQT_NO_QSNPRINTF
-DQT_NO_STD_FORMAT_SUPPORT -DQT_NO_USING_NAMESPACE
-DQT_RANDOMACCESSASYNCFILE_THREAD -DQT_STRICT_QLIST_ITERATORS
-DQT_TYPESAFE_FLAGS -DQT_USE_NODISCARD_FILE_OPEN -DQT_USE_QSTRINGBUILDER
-D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE
-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST
-D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/Core_autogen/include
-I/home/marc/Qt/qtbase-submit-build-clang/include
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore
-I/home/marc/Qt/qtbase-submit/src/corelib
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/global
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/kernel
-I/home/marc/Qt/qtbase-submit/src/corelib/../3rdparty/tinycbor/src
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore/6.11.0
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore/6.11.0/QtCore
-I/home/marc/Qt/qtbase-submit/src/corelib/../3rdparty/forkfd
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/.rcc
-I/home/marc/Qt/qtbase-submit-build-clang/mkspecs/linux-clang-libc++ -isystem
/usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -g
-DNDEBUG -O3 -std=gnu++26 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden
-Wall -Wextra -Werror -Wno-error=deprecated-declarations
-Wno-error=deprecated-enum-enum-conversion -Wno-error=unused-but-set-variable
-U_FORTIFY_SOURCE -fcf-protection=full -D_FORTIFY_SOURCE=2
-ftrivial-auto-var-init=pattern -fstack-protector-strong -fexceptions -MD -MT
src/corelib/CMakeFiles/Core.dir/text/qunicodetools.cpp.o -MF
src/corelib/CMakeFiles/Core.dir/text/qunicodetools.cpp.o.d -o
qunicodetools.cpp.s -c
/home/marc/Qt/qtbase-submit/src/corelib/text/qunicodetools.cpp

clang++ -DBACKTRACE_HEADER=\"execinfo.h\" -DCore_EXPORTS
-DELF_INTERPRETER=\"/lib64/ld-linux-x86-64.so.2\" -DQT_ASCII_CAST_WARNINGS
-DQT_BUILDING_QT -DQT_BUILD_CORE_LIB -DQT_DEPRECATED_WARNINGS
-DQT_EXPLICIT_QFILE_CONSTRUCTION_FROM_PATH -DQT_LEAN_HEADERS=1 -DQT_MOC_COMPAT
-DQT_NO_CAST_TO_ASCII -DQT_NO_CONTEXTLESS_CONNECT -DQT_NO_DEBUG -DQT_NO_FOREACH
-DQT_NO_JAVA_STYLE_ITERATORS -DQT_NO_NARROWING_CONVERSIONS_IN_CONNECT
-DQT_NO_QASCONST -DQT_NO_QEXCHANGE -DQT_NO_QPAIR -DQT_NO_QSNPRINTF
-DQT_NO_STD_FORMAT_SUPPORT -DQT_NO_USING_NAMESPACE
-DQT_RANDOMACCESSASYNCFILE_THREAD -DQT_STRICT_QLIST_ITERATORS
-DQT_TYPESAFE_FLAGS -DQT_USE_NODISCARD_FILE_OPEN -DQT_USE_QSTRINGBUILDER
-D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE
-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST
-D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/Core_autogen/include
-I/home/marc/Qt/qtbase-submit-build-clang/include
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore
-I/home/marc/Qt/qtbase-submit/src/corelib
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/global
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/kernel
-I/home/marc/Qt/qtbase-submit/src/corelib/../3rdparty/tinycbor/src
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore/6.11.0
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore/6.11.0/QtCore
-I/home/marc/Qt/qtbase-submit/src/corelib/../3rdparty/forkfd
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/.rcc
-I/home/marc/Qt/qtbase-submit-build-clang/mkspecs/linux-clang-libc++ -isystem
/usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -g
-DNDEBUG -O3 -std=gnu++26 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden
-Wall -Wextra -stdlib=libc++ -Werror "-Wno-error=#warnings"
-Wno-error=deprecated-declarations -Wno-error=deprecated-enum-enum-conversion
-Wno-error=deprecated-copy-with-user-provided-copy
-Wno-error=unused-but-set-variable -U_FORTIFY_SOURCE -fcf-protection=full
-D_FORTIFY_SOURCE=2 -ftrivial-auto-var-init=pattern -fstack-protector-strong
-fexceptions -MD -MT src/corelib/CMakeFiles/Core.dir/text/qunicodetools.cpp.o
-MF src/corelib/CMakeFiles/Core.dir/text/qunicodetools.cpp.o.d -o
qunicodetools.cpp.o -c
/home/marc/Qt/qtbase-submit/src/corelib/text/qunicodetools.cpp

I'm attaching preprocessed source for both. The foo() function is merely a
marker delimiting the actual code area (foox() delimits a hand-rolled bitmask
operation).

I don't think this is particularly important to optimize; I surely wouldn't
have expected the compiler to fold the loop like that, but seeing as Clang does
it, I decided to report it :)