https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122393
Bug ID: 122393
Summary: GCC -O3 fails to optimize loop over constexpr array of
enum implementing contains() to a bitmask AND
Product: gcc
Version: 15.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: marc.mutz at hotmail dot com
Target Milestone: ---
Created attachment 62620
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62620&action=edit
GCC preprocessed source
In Qt, we have code like this:
constexpr QUnicodeTables::LineBreakClass lb15b[] = {
QUnicodeTables::LineBreak_SP,
QUnicodeTables::LineBreak_GL,
QUnicodeTables::LineBreak_WJ,
QUnicodeTables::LineBreak_CL,
QUnicodeTables::LineBreak_QU,
QUnicodeTables::LineBreak_QU_Pi,
QUnicodeTables::LineBreak_QU_Pf,
QUnicodeTables::LineBreak_CP,
QUnicodeTables::LineBreak_EX,
QUnicodeTables::LineBreak_IS,
QUnicodeTables::LineBreak_SY,
QUnicodeTables::LineBreak_BK,
QUnicodeTables::LineBreak_CR,
QUnicodeTables::LineBreak_LF,
QUnicodeTables::LineBreak_ZW};
if (std::any_of(std::begin(lb15b), std::end(lb15b),
[nncls](auto x) { return x == nncls; })) {
ncls = QUnicodeTables::LineBreak_QU_Pf;
}
Clang 21 -O3 turns this into a 64-bit immediate load and a bit-test instruction:
cmpl $49, %ecx
ja .LBB1_278
movl $5, %ebp
movabsq $1055531498213054, %rax # imm = 0x3C00014000EBE
btq %rcx, %rax
jb .LBB1_281
(the actual mask value may not be accurate, as I'm comparing two different
loops), while GCC 15 duly optimizes the lookup by unrolling and using vector
instructions (I guess; I'm an assembly noob), but fails to reduce it to the
simple bit-vector lookup:
movl 36(%rsp), %ecx
movl 40(%rsp), %r8d
movl $26, 184(%rsp)
movdqa .LC3(%rip), %xmm0
movq .LC6(%rip), %rax
leaq 188(%rsp), %rsi
cmpl $46, %ecx
movl 112(%rsp), %r10d
movq 120(%rsp), %r9
movaps %xmm0, 128(%rsp)
movdqa .LC4(%rip), %xmm0
movq %rax, 176(%rsp)
leaq 128(%rsp), %rax
movaps %xmm0, 144(%rsp)
movdqa .LC5(%rip), %xmm0
movaps %xmm0, 160(%rsp)
je .L356
leaq 132(%rsp), %rax
cmpl %ecx, 132(%rsp)
je .L356
leaq 136(%rsp), %rax
cmpl %ecx, 136(%rsp)
je .L356
leaq 140(%rsp), %rax
.L357:
cmpl %ecx, (%rax)
je .L356
leaq 4(%rax), %rdx
movq %rdx, %rax
cmpl %ecx, (%rdx)
je .L356
addq $4, %rax
cmpl %ecx, (%rax)
je .L356
leaq 8(%rdx), %rax
cmpl %ecx, 8(%rdx)
je .L356
leaq 12(%rdx), %rax
cmpq %rsi, %rax
jne .L357
movq %r9, 112(%rsp)
movl %r10d, 40(%rsp)
movl %r8d, 36(%rsp)
.LC3:
.long 46
.long 7
.long 28
.long 1
.align 16
.LC4:
.long 3
.long 4
.long 5
.long 2
.align 16
.LC5:
.long 9
.long 11
.long 10
.long 49
.section .rodata.cst8,"aM",@progbits,8
.align 8
.LC6:
.long 47
.long 48
.align 8
The respective command lines are as follows:
g++ -DBACKTRACE_HEADER=\"execinfo.h\" -DCore_EXPORTS
-DELF_INTERPRETER=\"/lib64/ld-linux-x86-64.so.2\" -DQT_ASCII_CAST_WARNINGS
-DQT_BUILDING_QT -DQT_BUILD_CORE_LIB -DQT_DEPRECATED_WARNINGS
-DQT_EXPLICIT_QFILE_CONSTRUCTION_FROM_PATH -DQT_LEAN_HEADERS=1 -DQT_MOC_COMPAT
-DQT_NO_CAST_TO_ASCII -DQT_NO_CONTEXTLESS_CONNECT -DQT_NO_DEBUG -DQT_NO_FOREACH
-DQT_NO_JAVA_STYLE_ITERATORS -DQT_NO_NARROWING_CONVERSIONS_IN_CONNECT
-DQT_NO_QASCONST -DQT_NO_QEXCHANGE -DQT_NO_QPAIR -DQT_NO_QSNPRINTF
-DQT_NO_STD_FORMAT_SUPPORT -DQT_NO_USING_NAMESPACE
-DQT_RANDOMACCESSASYNCFILE_THREAD -DQT_STRICT_QLIST_ITERATORS
-DQT_TYPESAFE_FLAGS -DQT_USE_NODISCARD_FILE_OPEN -DQT_USE_QSTRINGBUILDER
-D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE
-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST
-D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/Core_autogen/include
-I/home/marc/Qt/qtbase-submit-build-clang/include
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore
-I/home/marc/Qt/qtbase-submit/src/corelib
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/global
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/kernel
-I/home/marc/Qt/qtbase-submit/src/corelib/../3rdparty/tinycbor/src
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore/6.11.0
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore/6.11.0/QtCore
-I/home/marc/Qt/qtbase-submit/src/corelib/../3rdparty/forkfd
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/.rcc
-I/home/marc/Qt/qtbase-submit-build-clang/mkspecs/linux-clang-libc++ -isystem
/usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -g
-DNDEBUG -O3 -std=gnu++26 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden
-Wall -Wextra -Werror -Wno-error=deprecated-declarations
-Wno-error=deprecated-enum-enum-conversion -Wno-error=unused-but-set-variable
-U_FORTIFY_SOURCE -fcf-protection=full -D_FORTIFY_SOURCE=2
-ftrivial-auto-var-init=pattern -fstack-protector-strong -fexceptions -MD -MT
src/corelib/CMakeFiles/Core.dir/text/qunicodetools.cpp.o -MF
src/corelib/CMakeFiles/Core.dir/text/qunicodetools.cpp.o.d -o
qunicodetools.cpp.s -c
/home/marc/Qt/qtbase-submit/src/corelib/text/qunicodetools.cpp
clang++ -DBACKTRACE_HEADER=\"execinfo.h\" -DCore_EXPORTS
-DELF_INTERPRETER=\"/lib64/ld-linux-x86-64.so.2\" -DQT_ASCII_CAST_WARNINGS
-DQT_BUILDING_QT -DQT_BUILD_CORE_LIB -DQT_DEPRECATED_WARNINGS
-DQT_EXPLICIT_QFILE_CONSTRUCTION_FROM_PATH -DQT_LEAN_HEADERS=1 -DQT_MOC_COMPAT
-DQT_NO_CAST_TO_ASCII -DQT_NO_CONTEXTLESS_CONNECT -DQT_NO_DEBUG -DQT_NO_FOREACH
-DQT_NO_JAVA_STYLE_ITERATORS -DQT_NO_NARROWING_CONVERSIONS_IN_CONNECT
-DQT_NO_QASCONST -DQT_NO_QEXCHANGE -DQT_NO_QPAIR -DQT_NO_QSNPRINTF
-DQT_NO_STD_FORMAT_SUPPORT -DQT_NO_USING_NAMESPACE
-DQT_RANDOMACCESSASYNCFILE_THREAD -DQT_STRICT_QLIST_ITERATORS
-DQT_TYPESAFE_FLAGS -DQT_USE_NODISCARD_FILE_OPEN -DQT_USE_QSTRINGBUILDER
-D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE
-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST
-D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/Core_autogen/include
-I/home/marc/Qt/qtbase-submit-build-clang/include
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore
-I/home/marc/Qt/qtbase-submit/src/corelib
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/global
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/kernel
-I/home/marc/Qt/qtbase-submit/src/corelib/../3rdparty/tinycbor/src
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore/6.11.0
-I/home/marc/Qt/qtbase-submit-build-clang/include/QtCore/6.11.0/QtCore
-I/home/marc/Qt/qtbase-submit/src/corelib/../3rdparty/forkfd
-I/home/marc/Qt/qtbase-submit-build-clang/src/corelib/.rcc
-I/home/marc/Qt/qtbase-submit-build-clang/mkspecs/linux-clang-libc++ -isystem
/usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -g
-DNDEBUG -O3 -std=gnu++26 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden
-Wall -Wextra -stdlib=libc++ -Werror "-Wno-error=#warnings"
-Wno-error=deprecated-declarations -Wno-error=deprecated-enum-enum-conversion
-Wno-error=deprecated-copy-with-user-provided-copy
-Wno-error=unused-but-set-variable -U_FORTIFY_SOURCE -fcf-protection=full
-D_FORTIFY_SOURCE=2 -ftrivial-auto-var-init=pattern -fstack-protector-strong
-fexceptions -MD -MT src/corelib/CMakeFiles/Core.dir/text/qunicodetools.cpp.o
-MF src/corelib/CMakeFiles/Core.dir/text/qunicodetools.cpp.o.d -o
qunicodetools.cpp.o -c
/home/marc/Qt/qtbase-submit/src/corelib/text/qunicodetools.cpp
I'm attaching preprocessed source for both. The foo() function is merely a
marker delimiting the actual code area (foox() delimits a hand-rolled bitmask
operation).
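For illustration, the transformation in question can be sketched roughly as
follows. This is not the code from the attachment: the names values,
make_mask, contains_loop and contains_mask are made up for this sketch, and the
integers are simply copied from the .LC3 to .LC6 constants and the immediate 26
in the GCC listing above, on the assumption that they are the enumerator values
of the array.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Assumed enumerator values, copied from the GCC listing (.LC3-.LC6 plus $26).
constexpr unsigned values[] = {46, 7, 28, 1, 3, 4, 5, 2,
                               9, 11, 10, 49, 47, 48, 26};

// Build a 64-bit mask with one bit set per array element.
constexpr std::uint64_t make_mask()
{
    std::uint64_t m = 0;
    for (unsigned v : values)
        m |= std::uint64_t{1} << v;
    return m;
}

constexpr std::uint64_t mask = make_mask();

// With these values the mask equals the immediate in the Clang listing.
static_assert(mask == 0x3C00014000EBEULL, "mask differs from Clang's immediate");

// The form used in the report: linear search over the array.
inline bool contains_loop(unsigned x)
{
    return std::any_of(std::begin(values), std::end(values),
                       [x](unsigned v) { return v == x; });
}

// The hand-rolled equivalent: range check, then a single bit test.
inline bool contains_mask(unsigned x)
{
    return x < 64 && ((mask >> x) & 1) != 0;
}
```

Both functions should return the same result for every input; the second one
is the shape Clang appears to derive on its own, assuming all values fit into
64 bits.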
I don't think this is particularly important to optimize; I surely wouldn't
have expected the compiler to fold the loop like that, but seeing as Clang does
it, I decided to report it :)