https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
Bug ID: 110035
Summary: Missed optimization for dependent assignment statements
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ptk.prasertsuk at gmail dot com
Target Milestone: ---

Created attachment 55212
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55212&action=edit
Test case, compiled with -std=c++20 -O2

The test case, when compiled, produces additional move instructions:

        movdqu  (%rdi), %xmm2
        movdqu  16(%rdi), %xmm1
        movdqu  32(%rdi), %xmm0
        movl    $48, %edi
        movaps  %xmm2, 32(%rsp)
        movaps  %xmm1, 16(%rsp)
        movaps  %xmm0, (%rsp)
        call    _Znwm@PLT
        movdqa  32(%rsp), %xmm2
        movdqa  16(%rsp), %xmm1
        movdqa  (%rsp), %xmm0
        movq    %rax, %rdi
        movups  %xmm2, (%rax)
        movups  %xmm1, 16(%rax)
        movups  %xmm0, 32(%rax)

compared to the more optimized result from clang++ 14.0.0 with the same flags:

        callq   _Znwm@PLT
        movups  (%rbx), %xmm0
        movups  16(%rbx), %xmm1
        movups  32(%rbx), %xmm2
        movups  %xmm0, (%rax)
        movups  %xmm1, 16(%rax)
        movups  %xmm2, 32(%rax)
        movq    %rax, %rdi

Clang has MemCpyOptPass, which detects and removes the memory dependency of
the second set of move instructions. This in turn allows the Dead Store
Elimination pass to remove the first set of move instructions.

g++-12 -v
Using built-in specs.
COLLECT_GCC=g++-12
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 12.1.0-2ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.1.0 (Ubuntu 12.1.0-2ubuntu1~22.04)
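
For readers without access to the attachment, the following is a minimal sketch of the kind of dependent assignments this report describes. It is not the attached test case. The names Payload and clone are hypothetical, and the 48-byte object size is an assumption taken from the "movl $48, %edi" argument passed to operator new in the GCC listing. It may or may not reproduce the exact instruction sequences shown above.

```cpp
#include <cstdint>

// Hypothetical 48-byte payload (assumption): three 16-byte chunks, matching
// the three SSE load/store pairs in the listings above.
struct Payload {
    std::uint64_t words[6];
};
static_assert(sizeof(Payload) == 48, "expected a 48-byte object");

// Hypothetical function, not the attached test case. The copy into `tmp`
// is written before the call to operator new, and the store into the newly
// allocated object depends on `tmp`.
Payload *clone(const Payload &src) {
    Payload tmp = src;         // first assignment: read all of src
    Payload *p = new Payload;  // allocation: a call to operator new
    *p = tmp;                  // dependent assignment: write into *p
    return p;
}
```

If the second assignment can be rewritten to read directly from src, the temporary becomes dead and the values no longer need to be kept live (spilled and reloaded) across the call to operator new. That is the transformation the report attributes to Clang.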