https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

            Bug ID: 110035
           Summary: Missed optimization for dependent assignment
                    statements
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ptk.prasertsuk at gmail dot com
  Target Milestone: ---

Created attachment 55212
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55212&action=edit
Test case, compiled with -std=c++20 -O2

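When compiled with g++ 12.1.0, the test case produces additional move
instructions: three 16-byte values are spilled to the stack before the call to
operator new (_Znwm) and reloaded afterwards: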

movdqu  (%rdi), %xmm2
movdqu  16(%rdi), %xmm1
movdqu  32(%rdi), %xmm0
movl    $48, %edi
movaps  %xmm2, 32(%rsp)
movaps  %xmm1, 16(%rsp)
movaps  %xmm0, (%rsp)
call    _Znwm@PLT
movdqa  32(%rsp), %xmm2
movdqa  16(%rsp), %xmm1
movdqa  (%rsp), %xmm0
movq    %rax, %rdi
movups  %xmm2, (%rax)
movups  %xmm1, 16(%rax)
movups  %xmm0, 32(%rax)

Compare this with the more optimized result from clang++ 14.0.0 with the same
flags:

callq   _Znwm@PLT
movups  (%rbx), %xmm0
movups  16(%rbx), %xmm1
movups  32(%rbx), %xmm2
movups  %xmm0, (%rax)
movups  %xmm1, 16(%rax)
movups  %xmm2, 32(%rax)
movq    %rax, %rdi

Clang's MemCpyOptPass detects and removes the memory dependency of the second
set of move instructions on the first, which in turn allows the Dead Store
Elimination pass to remove the first set of move instructions.
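
The attachment itself is not reproduced in this message. As a rough
illustration only, the kind of dependent assignment described above might look
like the sketch below. The type Payload and both function names are
hypothetical and are not taken from attachment 55212; the 48-byte size is
chosen only to match the "movl $48, %edi" argument to operator new. This
sketch is not claimed to reproduce the exact assembly listed above.

```cpp
#include <cstdint>

// Hypothetical 48-byte aggregate (assumption: not the type from the attachment).
struct Payload {
    std::uint64_t words[6];
};

// Two dependent copies: src -> tmp (stack), then tmp -> heap object.
// The second copy reads tmp, so the first copy cannot be removed unless
// the compiler first rewrites the second copy to read from src directly.
Payload *clone_via_temporary(const Payload &src) {
    Payload tmp = src;        // first copy
    return new Payload(tmp);  // second copy, depends on tmp
}

// The form an optimizer would ideally reduce the function above to:
// a single copy from src into the newly allocated object.
Payload *clone_direct(const Payload &src) {
    return new Payload(src);
}
```

Both functions return equal contents; the difference of interest is only
whether the intermediate stack copy survives optimization.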

g++-12 -v
Using built-in specs.
COLLECT_GCC=g++-12
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
12.1.0-2ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.1.0 (Ubuntu 12.1.0-2ubuntu1~22.04)
