https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91722

            Bug ID: 91722
           Summary: gcc generates sub-optimal assembly when AVX
                    instructions are used.
           Product: gcc
           Version: 9.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: maxim.yegorushkin at gmail dot com
  Target Milestone: ---

The following code:

    #include <immintrin.h>

    __m256 copysign_ps(__m256 from, __m256 to) {
        constexpr float signbit = -0.f;
        auto const avx_signbit = _mm256_broadcast_ss(&signbit);
        return _mm256_or_ps(_mm256_and_ps(avx_signbit, from),
                            _mm256_andnot_ps(avx_signbit, to));
    }
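For reference, each 32-bit lane of `copysign_ps(from, to)` takes the sign bit from `from` and every other bit from `to`. The portable scalar sketch below models one lane of that AND/ANDNOT/OR combination, assuming IEEE 754 binary32 floats (true on x86). The helpers `float_bits`, `bits_float`, `copysign_lane` and `check_against_std` are hypothetical names used only for illustration; the sketch does not use AVX and says nothing about the code generation issue itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as a 32-bit unsigned integer and back.
// std::memcpy is the portable way to do this type pun in C++11.
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float bits_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Scalar model of one lane of copysign_ps(from, to):
//   (signbit & from) | (~signbit & to)
// The sign bit comes from `from`, all other bits come from `to`.
inline float copysign_lane(float from, float to) {
    const std::uint32_t signbit = float_bits(-0.f);  // 0x80000000 == 2147483648
    return bits_float((signbit & float_bits(from)) | (~signbit & float_bits(to)));
}

// std::copysign(magnitude, sign) takes its arguments in the opposite
// order from copysign_ps(from, to); compare bit patterns so that the
// sign of zero is checked too.
inline void check_against_std(float from, float to) {
    assert(float_bits(copysign_lane(from, to)) ==
           float_bits(std::copysign(to, from)));
}
```

The `.long 2147483648` constant in both compilers' output is this same `-0.f` bit pattern, 0x80000000.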

When `copysign_ps` is compiled with `g++-9.2 -O2 -mavx -std=c++11`, GCC
produces the following assembly:

    copysign_ps(float __vector(8), float __vector(8)):
            push    rbp
            vmovaps ymm2, ymm0
            mov     rbp, rsp
            and     rsp, -32
            vbroadcastss    ymm0, DWORD PTR .LC0[rip]
            vandnps ymm1, ymm0, ymm1
            vandps  ymm0, ymm0, ymm2
            vorps   ymm0, ymm0, ymm1
            leave
            ret
    .LC0:
            .long   2147483648

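The four instructions involving rbp and rsp (`push rbp`, `mov rbp, rsp`,
`and rsp, -32` and `leave`) set up a frame with the stack realigned to 32 bytes,
but the function body never accesses the stack, so they do not seem to be
necessary at all.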

When the same function is compiled with `clang++-8.0 -O2 -mavx -std=c++11`,
clang produces only the expected instructions:

    .LCPI0_0:
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
    .LCPI0_1:
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
    copysign_ps(float __vector(8), float __vector(8)):                 # @copysign_ps(float __vector(8), float __vector(8))
            vandps  ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
            vandps  ymm1, ymm1, ymmword ptr [rip + .LCPI0_1]
            vorps   ymm0, ymm1, ymm0
            ret
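Clang's listing replaces GCC's `vbroadcastss` plus `vandnps` with a second `vandps` against a full-width 0x7fffffff constant. The scalar sketch below, using hypothetical helper names, models one 32-bit lane of each form to show why that rewrite is valid: `vandnps dst, a, b` computes `~a & b`, and 0x7fffffff is exactly the complement of the 0x80000000 sign mask, so both sequences compute the same bits. It only illustrates the arithmetic; it does not explain why GCC emits the extra stack-realignment instructions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 32-bit lane; not actual compiler output.

// GCC's form: a single broadcast mask m, combined with AND and ANDNOT.
// vandnps computes (~m & to), so no inverted constant is needed.
inline std::uint32_t blend_and_andnot(std::uint32_t from, std::uint32_t to) {
    const std::uint32_t m = 0x80000000u;  // .LC0, broadcast by vbroadcastss
    return (m & from) | (~m & to);
}

// Clang's form: two full-width constants, combined with AND only.
inline std::uint32_t blend_two_masks(std::uint32_t from, std::uint32_t to) {
    const std::uint32_t sign_mask = 0x80000000u;       // .LCPI0_0
    const std::uint32_t magnitude_mask = 0x7fffffffu;  // .LCPI0_1
    return (from & sign_mask) | (to & magnitude_mask);
}

// The second constant is the bitwise complement of the first...
static_assert(static_cast<std::uint32_t>(~std::uint32_t{0x80000000u}) == 0x7fffffffu,
              "magnitude mask is the complement of the sign mask");

// ...so the two forms agree for every pair of inputs.
inline void check_forms_agree(std::uint32_t from, std::uint32_t to) {
    assert(blend_and_andnot(from, to) == blend_two_masks(from, to));
}
```

The trade-off visible in the listings is one 4-byte constant plus a broadcast (GCC) versus two 32-byte constant-pool entries used directly as memory operands (clang).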
