https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754
Allan Jensen <linux at carewolf dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linux at carewolf dot com

--- Comment #7 from Allan Jensen <linux at carewolf dot com> ---
This is significantly worse with integer operands.

    _mm256_storeu_si256((__m256i *)&data[3],
        _mm256_add_epi32(_mm256_loadu_si256((const __m256i *)&data[0]),
                         _mm256_loadu_si256((const __m256i *)&data[1]))
    );

compiles to:

    vmovdqu 0x20(%rax),%xmm0
    vinserti128 $0x1,0x30(%rax),%ymm0,%ymm0
    vmovdqu (%rax),%xmm1
    vinserti128 $0x1,0x10(%rax),%ymm1,%ymm1
    vpaddd %ymm1,%ymm0,%ymm0
    vmovups %xmm0,0x60(%rax)
    vextracti128 $0x1,%ymm0,0x70(%rax)
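
For anyone wanting to reproduce this, here is a minimal self-contained sketch of the statement above. The comment does not show how `data` is declared, so the `block32` element type and the `add_blocks` wrapper are assumptions made for illustration; the 32-byte element size is chosen only because it matches the 0x20 and 0x60 offsets in the listing. The `target("avx2")` attribute is likewise an assumption, so that the function builds with GCC even without `-mavx2` on the command line.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical 32-byte element type (not from the report): with it,
   data[1] sits at offset 0x20 and data[3] at offset 0x60. */
typedef struct { int32_t v[8]; } block32;

/* Hypothetical wrapper around the statement from the comment.
   Adds the eight 32-bit lanes of data[0] and data[1] and stores
   the result into data[3], using unaligned 256-bit accesses. */
__attribute__((target("avx2")))
void add_blocks(block32 *data)
{
    _mm256_storeu_si256((__m256i *)&data[3],
        _mm256_add_epi32(_mm256_loadu_si256((const __m256i *)&data[0]),
                         _mm256_loadu_si256((const __m256i *)&data[1])));
}
```

Compiling this with something like `gcc -O2 -S` and inspecting `add_blocks` should show whether the 256-bit loads and the store are split into 128-bit halves as in the listing. The exact output will depend on the GCC version and the tuning options in effect.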