https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2021-07-29
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |rguenth at gcc dot gnu.org
            Summary|vectorizer doesn't          |BB vectorizer doesn't
                   |categorize vector construct |handle lowpart of existing
                   |cost right.                 |vector
             Blocks|                            |53947

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The basic-block vectorizer is currently limited in which "existing" vectors
it recognizes. In this testcase we access only the lowpart of 'src',
something we cannot yet model in vectorizable_slp_permutation.

The specific case isn't hard to fix. With it fixed, we'd get

  <bb 2> [local count: 1073741824]:
  _31 = VIEW_CONVERT_EXPR<vector(8) int>(src_18(D));
  vect__2.4_33 = [vec_unpack_lo_expr] _31;
  vect__2.4_34 = [vec_unpack_hi_expr] _31;
  MEM <vector(4) long long int> [(long long int *)&tem] = vect__2.4_33;
  MEM <vector(4) long long int> [(long long int *)&tem + 32B] = vect__2.4_34;
  _17 = MEM[(v8di *)&tem];
  *dst_28(D) = _17;
  tem ={v} {CLOBBER};
  return;

but we then fail to elide the temporary, producing

bar_s32_s64:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        vpmovsxdq       %xmm0, %ymm1
        vextracti128    $0x1, %ymm0, %xmm0
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-64, %rsp
        subq    $8, %rsp
        vpmovsxdq       %xmm0, %ymm0
        vmovdqa %ymm1, -56(%rsp)
        vmovdqa %ymm0, -24(%rsp)
        vmovdqa64       -56(%rsp), %zmm2
        vmovdqa64       %zmm2, (%rdi)
        leave
        .cfi_def_cfa 7, 8
        ret

It looks like either there's no V8SI->V8DI conversion optab, or we choose
V4DI as the preferred vector mode for some other reason.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
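The testcase itself is not quoted in this comment. As a hedged sketch, assuming GCC vector extensions and the hypothetical type names v16si and v8di (inferred from the (v8di *) cast and the lowpart VIEW_CONVERT_EXPR in the dump, not taken from the attachment), a function of the described shape is below: only lanes 0-7 of a 16 x int 'src' are sign-extended into a long long temporary, which is then stored to *dst as one vector. The helper model_vectorized is also hypothetical; it spells out in scalar C what the vectorized GIMPLE computes, so the two can be checked against each other.

```c
#include <string.h>

/* Hypothetical reconstruction using GCC vector extensions: a 16 x int
   source of which only the low 8 lanes (the "lowpart") are read. */
typedef int v16si __attribute__ ((vector_size (64)));
typedef long long v8di __attribute__ ((vector_size (64)));

/* Scalar form the BB vectorizer sees: eight sign-extending lane copies
   into a temporary, then one whole-vector copy to *dst. */
void
bar_s32_s64 (v8di *dst, v16si src)
{
  long long tem[8];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  tem[4] = src[4];
  tem[5] = src[5];
  tem[6] = src[6];
  tem[7] = src[7];
  memcpy (dst, tem, sizeof tem);   /* stands in for *dst = *(v8di *) tem */
}

/* Hypothetical scalar model of the vectorized GIMPLE on a little-endian
   target such as x86-64:
     VIEW_CONVERT_EXPR   -> keep lanes 0-7 of src,
     vec_unpack_lo_expr  -> widen lanes 0-3 to long long,
     vec_unpack_hi_expr  -> widen lanes 4-7 to long long,
   then store both halves into tem and reload tem as one vector. */
void
model_vectorized (v8di *dst, v16si src)
{
  int low8[8];
  long long lo[4], hi[4], tem[8];

  for (int i = 0; i < 8; i++)
    low8[i] = src[i];
  for (int i = 0; i < 4; i++)
    {
      lo[i] = low8[i];       /* int -> long long sign-extends */
      hi[i] = low8[i + 4];
    }
  memcpy (tem, lo, sizeof lo);        /* MEM <vector(4) ...> [&tem] */
  memcpy (tem + 4, hi, sizeof hi);    /* MEM <vector(4) ...> [&tem + 32B] */
  memcpy (dst, tem, sizeof tem);      /* _17 = MEM[(v8di *)&tem]; *dst = _17 */
}
```

Filling 'src' with a mix of positive and negative lanes and calling both functions should give identical *dst contents. Both versions keep the temporary on purpose, because the assembly shows it being stored to and reloaded from the stack, and that round-trip is what the comment says should be elided. With AVX-512F enabled (flags such as -O3 -mavx512f are an assumption; the reporter's options are not shown here), the ideal code would presumably be a single vpmovsxdq %ymm0, %zmm0 followed by the store, which is the V8SI->V8DI conversion the last remark refers to.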