https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115116
            Bug ID: 115116
           Summary: [x86] rtx_cost is overestimated for big size memory.
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

typedef char v16qi __attribute__((vector_size(16)));

v16qi
__attribute__((noipa))
foo (v16qi a)
{
  v16qi c = __extension__(v16qi)
    { 0x1,0x2,0x3,0x4,0x5,0x6,0x7,0x8,
      0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1 };
  return a * c;
}

With -O2 -march=x86-64-v4, GCC generates

        .cfi_startproc
        vpmovzxbw       .LC0(%rip), %ymm1
        vpmovzxbw       %xmm0, %ymm0
        vpmullw         %ymm1, %ymm0, %ymm0
        vpmovwb         %ymm0, %xmm0
        vzeroupper

but it can be optimized to

        .cfi_startproc
        vpmovzxbw       %xmm0, %ymm0
        vpmullw         .LC0(%rip), %ymm0, %ymm0
        vpmovwb         %ymm0, %xmm0
        vzeroupper

The combination is matched but then rejected because of the cost comparison:

Successfully matched this instruction:
(set (reg:V16HI 104)
    (mem/u/c:V16HI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0 S32 A256]))
rejecting combination of insns 6 and 10
original costs 9 + 4 = 13
replacement cost 17

For bigger modes, rtx_cost uses factor = GET_MODE_SIZE / UNITS_PER_WORD and
returns cost = factor * COSTS_N_INSNS (1). That is too much for 256/512-bit
vectors, since they are probably loaded/stored with a single SSE register
access rather than word by word.
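
To make the arithmetic concrete, here is a minimal standalone sketch of the formula quoted above. It is not GCC source: modeled_mem_cost is a hypothetical helper, COSTS_N_INSNS is redefined locally as (N) * 4 to mirror GCC's rtl.h, and UNITS_PER_WORD is assumed to be 8, as on 64-bit x86.

```c
/* Standalone model of the generic scaling described in the report.
   Assumptions: COSTS_N_INSNS (N) is (N) * 4 as in GCC's rtl.h, and
   UNITS_PER_WORD is 8 (64-bit x86).  modeled_mem_cost is a
   hypothetical name used only for this illustration.  */

#define COSTS_N_INSNS(N) ((N) * 4)
#define UNITS_PER_WORD 8

/* Cost of an access of MODE_SIZE bytes when the cost is scaled by
   the number of words the mode occupies.  */
static int
modeled_mem_cost (int mode_size)
{
  /* Modes no wider than a word are charged as a single insn.  */
  int factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;
  return factor * COSTS_N_INSNS (1);
}
```

Under these assumptions a 32-byte V16HI memory operand is modeled at 4 * 4 = 16, and a 64-byte (512-bit) one at 8 * 4 = 32. If this is indeed the formula applied to the folded load, it would account for most of the replacement cost of 17 that loses against the original 9 + 4 = 13.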