https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194
--- Comment #14 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
I see, there are more weaknesses than I thought.

For CSE (or rather fwprop?) I was thinking about a simpler case where the
extracted-from value is loaded from memory, but even in trivial cases RTL
optimizers cannot clean it up today (so it wouldn't get any better with
separate temporaries):

#define N 16
typedef int T;
typedef T V __attribute__((vector_size(N)));
T f(V *px, long i)
{
    V x = *px;
    return x[i];
}

f:
        movdqa  (%rdi), %xmm0
        movaps  %xmm0, -24(%rsp)
        movl    -24(%rsp,%rsi,4), %eax
        ret
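For anyone who wants to reproduce this, here is a minimal sketch that pairs f with a scalar reference. The names f_ref and lanes_agree are hypothetical and not part of the original testcase. f_ref spells out what I read as the hoped-for cleanup: since x was just loaded from *px, the element could be read straight from that memory rather than through a stack copy. That reading is an assumption, as is the restriction to in-range indices (0 <= i < N / sizeof(T)). No claim is made about what code GCC emits for f_ref.

```c
#include <string.h>

#define N 16
typedef int T;
typedef T V __attribute__((vector_size(N)));

/* Testcase from the comment: variable-index extract from a loaded vector. */
T f(V *px, long i)
{
    V x = *px;
    return x[i];
}

/* Hypothetical reference: read the element directly from the memory that
   px points to. Matches f only for 0 <= i < N / sizeof(T). */
T f_ref(const V *px, long i)
{
    T elems[N / sizeof(T)];
    memcpy(elems, px, sizeof elems);   /* avoids aliasing concerns */
    return elems[i];
}

/* Hypothetical helper: returns 1 if f and f_ref agree on every lane. */
int lanes_agree(V *px)
{
    for (long i = 0; i < (long)(N / sizeof(T)); i++)
        if (f(px, i) != f_ref(px, i))
            return 0;
    return 1;
}
```

Compiling both functions with -O2 -S and comparing the assembly for f and f_ref should show whether the spill to the stack is still there with a given compiler version.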