https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79291
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It also looks like mips lacks an implementation of any of the vectorizer cost
hooks and thus falls back to default_builtin_vectorization_cost, which means
that unaligned loads/stores get double cost.  And mips does support misaligned
loads/stores via movmisalign (for MSA).

For daxpy:

  for (i = 0; i < n; i++)
    {
      dy[i] = dy[i] + da*dx[i];
    }

the above makes peeling for alignment of dy[] profitable (and I'd generally
agree, because especially misaligned stores do have a real penalty - though
likely not when the store queue is not contended, as is likely the case here).
x86_64 peels for alignment as well and we get

.L6:
        movups  (%rax,%r8), %xmm1
        addl    $1, %r9d
        mulps   %xmm2, %xmm1
        addps   (%r11,%r8), %xmm1
        movaps  %xmm1, (%r11,%r8)
        addq    $16, %r8
        cmpl    %ebx, %r9d
        jb      .L6

and similar base+index addressing.  IVOPTs does see that the indices are the
same, though.

  # i_46 = PHI <i_36(7), 0(4)>
  prolog_loop_adjusted_niters.6_48 = (sizetype) prolog_loop_niters.5_34;
  niters.7_49 = niters.3_40 - prolog_loop_niters.5_34;
  bnd.8_69 = niters.7_49 >> 2;
  _75 = prolog_loop_adjusted_niters.6_48 * 4;
  vectp_dy.12_74 = dy_15(D) + _75;
  _80 = prolog_loop_adjusted_niters.6_48 * 4;
  vectp_dx.15_79 = dx_16(D) + _80;
  vect_cst__84 = {da_14(D), da_14(D), da_14(D), da_14(D)};
  _88 = prolog_loop_adjusted_niters.6_48 * 4;
  vectp_dy.20_87 = dy_15(D) + _88;

shows the missed CSE from the vectorizer (and a redundant IV).  During DR
analysis we could in theory keep a list of stmts that share the "same" DR (we
have this for grouped reads already) and record the generated IVs on the
"master" DR.  A region-based CSE/DCE would still be my preference in the end.
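To illustrate the cost difference mentioned above, here is a simplified stand-in for the default cost function, not the real GCC hook: with no target-specific hook, an unaligned vector load/store is charged twice an aligned one, which is what tips the cost model toward peeling dy[] for alignment.  The enum names echo GCC's vect_cost_for_stmt kinds, but the values and the function itself are illustrative only.

```c
#include <assert.h>

/* Illustrative stand-in for default_builtin_vectorization_cost:
   unaligned vector loads/stores cost double.  Not the real hook. */
enum vect_cost_kind {
  vector_load,
  vector_store,
  unaligned_load,
  unaligned_store
};

static int default_vect_cost(enum vect_cost_kind kind)
{
  switch (kind)
    {
    case vector_load:
    case vector_store:
      return 1;
    case unaligned_load:
    case unaligned_store:
      return 2;   /* double cost, as described in the comment */
    }
  return 1;
}
```

A target that handles misaligned accesses cheaply (as MSA's movmisalign suggests) would want its own hook returning something closer to the aligned cost, so peeling stops looking profitable.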
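For readers unfamiliar with the transform, peeling for alignment of dy[] can be sketched in plain C (this is a conceptual sketch with made-up names, not what GCC emits): scalar "prolog" iterations run until the store target is 16-byte aligned, after which the main loop only ever stores to aligned addresses and could use movaps-style aligned stores.

```c
#include <stdint.h>

/* Conceptual sketch of peeling for alignment of dy[].  The prolog
   loop corresponds to prolog_loop_niters in the dump above. */
static void daxpy_peeled(int n, float da, const float *dx, float *dy)
{
  int i = 0;
  /* Prolog: peel scalar iterations until dy + i is 16-byte aligned. */
  while (i < n && ((uintptr_t)(dy + i) & 15) != 0)
    {
      dy[i] = dy[i] + da * dx[i];
      i++;
    }
  /* Main loop: dy + i is now 16-byte aligned, so a vectorizer can use
     aligned stores here (dx may still need misaligned loads). */
  for (; i < n; i++)
    dy[i] = dy[i] + da * dx[i];
}
```

Note that only the store side is aligned by peeling; when dx and dy have different misalignments, the dx loads stay potentially unaligned, which is why the relative cost of misaligned loads vs. stores matters for the model.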
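The missed CSE itself is mechanical: the three `prolog_loop_adjusted_niters.6_48 * 4` statements are textually identical and would collapse under any local value numbering.  A toy sketch (entirely illustrative; the suggestion above is to instead record the generated IVs on a "master" DR during DR analysis rather than run a separate pass):

```c
#include <string.h>

/* Toy local value numbering: identical expression strings map to one
   result id, so repeated `x * 4` computations reuse a single value. */
struct vn_entry { char key[64]; int result_id; };
static struct vn_entry vn_table[16];
static int vn_count, vn_next_id;

static int value_number(const char *expr)
{
  for (int i = 0; i < vn_count; i++)
    if (strcmp(vn_table[i].key, expr) == 0)
      return vn_table[i].result_id;   /* CSE hit: reuse earlier result */
  strcpy(vn_table[vn_count].key, expr);
  vn_table[vn_count].result_id = vn_next_id++;
  return vn_table[vn_count++].result_id;
}
```

The region-based CSE/DCE preference expressed at the end would subsume this: running value numbering over just the vectorizer-generated prolog/epilogue region cleans up both the duplicated offsets and the redundant IV without teaching DR analysis anything new.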