https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
--- Comment #8 from ktkachov at gcc dot gnu.org --- btw looks likes ICC vectorises this as well as unrolling: ..B1.14: movl (%rcx,%rbx,4), %r15d vmovsd (%rdi,%r15,8), %xmm2 movl 4(%rcx,%rbx,4), %r15d vmovhpd (%rdi,%r15,8), %xmm2, %xmm3 movl 8(%rcx,%rbx,4), %r15d vfmadd231pd (%r10,%rbx,8), %xmm3, %xmm0 vmovsd (%rdi,%r15,8), %xmm4 movl 12(%rcx,%rbx,4), %r15d vmovhpd (%rdi,%r15,8), %xmm4, %xmm5 movl 16(%rcx,%rbx,4), %r15d vfmadd231pd 16(%r10,%rbx,8), %xmm5, %xmm1 vmovsd (%rdi,%r15,8), %xmm6 movl 20(%rcx,%rbx,4), %r15d vmovhpd (%rdi,%r15,8), %xmm6, %xmm7 movl 24(%rcx,%rbx,4), %r15d vfmadd231pd 32(%r10,%rbx,8), %xmm7, %xmm0 vmovsd (%rdi,%r15,8), %xmm8 movl 28(%rcx,%rbx,4), %r15d vmovhpd (%rdi,%r15,8), %xmm8, %xmm9 vfmadd231pd 48(%r10,%rbx,8), %xmm9, %xmm1 addq $8, %rbx cmpq %r14, %rbx jb ..B1.14 Is that something GCC could reasonably do?