https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202
--- Comment #4 from Daniel Fruzynski <bugzi...@poradnik-webmastera.com> ---

One more case. The code has to process the diagonal half of a matrix and uses SSE intrinsics - see test1() below. When n is a constant, as in test2() below, gcc unrolls the loops. However, one more transformation could be performed: replacing pairs of SSE instructions with a single AVX instruction.

#include <stdint.h>
#include "immintrin.h"

void test1(double data[100][100], unsigned int n)
{
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < i; j += 2)
        {
            __m128d v = _mm_loadu_pd(&data[i][j]);
            v = _mm_mul_pd(v, v);
            _mm_storeu_pd(&data[i][j], v);
        }
    }
}

void test2(double data[100][100])
{
    const unsigned int n = 6;
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < i; j += 2)
        {
            __m128d v = _mm_loadu_pd(&data[i][j]);
            v = _mm_mul_pd(v, v);
            _mm_storeu_pd(&data[i][j], v);
        }
    }
}
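
For illustration, here is a hand-written sketch of what the requested transformation might look like for test2(). It is not gcc output. The names test2_sse() and test2_avx() are hypothetical, and the target("avx") attribute is used only so that the example builds without -mavx. Wherever two consecutive inner iterations (j and j + 2) both run, their 128-bit load/multiply/store sequences are merged into one 256-bit sequence; a leftover single iteration stays as SSE.

```c
#include <immintrin.h>

/* Reference: same loop as test2() in this comment, with unsigned counters. */
void test2_sse(double data[100][100])
{
    const unsigned int n = 6;
    for (unsigned int i = 0; i < n; i++) {
        for (unsigned int j = 0; j < i; j += 2) {
            __m128d v = _mm_loadu_pd(&data[i][j]);
            v = _mm_mul_pd(v, v);
            _mm_storeu_pd(&data[i][j], v);
        }
    }
}

/* Hypothetical hand-merged variant: one AVX op replaces two adjacent SSE ops. */
__attribute__((target("avx")))
void test2_avx(double data[100][100])
{
    const unsigned int n = 6;
    for (unsigned int i = 0; i < n; i++) {
        unsigned int j = 0;

        /* Iterations j and j + 2 both exist when j + 2 < i: merge them. */
        for (; j + 2 < i; j += 4) {
            __m256d v = _mm256_loadu_pd(&data[i][j]);
            v = _mm256_mul_pd(v, v);
            _mm256_storeu_pd(&data[i][j], v);
        }

        /* At most one unpaired SSE iteration remains per row. */
        for (; j < i; j += 2) {
            __m128d v = _mm_loadu_pd(&data[i][j]);
            v = _mm_mul_pd(v, v);
            _mm_storeu_pd(&data[i][j], v);
        }
    }
}
```

With n = 6, rows 3 and 4 each need one AVX operation instead of two SSE ones, and row 5 needs one AVX plus one SSE operation instead of three SSE ones. The multiplications are element-wise, so both variants should produce identical results.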