https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

--- Comment #4 from Daniel Fruzynski <bugzi...@poradnik-webmastera.com> ---
One more case. The code has to process the diagonal half of a matrix and uses SSE
intrinsics; see test1() below. When n is a constant, as in test2() below, gcc
unrolls the loops. However, one more transformation could be performed: replacing
pairs of SSE instructions that touch adjacent memory with a single AVX instruction.

#include <stdint.h>
#include <immintrin.h>

/* Generic case: n is a run-time parameter. */
void test1(double data[100][100], unsigned int n)
{
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < i; j += 2)
        {
            __m128d v = _mm_loadu_pd(&data[i][j]);
            v = _mm_mul_pd(v, v);
            _mm_storeu_pd(&data[i][j], v);
        }
    }
}

/* Same loops with a constant n: gcc fully unrolls them. */
void test2(double data[100][100])
{
    const unsigned int n = 6;
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < i; j += 2)
        {
            __m128d v = _mm_loadu_pd(&data[i][j]);
            v = _mm_mul_pd(v, v);
            _mm_storeu_pd(&data[i][j], v);
        }
    }
}
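A hand-written sketch of the combined code for test2() may make the request concrete. With n == 6, rows 3, 4 and 5 each issue two SSE operations on the adjacent columns 0..1 and 2..3, so each such pair could become one 256-bit AVX load/multiply/store. Rows 1 and 2 touch only columns 0..1, and row 5 has a third, unpaired operation on columns 4..5; those stay SSE. The name test2_avx is hypothetical, and using __attribute__((target("avx"))) instead of building with -mavx is an assumption made so the example compiles without extra flags; it still needs an AVX-capable CPU at run time.

```c
#include <immintrin.h>

/* Hypothetical hand-merged version of test2() (n == 6), only to illustrate
   the requested SSE-pair -> AVX transformation. */
__attribute__((target("avx")))
void test2_avx(double data[100][100])
{
    /* Rows 1 and 2: a single SSE op on data[i][0..1], nothing to merge. */
    for (int i = 1; i <= 2; i++)
    {
        __m128d v = _mm_loadu_pd(&data[i][0]);
        _mm_storeu_pd(&data[i][0], _mm_mul_pd(v, v));
    }

    /* Rows 3, 4 and 5: the SSE ops on data[i][0..1] and data[i][2..3]
       are merged into one AVX op on data[i][0..3]. */
    for (int i = 3; i <= 5; i++)
    {
        __m256d v = _mm256_loadu_pd(&data[i][0]);
        _mm256_storeu_pd(&data[i][0], _mm256_mul_pd(v, v));
    }

    /* Row 5 also has j == 4, which has no partner, so it stays SSE. */
    __m128d v = _mm_loadu_pd(&data[5][4]);
    _mm_storeu_pd(&data[5][4], _mm_mul_pd(v, v));
}
```

Since every merged pair covers disjoint, contiguous elements of a single row, the merged version squares exactly the same elements as test2(), just with fewer instructions.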
