While benchmarking the auto-vectoriser on Libav, I noticed a performance
regression in gcc 4.7 (both FSF and Linaro) compared to gcc 4.6 in the AAC
decoder.  I narrowed it down to this function:

static void ps_hybrid_analysis_ileave_c(float (*out)[32][2],
                                        float L[2][38][64],
                                        int i, int len)
{
    int j;

    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}

While gcc 4.6 does not attempt to vectorise this at all, 4.7 goes crazy
with a massive slowdown, about 20x slower than non-vectorised with Linaro
4.7 and much worse with FSF 4.7.

Let me know if you need more information.

-- 
Mans Rullgard / mru

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain

Reply via email to