On Wed, Oct 30, 2013 at 10:47:13AM +0100, Jakub Jelinek wrote:
> Hi!
>
> Yesterday I've noticed that for AVX, which allows unaligned operands in
> AVX arithmetic instructions, we still don't combine unaligned loads with the
> AVX arithmetic instructions.  So say for -O2 -mavx -ftree-vectorize
> void
> f1 (int *__restrict e, int *__restrict f)
> {
>   int i;
>   for (i = 0; i < 1024; i++)
>     e[i] = f[i] * 7;
> }
>
> void
> f2 (int *__restrict e, int *__restrict f)
> {
>   int i;
>   for (i = 0; i < 1024; i++)
>     e[i] = f[i];
> }
> we have:
>   vmovdqu (%rsi,%rax), %xmm0
>   vpmulld %xmm1, %xmm0, %xmm0
>   vmovups %xmm0, (%rdi,%rax)
> in the first loop.  Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
> *mov<mode>_internal patterns (and various others) use misaligned_operand
> to see if they should emit vmovaps or vmovups (etc.), so as suggested by
That is intentional.  On pre-Haswell architectures, splitting an unaligned
32-byte load into two 16-byte halves is faster than a single 32-byte load.
See the Intel® 64 and IA-32 Architectures Optimization Reference Manual for
details.