On Wed, Oct 30, 2013 at 10:47 AM, Jakub Jelinek <ja...@redhat.com> wrote:

> Yesterday I've noticed that for AVX which allows unaligned operands in
> AVX arithmetics instructions we still don't combine unaligned loads with the
> AVX arithmetics instructions. So say for -O2 -mavx -ftree-vectorize

This is actually PR 47754 that fell below the radar for some reason...

> we have:
>         vmovdqu (%rsi,%rax), %xmm0
>         vpmulld %xmm1, %xmm0, %xmm0
>         vmovups %xmm0, (%rdi,%rax)
> in the first loop. Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
> *mov<mode>_internal patterns (and various others) use misaligned_operand
> to see if they should emit vmovaps or vmovups (etc.), so as suggested by
> Richard on IRC it isn't necessary to either allow UNSPEC_LOADU in memory
> operands of all the various non-move AVX instructions for TARGET_AVX, or
> add extra patterns to help combine, this patch instead just uses the
> *mov<mode>_internal in that case (assuming initially misaligned_operand
> doesn't become !misaligned_operand through RTL optimizations). Additionally

No worries here. We will generate movdqa, and it is definitely a gcc
bug if RTL optimizations change a misaligned operand to an aligned one.

> the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned
> loads, which usually means combine will fail, by doing the load into a
> temporary pseudo in that case and then doing a pseudo to pseudo move with
> gen_lowpart on the rhs (which will be merged soon after into following
> instructions).

Is this similar to PR44141? There were similar problems with V4SFmode
subregs, so combine was not able to merge the load into the arithmetic insn.

> I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my
> bootstrap/regtest server isn't AVX capable.

I can bootstrap the patch later today on IvyBridge with
--with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx.

Uros.