amker at gcc dot changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from amker at gcc dot ---
Hmm, it's not mentioned at which optimization level the original bug was
reported.  I suspect -O2, because a vect_perm instruction is needed after
vectorization.  So the current status is:
After the ivopt rewriting, we generate the following 8-instruction loop at -O2:
        movl    (%r14,%rax,4), %ecx
        movl    (%r14,%rdx,4), %esi
        movl    %esi, (%r14,%rax,4)
        movl    %ecx, (%r14,%rdx,4)
        addq    $1, %rax
        subq    $1, %rdx
        cmpl    %eax, %edx
        jg      .L14

It's better than what was reported.
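
For readers without the original test case at hand, here is a minimal sketch of
the kind of source loop that matches the access pattern in the -O2 listing: two
loads, two crossed stores, one index counting up and one counting down.  The
test case itself is not quoted in this comment, so the function name
reverse_ints and its signature are assumptions made for illustration only.

```c
/* Hypothetical reconstruction, not the test case from the bug report:
   reverse an int array in place by swapping elements from both ends. */
void reverse_ints(int *a, int n)
{
    int i = 0;          /* counts up,   like %rax in the -O2 loop */
    int j = n - 1;      /* counts down, like %rdx in the -O2 loop */

    while (i < j) {
        int tmp = a[i]; /* load a[i] */
        a[i] = a[j];    /* load a[j], store it to a[i] */
        a[j] = tmp;     /* store the old a[i] to a[j] */
        i++;
        j--;
    }
}
```

Each iteration of this loop does the same work as the eight instructions listed
above: two loads, two stores, two index updates, and one compare-and-branch.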

At -O3:
        movdqu  (%rsi,%rdx), %xmm2
        movdqa  (%r12,%rax), %xmm0
        pshufd  $27, %xmm2, %xmm1
        pshufd  $27, %xmm0, %xmm0
        movaps  %xmm1, (%r12,%rax)
        addq    $16, %rax
        movups  %xmm0, (%rsi,%rdx)
        subq    $16, %rdx
        cmpq    %rax, %rdi
        jne     .L14
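
The two pshufd $27 instructions are the permutes in this loop.  The immediate 27
is 0b00011011, which selects source lanes 3, 2, 1, 0 for destination lanes 0, 1,
2, 3, so it reverses the four 32-bit elements of a vector.  Below is a scalar
model of that shuffle as a rough sketch; pshufd_model is a hypothetical helper
written for illustration, not a compiler or intrinsic API.

```c
#include <stdint.h>

/* Scalar model of pshufd (hypothetical helper): each 2-bit field of imm8
   selects which source lane is copied into the corresponding destination
   lane.  A temporary is used so that dst and src may be the same array. */
void pshufd_model(uint32_t dst[4], const uint32_t src[4], uint8_t imm8)
{
    uint32_t tmp[4];

    for (int lane = 0; lane < 4; lane++)
        tmp[lane] = src[(imm8 >> (2 * lane)) & 3];
    for (int lane = 0; lane < 4; lane++)
        dst[lane] = tmp[lane];
}
```

With imm8 = 27 the model copies src[3], src[2], src[1], src[0] into dst[0] to
dst[3].  That is why each 16-byte block loaded from one end of the array can be
stored, reversed, at the other end.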

Consider this fixed.
