avx512dq-concatv2si-1.c

xuepeng.guo at intel dot com Tue, 13 Nov 2018 18:51:37 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718


--- Comment #4 from Terry Guo <xuepeng.guo at intel dot com> ---
(In reply to Uroš Bizjak from comment #2)
> Following testcase:
> 
> --cut here--
> typedef int V __attribute__((vector_size (8)));
> 
> void foo (int x, int y)
> {
>   register int a __asm ("xmm1");
>   register int b __asm ("xmm2");
>   register V c __asm ("xmm3");
>   a = x;
>   b = y;
>   asm volatile ("" : "+v" (a), "+v" (b));
>   c = (V) { a, b };
>   asm volatile ("" : "+v" (c));
> }
> --cut here--
> 
> gets compiled with -O2 -mavx -mtune=intel:
> 
>         vmovd   %edi, %xmm1
>         vmovd   %esi, %xmm2
>         vmovd   %xmm2, %eax
>         vpinsrd $1, %eax, %xmm1, %xmm3
>         ret
> 
> The relevant pattern is defined as:
> 
> (define_insn "*vec_concatv2si_sse4_1"
>   [(set (match_operand:V2SI 0 "register_operand"
>         "=Yr,*x, x, v,Yr,*x, v, v, *y,*y")
>       (vec_concat:V2SI
>         (match_operand:SI 1 "nonimmediate_operand"
>         "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
>         (match_operand:SI 2 "nonimm_or_0_operand"
>         " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
>   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>   "@
>    pinsrd\t{$1, %2, %0|%0, %2, 1}
>    pinsrd\t{$1, %2, %0|%0, %2, 1}
>    vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
>    vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
>    punpckldq\t{%2, %0|%0, %2}
>    punpckldq\t{%2, %0|%0, %2}
>    vpunpckldq\t{%2, %1, %0|%0, %1, %2}
>    %vmovd\t{%1, %0|%0, %1}
>    punpckldq\t{%2, %0|%0, %2}
>    movd\t{%1, %0|%0, %1}"
> 
> but for some reason RA chooses alternative 2 (x<-x,rm) instead of
> alternative 6 (v<-Yv,Yv), although alternative 2 needs an extra reload from
> %xmm2 to %eax.

I dug into this a bit, and it looks like we miss something in the combine pass
and therefore fail to get a pattern that can match alternative 6. The combine
pass dump from an older GCC shows:
-------------------
      REG_UNUSED flags:CC
insn_cost 4 for    10: r82:SI=xmm16:SI
      REG_DEAD xmm16:SI
insn_cost 4 for    11: r83:SI=xmm17:SI
      REG_DEAD xmm17:SI
insn_cost 4 for    12: r87:V2SI=vec_concat(r82:SI,r83:SI)
      REG_DEAD r83:SI
      REG_DEAD r82:SI
-------------------

Combine then reports:
-------------------
Trying 10 -> 12:
   10: r82:SI=xmm16:SI
      REG_DEAD xmm16:SI
   12: r87:V2SI=vec_concat(r82:SI,r83:SI)
      REG_DEAD r83:SI
      REG_DEAD r82:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
    (vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ])
        (reg:SI 83 [ b.1_2 ])))
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 10.
modifying insn i3    12: r87:V2SI=vec_concat(xmm16:SI,r83:SI)
      REG_DEAD xmm16:SI
      REG_DEAD r83:SI
deferring rescan insn with uid = 12.

Trying 11 -> 12:
   11: r83:SI=xmm17:SI
      REG_DEAD xmm17:SI
   12: r87:V2SI=vec_concat(xmm16:SI,r83:SI)
      REG_DEAD xmm16:SI
      REG_DEAD r83:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
    (vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ])
        (reg/v:SI 53 xmm17 [ b ])))
allowing combination of insns 11 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    12: r87:V2SI=vec_concat(xmm16:SI,xmm17:SI)
      REG_DEAD xmm17:SI
      REG_DEAD xmm16:SI
deferring rescan insn with uid = 12.
-------------------

There are two successful combine attempts, and both SSE registers are
propagated into insn 12. We end up with a pattern that can match alternative 6.
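
As a reminder of what insn 12 computes, here is a minimal sketch of the
vec_concat:V2SI semantics written with the same vector type as the testcase.
The helper name concat_v2si is hypothetical and only for illustration; it says
nothing about which alternative the compiler picks.

```c
/* Same vector type as in the testcase: two 32-bit ints in 8 bytes.  */
typedef int V __attribute__((vector_size (8)));

/* Hypothetical helper mirroring (vec_concat:V2SI a b):
   a becomes element 0, b becomes element 1.  */
static V
concat_v2si (int a, int b)
{
  return (V) { a, b };
}
```

When both inputs already live in SSE registers, alternative 6 of the quoted
pattern can do this with a single vpunpckldq.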

However, the dump from current GCC trunk shows:
-------------------
insn_cost 4 for    19: r90:SI=xmm16:SI
      REG_DEAD xmm16:SI
insn_cost 4 for    10: r82:SI=r90:SI
      REG_DEAD r90:SI
insn_cost 4 for    20: r91:SI=xmm17:SI
      REG_DEAD xmm17:SI
insn_cost 4 for    11: r83:SI=r91:SI
      REG_DEAD r91:SI
insn_cost 4 for    12: r87:V2SI=vec_concat(r82:SI,r83:SI)
      REG_DEAD r83:SI
      REG_DEAD r82:SI
insn_cost 4 for    13: xmm3:V2SI=r87:V2SI
      REG_DEAD r87:V2SI
-------------------
Trying 11 -> 12:
   11: r83:SI=r91:SI
      REG_DEAD r91:SI
   12: r87:V2SI=vec_concat(r90:SI,r83:SI)
      REG_DEAD r90:SI
      REG_DEAD r83:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
    (vec_concat:V2SI (reg:SI 90)
        (reg:SI 91)))
allowing combination of insns 11 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    12: r87:V2SI=vec_concat(r90:SI,r91:SI)
      REG_DEAD r91:SI
      REG_DEAD r90:SI
deferring rescan insn with uid = 12.
-------------------

We end up with "12: r87:V2SI=vec_concat(r90:SI,r91:SI)". Later, in the LRA
pass, operand r90 is replaced with an XMM register, while r91 is kept in a
general register. The insn then has no chance to match the preferred
alternative 6.
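
To make the last point concrete, here is a toy model of the three AVX register
alternatives from the pattern quoted above (2, 3 and 6). It is only a sketch
under simplifying assumptions: the enum, the table and the function
first_alt_without_reload are all hypothetical, and it ignores costs, register
preferences and everything else LRA really does. It only answers "which
alternative accepts both inputs where they already are".

```c
#include <stdbool.h>
#include <stddef.h>

/* Where an SI input operand lives (toy model, not GCC's register classes).  */
enum loc { LOC_SSE, LOC_GPR, LOC_MEM };

/* One alternative, reduced to the locations each input accepts.
   The arrays are indexed by enum loc.  */
struct alt
{
  int index;
  const char *insn;
  bool op1[3];
  bool op2[3];
};

static const struct alt alts[] = {
  /* index, mnemonic, op1 {SSE, GPR, MEM}, op2 {SSE, GPR, MEM} */
  { 2, "vpinsrd",    { true, false, false }, { false, true, true } },   /* x <- x, rm   */
  { 3, "vpinsrd",    { true, false, false }, { false, true, true } },   /* v <- Yv, rm  */
  { 6, "vpunpckldq", { true, false, false }, { true, false, false } },  /* v <- Yv, Yv  */
};

/* Return the index of the first alternative that accepts both operands
   as they are, or -1 if every alternative would need a reload.  */
static int
first_alt_without_reload (enum loc op1, enum loc op2)
{
  for (size_t i = 0; i < sizeof alts / sizeof alts[0]; i++)
    if (alts[i].op1[op1] && alts[i].op2[op2])
      return alts[i].index;
  return -1;
}
```

In this model, first_alt_without_reload (LOC_SSE, LOC_SSE) yields 6, which
corresponds to the old combine result, while (LOC_SSE, LOC_GPR) yields 2, which
corresponds to the trunk situation where r91 stays in a general register.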

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c
