https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89654
Bug ID: 89654
Summary: Invalid reload with -march=skylake -m32
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
The following testcase:
--cut here--
unsigned long long
foo (unsigned long long i)
{
  return i << 3;
}
--cut here--
compiles with -O2 -march=skylake -m32 to:
subl $28, %esp
movl 32(%esp), %eax
movl 36(%esp), %edx
movl %eax, (%esp)
movl %edx, 4(%esp)
vmovdqa (%esp), %xmm1 <--- here
addl $28, %esp
vpsllq $3, %xmm1, %xmm0
vmovd %xmm0, %eax
vpextrd $1, %xmm0, %edx
ret
Please note the 128-bit access (vmovdqa) to a 64-bit stack slot, in addition to
the unnecessary moves through %eax/%edx and the stack.
In _.ira, we have:
(insn 2 4 3 2 (set (reg/v:DI 83 [ i ])
(mem/c:DI (reg/f:SI 16 argp) [1 i+0 S8 A32])) "vshift.c":3:1 66
{*movdi_internal}
(nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 14 2 (set (subreg:V2DI (reg:DI 84) 0)
(ashift:V2DI (subreg:V2DI (reg/v:DI 83 [ i ]) 0)
(const_int 3 [0x3]))) "vshift.c":4:12 3353 {ashlv2di3}
(expr_list:REG_DEAD (reg/v:DI 83 [ i ])
(nil)))
...
and in _.reload:
(insn 2 4 19 2 (set (reg/v:DI 0 ax [orig:83 i ] [83])
(mem/c:DI (plus:SI (reg/f:SI 7 sp)
(const_int 32 [0x20])) [1 i+0 S8 A32])) "vshift.c":3:1 66
{*movdi_internal}
(nil))
(insn 19 2 3 2 (set (mem/c:DI (reg/f:SI 7 sp) [2 %sfp+-16 S8 A128])
(reg/v:DI 0 ax [orig:83 i ] [83])) "vshift.c":3:1 66 {*movdi_internal}
(nil))
(note 3 19 20 2 NOTE_INSN_FUNCTION_BEG)
(insn 20 3 6 2 (set (reg:V2DI 21 xmm1 [89])
(mem/c:V2DI (reg/f:SI 7 sp) [2 %sfp+-16 S16 A128])) "vshift.c":4:12
1211 {movv2di_internal}
(nil))
(insn 6 20 14 2 (set (reg:V2DI 20 xmm0 [84])
(ashift:V2DI (reg:V2DI 21 xmm1 [89])
(const_int 3 [0x3]))) "vshift.c":4:12 3353 {ashlv2di3}
(nil))
...
Please note (insn 19) and (insn 20): a DImode value is stored to a DImode (S8)
stack slot and then loaded back with a V2DImode (S16) instruction.
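For illustration, the size mismatch can be modelled in plain C. This is only a sketch under stated assumptions: v2di_model, load_low64, shift_left_lanes and foo_model are hypothetical names invented here, not GCC internals. The slot holds 8 bytes, so a correct reload may read only those 8 bytes (as a 64-bit vmovq load would) before the per-lane shift; a 16-byte read of the same slot, as in (insn 20), would touch 8 bytes that do not belong to it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a V2DI register: two independent 64-bit lanes. */
typedef struct { uint64_t lane[2]; } v2di_model;

/* Models a 64-bit load into a vector register: reads exactly 8 bytes
   from the slot and zeroes the upper lane.  A 16-byte load would read
   past the end of an 8-byte (DImode) slot.  */
static v2di_model load_low64(const unsigned char *slot)
{
    v2di_model v = { { 0, 0 } };
    memcpy(&v.lane[0], slot, sizeof v.lane[0]);
    return v;
}

/* Models a V2DI left shift: each 64-bit lane is shifted on its own. */
static v2di_model shift_left_lanes(v2di_model v, unsigned count)
{
    v.lane[0] <<= count;
    v.lane[1] <<= count;
    return v;
}

/* The testcase's foo, routed through the model: only lane 0 is live. */
static uint64_t foo_model(uint64_t i)
{
    unsigned char slot[8];            /* stands in for the S8 stack slot */
    memcpy(slot, &i, sizeof slot);
    return shift_left_lanes(load_low64(slot), 3).lane[0];
}
```

Since only lane 0 of the shift result is ever extracted, nothing in the testcase requires the upper 64 bits to be read from memory at all.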
Using -O2 -march=skylake-avx512 -m32, we get:
subl $28, %esp
movl 32(%esp), %eax
movl 36(%esp), %edx
movl %eax, (%esp)
movl %edx, 4(%esp)
vpsllq $3, (%esp), %xmm0
addl $28, %esp
vmovd %xmm0, %eax
vpextrd $1, %xmm0, %edx
ret
which is even more wrong, as the V2DI load is propagated into the shift insn
as a memory operand.
However, with -O2 -march=haswell -m32, everything works as expected:
vmovq 4(%esp), %xmm0
vpsllq $3, %xmm0, %xmm0
vmovd %xmm0, %eax
vpextrd $1, %xmm0, %edx
ret
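As a rough C sketch of what this expected sequence computes (reg_pair and split_result are hypothetical names used only for illustration): the 64-bit argument is shifted in the low lane, and the two 32-bit halves of the result are then moved into %eax and %edx, the register pair used for 64-bit return values with -m32.

```c
#include <stdint.h>

/* Hypothetical stand-in for the %edx:%eax return register pair. */
typedef struct { uint32_t eax; uint32_t edx; } reg_pair;

static reg_pair split_result(uint64_t i)
{
    uint64_t shifted = i << 3;          /* vpsllq $3 on the low 64-bit lane */
    reg_pair r;
    r.eax = (uint32_t)shifted;          /* vmovd: bits 0..31 of the lane    */
    r.edx = (uint32_t)(shifted >> 32);  /* vpextrd $1: bits 32..63          */
    return r;
}
```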