The following testcase from PR target/33329 shows a problem where gcc doesn't
fold vector arithmetic operations with constant arguments into a load of a
vector constant.
For clarity, SSE4 is used here, but the same problem is present with SSE2.
--cut here--
extern void g (int *);
void f (void)
{
  int tabs[8], tabcount;
  for (tabcount = 1; tabcount <= 8; tabcount += 7)
    {
      int i;
      for (i = 0; i < 8; i++)
        tabs[i] = 2 * i;
      g (tabs);
    }
}
--cut here--
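For reference, the inner loop stores tabs[i] = 2 * i, so the two 128-bit halves
written before each call to g are fixed values, {0, 2, 4, 6} and
{8, 10, 12, 14}, and the outer loop body runs exactly twice (tabcount is 1, then
8, then 15 ends the loop), which matches the "cmpl $2, %ebx" exit test below.
The standalone C sketch that follows just re-runs the testcase's loops in scalar
form to confirm those values; the helper names fill_tabs, outer_iterations and
check_testcase_constants are hypothetical and not part of the testcase.

```c
#include <assert.h>

/* Hypothetical helper: same computation as the testcase's inner loop. */
void fill_tabs (int tabs[8])
{
  int i;
  for (i = 0; i < 8; i++)
    tabs[i] = 2 * i;
}

/* Hypothetical helper: counts how many times the testcase's outer loop
   body runs.  tabcount takes the values 1 and 8; 15 ends the loop.  */
int outer_iterations (void)
{
  int tabcount, n = 0;
  for (tabcount = 1; tabcount <= 8; tabcount += 7)
    n++;
  return n;
}

/* Hypothetical check: both 4 x int halves stored before each call to g
   are compile-time constants, {0, 2, 4, 6} and {8, 10, 12, 14}.  */
void check_testcase_constants (void)
{
  static const int expected[8] = { 0, 2, 4, 6, 8, 10, 12, 14 };
  int tabs[8], i;

  fill_tabs (tabs);
  for (i = 0; i < 8; i++)
    assert (tabs[i] == expected[i]);
  assert (outer_iterations () == 2);
}
```

Since nothing in either half depends on the outer loop, both halves could be
loaded straight from the constant pool on every iteration.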
produces (gcc -O2 -msse4 -ftree-vectorize):
.LCFI2:
movdqa .LC0(%rip), %xmm1
leaq 16(%rsp), %rbp
movdqa .LC1(%rip), %xmm0
paddd .LC2(%rip), %xmm1
pmulld %xmm1, %xmm0 # 19 *sse4_1_mulv4si3 [length = 4]
movdqa %xmm0, (%rsp)
.L2:
movdqa .LC3(%rip), %xmm0 # 54
movq %rbp, %rdi
addl $1, %ebx
movdqa (%rsp), %xmm2 # 55
movdqa %xmm0, (%rbp)
movdqa %xmm2, 16(%rbp)
call g
cmpl $2, %ebx
jne .L2
All instructions above the loop have constant arguments. This is evident from
the combine RTL dump, where insn 19 is represented by the following RTX:
(insn 19 17 25 2 pr33329.c:13 (set (reg:V4SI 78)
        (mult:V4SI (reg:V4SI 77)
            (reg:V4SI 73))) 1136 {*sse4_1_mulv4si3} (expr_list:REG_DEAD (reg:V4SI 73)
        (expr_list:REG_EQUAL (const_vector:V4SI [
                    (const_int 8 [0x8])
                    (const_int 10 [0xa])
                    (const_int 12 [0xc])
                    (const_int 14 [0xe])
                ])
            (nil))))
GCC has actually already calculated the correct const_vector value, but it
apparently doesn't know what to do with it. For optimal code, insn 55 should
load the vector constant from the constant pool, the same way insn 54 does,
instead of reloading the value that was computed and stored to (%rsp) before
the loop.
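To make the missing fold concrete, the plain C sketch below evaluates the
pre-loop sequence (movdqa .LC0, paddd .LC2, pmulld with .LC1) element by
element, the way a V4SI plus and mult of two const_vectors could be evaluated
at compile time. It is only an illustration, not GCC's simplify-rtx code. The
v4si type and the v4si_add, v4si_mul, fold_preloop and check_fold helpers are
hypothetical. The pool contents .LC0 = {0, 1, 2, 3}, .LC2 = {4, 4, 4, 4} and
.LC1 = {2, 2, 2, 2} are assumptions, since the dump does not show them. They
were chosen only so that the result matches the REG_EQUAL note on insn 19.

```c
#include <assert.h>

/* Hypothetical stand-in for a V4SImode const_vector.  */
typedef struct { int e[4]; } v4si;

/* Element-wise plus:V4SI of two constant vectors (what paddd computes).
   The values used here are small, so overflow is not a concern.  */
v4si v4si_add (v4si a, v4si b)
{
  v4si r;
  int i;
  for (i = 0; i < 4; i++)
    r.e[i] = a.e[i] + b.e[i];
  return r;
}

/* Element-wise mult:V4SI of two constant vectors (what pmulld computes).  */
v4si v4si_mul (v4si a, v4si b)
{
  v4si r;
  int i;
  for (i = 0; i < 4; i++)
    r.e[i] = a.e[i] * b.e[i];
  return r;
}

/* Fold xmm0 = .LC1 * (.LC0 + .LC2).
   ASSUMED pool contents (not shown in the report's dump).  */
v4si fold_preloop (void)
{
  const v4si lc0 = { { 0, 1, 2, 3 } };
  const v4si lc2 = { { 4, 4, 4, 4 } };
  const v4si lc1 = { { 2, 2, 2, 2 } };
  return v4si_mul (v4si_add (lc0, lc2), lc1);
}

/* The folded vector should equal the REG_EQUAL const_vector of insn 19.  */
void check_fold (void)
{
  static const int reg_equal[4] = { 8, 10, 12, 14 };
  v4si r = fold_preloop ();
  int i;
  for (i = 0; i < 4; i++)
    assert (r.e[i] == reg_equal[i]);
}
```

Once the sequence folds to a single const_vector, that value can live in the
constant pool like .LC3. The store to (%rsp) before the loop and the reload in
insn 55 then become a plain constant-pool load.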
--
Summary: Vector RTL arithmetic operations with constant arguments are not fully folded.
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ubizjak at gmail dot com
GCC build triplet: x86_64-pc-linux-gnu
GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33353