Toon Moene wrote:
This is where IPA could help. I created the following main program:
      real a(10), b(10), c(10)
      a = 0.
      b = 1.
      print '(3(1x,z16))', loc(a), loc(b), loc(c)
      call sum(a, b, c, 10)
      print *, c(5)
      end
So the alignment of a, b and c is known and is correct for vectorization
- still the loop in the subroutine looks like this (objdump -S a.out):
Inlining the "sum.f" subroutine by hand:
      integer i
      real a(10), b(10), c(10)
      a = 0.
      b = 1.
      print '(3(1x,z16))', loc(a), loc(b), loc(c)
      do i = 1, 10
         c(i) = a(i) + b(i)
      enddo
      print *, c(5)
      end
*does* lead to better code:
        movaps  1056(%rsp), %xmm0
        movq    %rbp, %rdi
        addps   1008(%rsp), %xmm0
        movq    $.LC2, 488(%rsp)
        movaps  %xmm0, 960(%rsp)
        movl    $9, 496(%rsp)
        movaps  1072(%rsp), %xmm0
        movl    $128, 480(%rsp)
        addps   1024(%rsp), %xmm0
        movl    $6, 484(%rsp)
        movaps  %xmm0, 976(%rsp)
        movss   1088(%rsp), %xmm0
        addss   1040(%rsp), %xmm0
        movss   %xmm0, 992(%rsp)
        movss   1092(%rsp), %xmm0
        addss   1044(%rsp), %xmm0
        movss   %xmm0, 996(%rsp)
i.e., completely unrolled and (SLP) vectorized code.
So the potential is there - all we need is an Alignment
Propagation Pass (analogous to the Constant and the Range Propagation passes).
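
The alignment claim made above for a, b and c can also be checked in the
program itself rather than by reading the hex addresses. What follows is
only a rough sketch: it assumes GNU Fortran's loc() extension (as used in
the programs above) and a 64-bit target, and the helper name is_aligned16
is made up for this illustration. The 16-byte boundary is the one movaps
requires for its memory operand.

```fortran
      program aligncheck
      implicit none
      real a(10), b(10), c(10)
      logical is_aligned16
      a = 0.
      b = 1.
      c = a + b
      ! loc() is a GNU extension; convert its result to a fixed kind
      print *, 'a: ', is_aligned16(int(loc(a), 8))
      print *, 'b: ', is_aligned16(int(loc(b), 8))
      print *, 'c: ', is_aligned16(int(loc(c), 8))
      print *, c(5)
      end program aligncheck

      ! Hypothetical helper: true if addr is a multiple of 16 bytes
      logical function is_aligned16(addr)
      implicit none
      integer(kind=8) addr
      is_aligned16 = mod(addr, 16_8) .eq. 0
      end function is_aligned16
```

Whether the three lines print T depends on how the compiler lays out the
arrays, so this only confirms the alignment for one particular build; it
says nothing about what the subroutine is allowed to assume.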
--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html