Toon Moene wrote:

This is where IPA could help.  I created the following main program:

      real a(10), b(10), c(10)
      a = 0.
      b = 1.
      print '(3(1x,z16))', loc(a), loc(b), loc(c)
      call sum(a, b, c, 10)
      print *, c(5)
      end
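
(The subroutine in "sum.f" is not repeated in this message.  Judging from the hand-inlined loop further down, it is presumably along the following lines; the dummy argument names and the adjustable-array declarations are an assumption, not a copy of the actual file.)

      subroutine sum(a, b, c, n)
!     Assumed shape of sum.f: element-wise add of two real arrays.
      integer i, n
      real a(n), b(n), c(n)
      do i = 1, n
         c(i) = a(i) + b(i)
      enddo
      end

Compiled on its own, such a subroutine tells the compiler nothing about how the actual arguments are aligned, which is the information the caller has and the callee lacks.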

So the alignment of a, b and c is known and is correct for vectorization.  Still, the loop in the subroutine looks like this (objdump -S a.out):

Inlining the "sum.f" subroutine by hand:

      integer i
      real a(10), b(10), c(10)
      a = 0.
      b = 1.
      print '(3(1x,z16))', loc(a), loc(b), loc(c)
      do i = 1, 10
         c(i) = a(i) + b(i)
      enddo
      print *, c(5)
      end

*does* lead to better code:

        movaps  1056(%rsp), %xmm0
        movq    %rbp, %rdi
        addps   1008(%rsp), %xmm0
        movq    $.LC2, 488(%rsp)
        movaps  %xmm0, 960(%rsp)
        movl    $9, 496(%rsp)
        movaps  1072(%rsp), %xmm0
        movl    $128, 480(%rsp)
        addps   1024(%rsp), %xmm0
        movl    $6, 484(%rsp)
        movaps  %xmm0, 976(%rsp)
        movss   1088(%rsp), %xmm0
        addss   1040(%rsp), %xmm0
        movss   %xmm0, 992(%rsp)
        movss   1092(%rsp), %xmm0
        addss   1044(%rsp), %xmm0
        movss   %xmm0, 996(%rsp)

i.e., completely unrolled and (SLP) vectorized code.

So the potential is there.  All we need is an Alignment Propagation Pass (analogous to the Constant and Range Propagation passes).
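
(As a rough illustration of the fact such a pass would have to establish at the call site, here is the same property tested at run time.  This is only a sketch: the 16-byte boundary is what the movaps instructions in the listing above require, loc() is the same extension used in the main program, and integer(kind=8) assumes a 64-bit target.)

      real a(10), b(10), c(10)
!     Assumption: 64-bit target, so loc() fits integer(kind=8).
      integer(kind=8) pa, pb, pc
      pa = loc(a)
      pb = loc(b)
      pc = loc(c)
!     Each value is .true. when that array starts on a 16-byte boundary.
      print *, mod(pa, 16_8) .eq. 0, mod(pb, 16_8) .eq. 0,
     &         mod(pc, 16_8) .eq. 0
      end

A propagation pass would carry the compile-time equivalent of these three logicals from the caller into the callee, much as constant propagation carries the value 10 for the array length.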

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
