https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #22 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Here are the details of how I tested this.

I generated in_pack_r4.i and in_unpack_r4.i by adding -save-temps to the
Makefile options in ~/trunk-bin/powerpc64le-unknown-linux-gnu/libgfortran,
then removing in_pack_r4.* and in_unpack_r4.* there and running make.

In the benchmark directory, I then used

bench.f90:

program main
  real, dimension(:,:), allocatable :: a
  allocate (a(50000,4))
  call random_number (a)
  do i=1,5000000
     call foo(a(i::2,:))
     call foo(a)
  end do
end program main

foo.f90:

subroutine foo(a)
  real, dimension(*) :: a
end subroutine foo

(constants can be adjusted).  The first call to foo needs a repacking;
the second one is only there to keep the optimizer from removing the loop.
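
For readers unfamiliar with repacking: passing a strided section to an
assumed-size dummy means the section is copied into a contiguous temporary
before the call and copied back afterwards. The following is only a rough
sketch of that idea in C, assuming a column-major rows x cols array and a
stride along the first dimension. The names pack_section and unpack_section
are hypothetical and are not the libgfortran routines or their signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the libgfortran API): copy every `stride`-th
   row of a column-major rows x cols array into the contiguous buffer
   `dest`.  Returns the number of elements copied.  */
size_t pack_section(const float *src, size_t rows, size_t cols,
                    size_t stride, float *dest)
{
    assert(stride > 0);
    size_t n = 0;
    for (size_t j = 0; j < cols; j++)           /* columns are outermost */
        for (size_t i = 0; i < rows; i += stride)
            dest[n++] = src[j * rows + i];      /* column-major indexing */
    return n;
}

/* Hypothetical inverse: write the contiguous buffer `src` back into the
   same strided positions of `dst`.  Elements outside the section are
   left untouched.  */
void unpack_section(float *dst, size_t rows, size_t cols,
                    size_t stride, const float *src)
{
    assert(stride > 0);
    size_t n = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i += stride)
            dst[j * rows + i] = src[n++];
}
```

In the benchmark above, a section like a(1::2,:) would correspond to
stride 2 with rows = 50000 and cols = 4; the cost of these two copies per
call is what the timing measures.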

With the command line

gfortran -g -fno-inline-arg-packing -O2 bench.f90 foo.f90 in_pack_r4.i \
    in_unpack_r4.i -static-libgfortran && time ./a.out

a test can be run. -fno-inline-arg-packing is important because
otherwise the internal packing routines will not be called. Putting
in_pack_r4.i and in_unpack_r4.i on the command line makes the link use
those versions instead of the ones from the (static) library.

in_pack_r4.i and in_unpack_r4.i can then be adjusted, for
example by adding a #pragma GCC unroll 1.
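
As a reminder of where such a pragma goes: GCC expects #pragma GCC unroll
directly in front of the loop it applies to, and a value of 1 asks it not
to unroll that loop. The snippet below is a minimal sketch with a made-up
function, copy_strided, standing in for a packing loop. It is not taken
from in_pack_r4.i, where the loop to annotate has to be located by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packing loop: gather n elements from
   `src`, taking every `stride`-th one, into the contiguous `dest`.  */
void copy_strided(float *dest, const float *src, size_t n, size_t stride)
{
    assert(stride > 0);
    /* The pragma must immediately precede the loop; 1 blocks unrolling.  */
#pragma GCC unroll 1
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i * stride];
}
```

Changing the 1 to another value and rebuilding with the command line
above is one way to compare unrolling choices for the same loop.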
