https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #22 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Here are the details of how I tested this.

I generated in_pack_r4.i and in_unpack_r4.i by adding -save-temps to the Makefile options in ~/trunk-bin/powerpc64le-unknown-linux-gnu/libgfortran, then removing in_pack_r4.* and in_unpack_r4.* there and running make.

In the benchmark directory, I then used

bench.f90:

    program main
      real, dimension(:,:), allocatable :: a
      allocate (a(50000,4))
      call random_number (a)
      do i=1,5000000
         call foo(a(i::2,:))
         call foo(a)
      end do
    end program main

foo.f90:

    subroutine foo(a)
      real, dimension(*) :: a
    end subroutine foo

(constants can be adjusted). The first call to foo needs a repacking; the second one is only there to confuse the optimizer so that it does not exit the loop.

With the command line

    gfortran -g -fno-inline-arg-packing -O2 bench.f90 foo.f90 in_pack_r4.i in_unpack_r4.i -static-libgfortran && time ./a.out

a test can be run.

-fno-inline-arg-packing is important because otherwise the internal packing routines will not be called. Putting in_pack_r4.i and in_unpack_r4.i on the command line means those are used instead of the ones from the (static) library.

in_pack_r4.i and in_unpack_r4.i can then be adjusted, for example by adding a #pragma GCC unroll 1.
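
To illustrate the kind of adjustment meant by "adding a #pragma GCC unroll 1", here is a minimal sketch in C. It is not the libgfortran source: pack_strided and its parameters are hypothetical names made up for this example, and the loop nest is only a simplified stand-in for what a packing routine does with a section like a(i::2,:). The one thing taken from GCC itself is the pragma, which has to sit directly in front of the loop it applies to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packing routine; NOT the libgfortran code.
   Copies every `stride`-th row of a column-major rows x cols array,
   starting at 0-based row `first`, into the contiguous buffer `dst`.
   Returns the number of elements written. */
size_t pack_strided(float *dst, const float *src,
                    size_t rows, size_t cols,
                    size_t first, size_t stride)
{
    assert(stride > 0);
    size_t n = 0;
    for (size_t j = 0; j < cols; j++) {
        const float *col = src + j * rows;
        /* Ask GCC not to unroll the inner copy loop that follows. */
#pragma GCC unroll 1
        for (size_t i = first; i < rows; i += stride)
            dst[n++] = col[i];
    }
    return n;
}
```

Changing the 1 to a larger constant requests that unroll factor instead, so the same edit point can be used to compare several variants with the timing command above.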