https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu.org
             Blocks|                            |36854
           Severity|normal                      |enhancement

--- Comment #4 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
First, a few remarks on the code: It is written in a suboptimal way
with respect to how Fortran lays out arrays in memory.

The code does

  real, dimension(N,3),  intent(out) :: X
  real, dimension(N,10), intent(in)  :: BPP

and then

  do concurrent (i = 1:N)
     X(i,:) = fpdbacksolve(BPP(i,1:3), BPP(i,5:10))
  end do

The problem is that BPP(i,1:3) is not contiguous in memory. Fortran
stores arrays in column-major order, so the memory for that array is
laid out as

  BPP(1,1), BPP(2,1), BPP(3,1), BPP(4,1), ..., BPP(1,2)

so you are accessing your memory with a stride of N in the
expressions BPP(i,1:3) and BPP(i,5:10). This is very inefficient
anyway; vectorization would not really help in this case.

So, if you change your code to

  real, dimension(3,N),  intent(out) :: X
  real, dimension(10,N), intent(in)  :: BPP

  do concurrent (i = 1:N)
     X(:,i) = fpdbacksolve(BPP(1:3,i), BPP(5:10,i))
  end do

then processBPP will be inlined completely, and memory accesses will
be contiguous.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36854
[Bug 36854] [meta-bug] fortran front-end optimization
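
As an illustration of the contiguity argument in comment #4, here is a
minimal sketch that is not part of the original report. It assumes a
compiler that supports the Fortran 2008 IS_CONTIGUOUS intrinsic. The
program name, the array names bpp_n_by_10 and bpp_10_by_n, and the
size n = 4 are all hypothetical and chosen only for this example.

```fortran
program layout_demo
  implicit none
  integer, parameter :: n = 4        ! hypothetical problem size
  real :: bpp_n_by_10(n, 10)         ! layout used in the original code
  real :: bpp_10_by_n(10, n)         ! layout suggested in the comment
  integer :: i

  bpp_n_by_10 = 0.0
  bpp_10_by_n = 0.0
  i = 2

  ! Section along the second dimension: consecutive elements of the
  ! section are n reals apart in memory, so it is not contiguous.
  print *, 'BPP(i,1:3) contiguous? ', is_contiguous(bpp_n_by_10(i, 1:3))

  ! Section along the first dimension: consecutive elements of the
  ! section are adjacent in memory, so it is contiguous.
  print *, 'BPP(1:3,i) contiguous? ', is_contiguous(bpp_10_by_n(1:3, i))
end program layout_demo
```

With n > 1 the first query should report F and the second T, which
matches the stride-of-N versus unit-stride distinction made above.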