https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu.org
             Blocks|                            |36854
           Severity|normal                      |enhancement

--- Comment #4 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
First, a few remarks on the code: it is written in a suboptimal way
with respect to how Fortran lays out arrays in memory.

The code does

        real,   dimension(N,3),     intent(out)     ::  X
        real,   dimension(N,10),    intent(in)      ::  BPP

and then

        do concurrent (i = 1:N)
            X(i,:) = fpdbacksolve(BPP(i,1:3), BPP(i,5:10))
        end do

The problem is that BPP(i,1:3) is not contiguous in memory.

Fortran lays out the memory for that array as

BPP(1,1), BPP(2,1), BPP(3,1), BPP(4,1), ..., BPP(1,2)

so the expressions BPP(i,1:3) and BPP(i,5:10) access memory with a
stride of N elements. This is very inefficient in any case;
vectorization would not really help here.
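
To make the stride concrete, here is a minimal sketch of the column-major
offset arithmetic. The module and function names are made up for
illustration; only the formula itself follows from Fortran's array
element order.

```fortran
! Sketch only: layout_sketch and elem_offset are illustrative names,
! not part of the code in the bug report.
module layout_sketch
    implicit none
contains
    ! Zero-based offset of A(i,j) from A(1,1), counted in elements,
    ! for an array whose first dimension has extent n1 (column-major).
    elemental integer function elem_offset(i, j, n1)
        integer, intent(in) :: i, j, n1
        elem_offset = (i - 1) + (j - 1) * n1
    end function elem_offset
end module layout_sketch
```

With BPP declared as (N,10), elem_offset(i, [1,2,3], N) gives offsets that
are N elements apart, which is the strided access described above. With
BPP declared as (10,N), elem_offset([1,2,3], i, 10) gives three adjacent
offsets.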

So, if you change your code to

        real,   dimension(3,N),     intent(out)     ::  X
        real,   dimension(10,N),    intent(in)      ::  BPP

        do concurrent (i = 1:N)
            X(:,i) = fpdbacksolve(BPP(1:3,i), BPP(5:10,i))
        end do

then processBPP will be inlined completely, and memory accesses will
be contiguous.
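
For reference, a self-contained sketch of the transposed layout that
compiles on its own. kernel_stub is a hypothetical stand-in with the same
argument shapes as fpdbacksolve (3 and 6 elements in, 3 out); it does not
reproduce the original computation. Passing N as a dummy argument is also
an assumption made here for self-containment.

```fortran
! Sketch only: module name, kernel_stub and process_transposed are
! illustrative; the arithmetic in kernel_stub is a placeholder.
module transposed_sketch
    implicit none
contains
    ! Hypothetical stand-in for fpdbacksolve (shapes only, not the maths).
    pure function kernel_stub(a, b) result(x)
        real, intent(in) :: a(3), b(6)
        real :: x(3)
        x = a + b(1:3) * b(4:6)
    end function kernel_stub

    subroutine process_transposed(X, BPP, N)
        integer, intent(in) :: N
        real, dimension(3,N),  intent(out) :: X
        real, dimension(10,N), intent(in)  :: BPP
        integer :: i
        do concurrent (i = 1:N)
            ! Both argument slices and the result are sections of a single
            ! column, so each is contiguous in memory.
            X(:,i) = kernel_stub(BPP(1:3,i), BPP(5:10,i))
        end do
    end subroutine process_transposed
end module transposed_sketch
```

The first index now selects the element within one problem instance and
the second index selects the instance, so each iteration touches one
short contiguous run of memory.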


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36854
[Bug 36854] [meta-bug] fortran front-end optimization
