[Bug fortran/79930] Potentially Missed Optimisation for MATMUL / DOT_PRODUCT

tkoenig at gcc dot gnu.org Thu, 09 Mar 2017 13:17:14 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930


--- Comment #11 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
A couple of points:

First, the slow random number generation.  While I do not
understand why using the loop the way you do makes things
slower with optimization, it is _much_ faster to generate
random numbers in large chunks, as in

    call random_number(NU)
    call random_number(NV)
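
To see the difference for yourself, something along these lines could be
used.  This is only a sketch: the array shapes (n_comp, i_max), the loop
form of the slow variant and the program name are my assumptions, not
taken from the original test case.

```fortran
program chunked_random
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: n_comp = 4        ! assumed vector length
  integer, parameter :: i_max = 1000000   ! assumed number of vectors
  real(dp), allocatable :: NU(:,:), NV(:,:)
  real :: t0, t1
  integer :: i, j

  allocate (NU(n_comp, i_max), NV(n_comp, i_max))

  ! Variant 1 (assumed shape of the slow loop): one call per number.
  call cpu_time(t0)
  do i = 1, i_max
    do j = 1, n_comp
      call random_number(NU(j,i))
      call random_number(NV(j,i))
    end do
  end do
  call cpu_time(t1)
  print *, 'Per-element calls, time: ', t1 - t0

  ! Variant 2: one call fills the whole array.
  call cpu_time(t0)
  call random_number(NU)
  call random_number(NV)
  call cpu_time(t1)
  print *, 'Whole-array calls, time: ', t1 - t0

  ! Use the data so the optimizers cannot discard the work.
  print *, sum(NU) + sum(NV)
end program chunked_random
```

The final print is there only so that both variants produce a result
that is actually used.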

Second, the optimization.  With current trunk, you have
to add statements to make sure that the optimizers do
not notice you don't actually use your results :-)

I added

    s_total = 0.0_dp

...

    do i = 1, i_max
      tp = TP_SUM(NU(:,i), P(1:4,1:4), NV(:,i))
      s_total = s_total + sum(tp%vec)
    end do

...

    print *,s_total

to the test cases so that the tests don't suddenly use zero
CPU seconds.
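
As a self-contained illustration of the same pattern, here is a minimal
sketch.  The kernel is a hypothetical stand-in (u . (P v) via MATMUL and
DOT_PRODUCT), not the TP_SUM routine from the test case, and the sizes
are assumed.

```fortran
program keep_results_live
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: n = 4, i_max = 1000000   ! assumed sizes
  real(dp), allocatable :: NU(:,:), NV(:,:)
  real(dp) :: P(n,n), s_total
  real :: t0, t1
  integer :: i

  allocate (NU(n, i_max), NV(n, i_max))
  call random_number(NU)
  call random_number(NV)
  call random_number(P)

  s_total = 0.0_dp
  call cpu_time(t0)
  do i = 1, i_max
    ! Hypothetical stand-in kernel: u . (P v).
    ! Accumulating into s_total makes every iteration's result live.
    s_total = s_total + dot_product(NU(:,i), matmul(P, NV(:,i)))
  end do
  call cpu_time(t1)

  print *, 'Stand-in kernel, time: ', t1 - t0
  ! Printing the accumulated value forces the loop to be kept.
  print *, s_total
end program keep_results_live
```

Without the accumulation and the final print, the compiler is free to
remove the loop, and the timing says nothing about the kernel.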

Third, you really have to look at what you are doing
with your specific test cases, together with LTO and
data analysis.

Looking at your test case, your Tensor P is always the same.
I don't know if this is representative of your problem or not.
It has a huge effect on speed, because your routines are
completely inlined (and unrolled) with -flto -Ofast.
Not having to reload the data for P makes things much faster.

Compare:

ig25@linux-d6cw:~/Krempel/Tensor> gfortran -march=native -Ofast -fno-inline tp_array_2.f90
ig25@linux-d6cw:~/Krempel/Tensor> ./a.out
 This code variant uses intrinsic arrays to represent the contents of Type(Vect3D).
 Random Numbers, time:     1.41199994    
 Using SUM, time:         0.888000011    
 Using MATMUL (L), time:  0.812000036    
 Using MATMUL (R), time:  0.895999908    
   2415021069.9784665     
ig25@linux-d6cw:~/Krempel/Tensor> gfortran -march=native -Ofast -flto tp_array_2.f90
ig25@linux-d6cw:~/Krempel/Tensor> ./a.out
 This code variant uses intrinsic arrays to represent the contents of Type(Vect3D).
 Random Numbers, time:     1.41199994    
 Using SUM, time:         0.747999907    
 Using MATMUL (L), time:  0.132000208    
 Using MATMUL (R), time:  0.135999918
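
One way to check how much of this comes from P being constant might be
to time the same loop once with a fixed tensor and once with a tensor
that changes every iteration.  The following is only a sketch under my
own assumptions: the pool of tensors (P_pool, n_pool), the sizes and the
stand-in kernel are hypothetical and not part of the original test case.

```fortran
program constant_vs_varying_p
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: n = 4, i_max = 1000000   ! assumed sizes
  integer, parameter :: n_pool = 1024            ! hypothetical tensor pool
  real(dp), allocatable :: NU(:,:), NV(:,:), P_pool(:,:,:)
  real(dp) :: s_const, s_vary
  real :: t0, t1, t2
  integer :: i, k

  allocate (NU(n, i_max), NV(n, i_max), P_pool(n, n, n_pool))
  call random_number(NU)
  call random_number(NV)
  call random_number(P_pool)

  ! Same tensor every iteration: P can be kept in registers.
  s_const = 0.0_dp
  call cpu_time(t0)
  do i = 1, i_max
    s_const = s_const + dot_product(NU(:,i), matmul(P_pool(:,:,1), NV(:,i)))
  end do
  call cpu_time(t1)

  ! Different tensor every iteration: P has to be reloaded.
  s_vary = 0.0_dp
  do i = 1, i_max
    k = mod(i - 1, n_pool) + 1
    s_vary = s_vary + dot_product(NU(:,i), matmul(P_pool(:,:,k), NV(:,i)))
  end do
  call cpu_time(t2)

  print *, 'Constant P, time: ', t1 - t0
  print *, 'Varying P, time:  ', t2 - t1
  ! Print the sums so neither loop is optimized away.
  print *, s_const, s_vary
end program constant_vs_varying_p
```

Compiled with -march=native -Ofast -flto as above, the gap between the
two timings should give a rough idea of how representative a constant P
is.  The varying case also touches more memory, so the comparison is an
indication rather than an exact measurement.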

[Bug fortran/79930] Potentially Missed Optimisation for MATMUL / DOT_PRODUCT
