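It seems that gfortran inlines MATMUL whenever optimization is
enabled (-O1 or higher).  This produces very poor performance:
the inlined code is roughly 50% slower than the library routine
in the example below.  In fact, gfortran inlines MATMUL even if
one specifies -fexternal-blas.  This is very bad.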

% cat a.f90
program main

   implicit none

   integer, parameter :: imax = 20000, jmax = 10000
   real, allocatable :: inVect(:), matrix(:,:), outVect(:)
   real :: start, finish

   allocate(invect(imax), matrix(imax,jmax), outvect(jmax))

   call random_number(inVect)
   call random_number(matrix)
        
   call cpu_time(start)
   outVect = matmul(inVect, matrix)
   call cpu_time(finish)

   print '("Time = ",f10.7," seconds. – First Value = ",f10.4)',finish-start,outVect(1)
end program main

% gfcx -o z -O0 a.f90 && ./z
Time =  0.2234111 seconds. – First Value =  4982.6362
% nm z | grep matmul
                 U _gfortran_matmul_r4@@GFORTRAN_8
% gfcx -o z -O1 a.f90 && ./z
Time =  0.3295890 seconds. – First Value =  4971.0962
% nm z | grep matmul
% gfcx -o z -O2 a.f90 && ./z
Time =  0.3299561 seconds. – First Value =  5025.4902
% nm z | grep matmul
% gfcx -o z -O2 -fexternal-blas a.f90 && ./z
Time =  0.3295580 seconds. – First Value =  5022.8291

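This last one is definitely broken.  I did not link with
an external BLAS library, so the link should have failed if
-fexternal-blas had produced a call to a BLAS routine.  Instead
the program links, runs, and shows the same timing as the
inlined -O2 build.  Please fix before 11.1 is released.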
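
As a cross-check on the results (not on the timing), one could
compare MATMUL against an explicit loop of DOT_PRODUCT calls.
This is only a sketch: the program name check_matmul, the array
ref, the reduced sizes, and the 1.0e-4 tolerance are assumptions
made here for illustration, and the loop is not meant to
represent the code gfortran generates when it inlines MATMUL.

```fortran
program check_matmul

   implicit none

   ! Small sizes chosen only so the check runs instantly; this is
   ! not the 20000 x 10000 case timed in a.f90.
   integer, parameter :: imax = 200, jmax = 100
   real, allocatable :: inVect(:), matrix(:,:), outVect(:), ref(:)
   integer :: j

   allocate(inVect(imax), matrix(imax,jmax), outVect(jmax), ref(jmax))

   call random_number(inVect)
   call random_number(matrix)

   ! Same intrinsic call as in a.f90.
   outVect = matmul(inVect, matrix)

   ! Reference: each column of matrix is contiguous in memory, so
   ! one dot product per column walks memory with unit stride.
   do j = 1, jmax
      ref(j) = dot_product(inVect, matrix(:,j))
   end do

   ! Summation order may differ between the two paths, so compare
   ! with a relative tolerance rather than for exact equality.
   if (maxval(abs(outVect - ref) / abs(ref)) < 1.0e-4) then
      print '(a)', 'matmul agrees with the dot_product loop'
   else
      print '(a)', 'MISMATCH'
   end if

end program check_matmul
```

Building this at -O0 and at -O2 exercises both the library call
and the inlined path, so any disagreement would point to a
correctness problem on top of the performance one.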

-- 
Steve
