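It seems that gfortran inlines MATMUL when optimization is enabled, and the inlined code performs very poorly: in the test case below, -O1 and -O2 run about 1.5 times slower than -O0, which calls the library routine _gfortran_matmul_r4. Worse, gfortran inlines MATMUL even when -fexternal-blas is specified. This is very bad.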
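For reference, here is a back-of-the-envelope check of the work involved. The test case times outVect = matmul(inVect, matrix). Because the first argument has rank 1, MATMUL computes a vector-matrix product, which is the same as multiplying the transposed matrix by the vector:

```latex
\mathrm{outVect}_j = \sum_{i=1}^{\mathrm{imax}} \mathrm{inVect}_i \,\mathrm{matrix}_{ij},
\qquad j = 1,\dots,\mathrm{jmax},
\qquad\text{i.e.}\quad \mathrm{outVect} = \mathrm{matrix}^{\mathsf T}\,\mathrm{inVect}

\text{work} \approx 2\,\mathrm{imax}\,\mathrm{jmax} = 2 \times 20000 \times 10000 = 4 \times 10^{8}\ \text{flops}
```

At 0.2234 s (-O0, library call) that is roughly 1.8 GFLOP/s. At 0.3296 s (-O1/-O2, inlined) it is roughly 1.2 GFLOP/s. Fortran arrays are column-major, so for a fixed j the sum over i walks down one contiguous column of matrix. In BLAS terms this operation is SGEMV with TRANS = 'T'.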
% cat a.f90
program main
  implicit none
  integer, parameter :: imax = 20000, jmax = 10000
  real, allocatable :: inVect(:), matrix(:,:), outVect(:)
  real :: start, finish
  allocate(invect(imax), matrix(imax,jmax), outvect(jmax))
  call random_number(inVect)
  call random_number(matrix)
  call cpu_time(start)
  outVect = matmul(inVect, matrix)
  call cpu_time(finish)
  print '("Time = ",f10.7," seconds. – First Value = ",f10.4)',finish-start,outVect(1)
end program main

% gfcx -o z -O0 a.f90 && ./z
Time = 0.2234111 seconds. – First Value = 4982.6362
% nm z | grep matmul
                 U _gfortran_matmul_r4@@GFORTRAN_8

% gfcx -o z -O1 a.f90 && ./z
Time = 0.3295890 seconds. – First Value = 4971.0962
% nm z | grep matmul

% gfcx -o z -O2 a.f90 && ./z
Time = 0.3299561 seconds. – First Value = 5025.4902
% nm z | grep matmul

% gfcx -o z -O2 -fexternal-blas a.f90 && ./z
Time = 0.3295580 seconds. – First Value = 5022.8291

This last one is definitely broken. With -fexternal-blas, MATMUL should be turned into a call to an external BLAS routine, and I did not link with an external BLAS library, so the link should have failed. Instead the program links, runs, and takes the same time as the inlined -O2 build, so MATMUL was still inlined.

Please fix before 11.1 is released.

-- 
Steve