https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
--- Comment #5 from Quanhua Liu <quanhua.liu at noaa dot gov> ---
Hi Richard,

Using -fexternal-blas for gfortran v10.3.0 is much slower than the method 2:

    BB = transpose(B)
    C = matmul(A, BB)

How about on your machine?

Thanks,

Quanhua Liu

On 8/9/2022 11:07 AM, kargl at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
>
> kargl at gcc dot gnu.org changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |kargl at gcc dot gnu.org
>
> --- Comment #3 from kargl at gcc dot gnu.org ---
>
>> INTEGER, PARAMETER :: m = 200, n = 300, nn = 150
>> REAL :: A(m,n), B(nn,n), C(m,nn), BB(n,nn)
>> INTEGER :: i, j, k, L
>
> If you are doing a problem of this size or larger, you want to use the
> -fexternal-blas option and link in OpenBLAS.
>
> I added timing code and replicated the loop to both in one go.
>
> % gfcx -o z -O3 -march=native a.f90 && ./z
>    1.16500998       1615.08594
>    5.32258606       1615.08020
> % gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas && ./z
>    2.44668889       1615.08301
>    1.99379802       1615.08301