https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #10 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> If dot_product (matmul (...), ..) can be implemented more optimally (is
> there a blas/lapack primitive for it?) then the best course of action is to
> pattern match that inside the frontend and emit a library call to an
> optimized routine (which means eventually adding one to libfortran or
> using/extending -fexternal-blas.

Experience from inlining matmul shows that library routines have a very hard
time beating an inline version for small problem sizes. This is why we
currently implement inline matmul up to a matrix size of 30.

This example, with 4*4 matrices / vectors, is a prime candidate for inlining.
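A minimal sketch in C, assuming column-major storage as in Fortran (element a(i,j) at a[i + N*j], 0-based) and a fixed size of N = 4, of why an expression like dot_product(matmul(a, x), y) suits inlining. The first hypothetical function mimics a generic evaluation that materializes the temporary matmul(a, x); the second fuses both intrinsics into one pass over a with no temporary array. The names dot_matmul_temp and dot_matmul_fused are illustrative only; this is not the code gfortran actually emits.

```c
/* Hypothetical illustration (not gfortran's actual output) of
 * dot_product(matmul(a, x), y) for a 4x4 matrix stored column-major,
 * as Fortran does: element a(i,j) lives at a[i + N*j] (0-based here). */
#define N 4

/* Generic evaluation: materialize t = matmul(a, x),
 * then take dot_product(t, y). */
double dot_matmul_temp(const double a[N * N], const double x[N],
                       const double y[N])
{
    double t[N];
    for (int i = 0; i < N; i++)
        t[i] = 0.0;
    /* j outer, i inner: walks a contiguously in column-major order */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            t[i] += a[i + N * j] * x[j];

    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += t[i] * y[i];
    return s;
}

/* Fused evaluation: no temporary array, one pass over a.
 * Uses sum_i y(i) * sum_j a(i,j)*x(j) = sum_j x(j) * sum_i a(i,j)*y(i). */
double dot_matmul_fused(const double a[N * N], const double x[N],
                        const double y[N])
{
    double s = 0.0;
    for (int j = 0; j < N; j++) {
        double col = 0.0; /* dot_product(a(:,j), y) */
        for (int i = 0; i < N; i++)
            col += a[i + N * j] * y[i];
        s += col * x[j];
    }
    return s;
}
```

With the size known at compile time, an optimizer may fully unroll the fused loops into a handful of multiply-adds, whereas a call into a general library routine pays for the call, size checks and the temporary regardless of how small the operands are. Because the two functions sum in a different order, floating-point results can differ in the last bits for general inputs.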