https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930

--- Comment #10 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> If dot_product (matmul (...), ..) can be implemented more optimally (is
> there a blas/lapack primitive for it?) then the best course of action is to
> pattern match that inside the frontend and emit a library call to an
> optimized routine (which means eventually adding one to libfortran or
> using/extending -fexternal-blas).

Experience from inlining matmul shows that library routines have
a very hard time beating an inline version for small problem sizes.
This is why we currently inline matmul up to a matrix size of 30
(the default of -finline-matmul-limit).

This example, with 4*4 matrices / vectors, is a prime candidate
for inlining.
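
To make the size argument concrete, here is a rough sketch in C of what
an inlined dot_product (matmul (a, x), y) could reduce to for the 4*4
case. It is only an illustration, not what the frontend emits; the
function names and the fixed size N are made up for this example.

```c
/* Hypothetical illustration only; not gfortran's generated code. */
enum { N = 4 };  /* assumed problem size, matching the 4*4 example */

/* Two-step form: materialise t = matmul(a, x), then dot_product(t, y).
   This mirrors a library call that returns a temporary vector. */
double dot_matmul_two_step(const double a[N][N], const double x[N],
                           const double y[N])
{
    double t[N];
    for (int i = 0; i < N; i++) {
        t[i] = 0.0;
        for (int j = 0; j < N; j++)
            t[i] += a[i][j] * x[j];
    }

    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += t[i] * y[i];
    return s;
}

/* Fused inline form: each row result is consumed immediately, so no
   temporary array is needed and there is no call overhead. */
double dot_matmul_fused(const double a[N][N], const double x[N],
                        const double y[N])
{
    double s = 0.0;
    for (int i = 0; i < N; i++) {
        double row = 0.0;
        for (int j = 0; j < N; j++)
            row += a[i][j] * x[j];
        s += row * y[i];
    }
    return s;
}
```

With constant trip counts of 4, both loops are small enough for the
compiler to unroll completely, which a call into a general library
routine cannot benefit from.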
