On Thu, Mar 18, 2021 at 07:24:21PM +0100, Thomas Koenig wrote:
> I didn't finish the previous mail before hitting "send", so here
> is the postscript...
> 
> > OK, so I've had a bit of time to look at the actual test case.  I
> > missed one very important detail before:  This is a vector-matrix
> > operation.
> > 
> > For this, we do not have a good library routine (Harald just
> > removed it because of a bug in buffering), and -fexternal-blas
> > does not work because we do not handle calls to anything but
> > *GEMM.
> 
> A vector-matrix multiplication would be a call to *GEMV, a worthy
> goal, but out of scope so close to a release.

Agreed.
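
For reference, a rank-1 by rank-2 MATMUL maps onto *GEMV with the
transpose option.  A minimal sketch for REAL(4) (the wrapper and its
argument names are only illustrative, not what the front end or the
library would actually emit):

  subroutine vec_mat_sgemv(x, a, y, m, n)
    implicit none
    integer, intent(in) :: m, n
    real, intent(in)  :: x(m), a(m,n)
    real, intent(out) :: y(n)
    ! y = matmul(x, a), i.e. y := A**T * x, via the reference BLAS
    call sgemv('T', m, n, 1.0, a, m, x, 1, 0.0, y, 1)
  end subroutine vec_mat_sgemv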

> > The idea is that, for a vector-matrix multiplication, the
> > compiler should have enough information
> about how to optimize for the relevant architecture, especially
> if the user compiles with the right flags.
> 
> So, the current idea is that, if we optimize, we can inline.
> 
> What would a better heuristic be?
> 

Does _gfortran_matmul_r4 (and friends) work for vector-matrix
products?  I haven't checked.  If so, how about disabling
in-lining of MATMUL for 11.1; then, for 11.2, this can be
revisited and a small N chosen for in-lining.  With
-fexternal-blas and *gemm, the default cross-over is N = 30.
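
For context, a rough sketch of the matrix-matrix case that hits the
cross-over (assuming, if I recall correctly, that it is controlled by
-fblas-matmul-limit, whose default is 30; the program must be linked
against a BLAS library when -fexternal-blas is used):

  program mm_crossover
    implicit none
    integer, parameter :: n = 100
    real :: a(n,n), b(n,n), c(n,n)
    call random_number(a)
    call random_number(b)
    ! With -O2 -fexternal-blas this should be dispatched to SGEMM,
    ! since n exceeds the default cross-over of 30; below the
    ! cross-over the in-line code (or library MATMUL) is used.
    c = matmul(a, b)
    print *, c(1,1)
  end program mm_crossover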

BTW, I came across this on StackOverflow.

https://stackoverflow.com/questions/66682180/why-is-matmul-slower-with-gfortran-compiler-optimization-turned-on

-- 
Steve
