I didn't finish the previous mail before hitting "send", so here
is the postscript...

OK, so I've had a bit of time to look at the actual test case.  I
missed one very important detail before:  This is a vector-matrix
operation.

For this, we do not have a good library routine (Harald just
removed it because of a bug in buffering), and -fexternal-blas
does not work because we do not handle calls to anything but
*GEMM.

A vector-matrix multiplication would be a call to *GEMV.
Supporting that is a worthy goal, but out of scope so close to
a release.

The idea is that, for a vector-matrix multiplication, the
compiler should have enough information about how to optimize
for the relevant architecture, especially if the user compiles
with the right flags.

So, the current idea is that, if we optimize, we can inline.
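
To make "inline" concrete, here is a minimal sketch of the loop
nest that a vector-matrix MATMUL amounts to.  It is only an
illustration, not the code the compiler actually generates; the
program name, array sizes and values are made up for the example.

```fortran
program vecmat_sketch
  implicit none
  ! Sizes and data are arbitrary; they are assumptions for this example.
  integer, parameter :: n = 3, m = 2
  real :: v(n), a(n, m), c(m)
  integer :: i, j

  v = [1.0, 2.0, 3.0]
  a = reshape([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [n, m])

  ! Hand-written equivalent of c = matmul(v, a):
  !   c(j) = sum over i of v(i) * a(i, j)
  ! The inner loop walks down a column of a, i.e. with unit stride.
  do j = 1, m
     c(j) = 0.0
     do i = 1, n
        c(j) = c(j) + v(i) * a(i, j)
     end do
  end do

  ! Cross-check the explicit loops against the intrinsic.
  if (any(abs(c - matmul(v, a)) > 1.0e-6)) error stop "mismatch"
  print *, c
end program vecmat_sketch
```

With the loops visible like this, the optimizers can vectorize
the inner reduction for the target architecture, which is the
point of inlining instead of calling a library routine.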

What would a better heuristic be?

Best regards

        Thomas
