No such BLAS routine exists, but for larger matrices the calculation will be dominated by the final matrix-matrix product anyway.
Den tirsdag den 7. juli 2015 kl. 18.24.34 UTC-4 skrev Matthieu: > > Thanks, this is what I currently do :) > > However, I'd like to find a solution that is both memory efficient (X can > be very large) and which does not modify X in place. > > Basically, I'm wondering whether there was a BLAS subroutine that would > allow to compute cross(X, w, Y) in one pass without creating an > intermediate matrix as large as X or Y. > >
