Thanks, this is what I currently do :)

However, I'd like to find a solution that is both memory efficient (X can 
be very large) and which does not modify X in place.

Basically, I'm wondering whether there was a BLAS subroutine that would 
allow to compute cross(X, w, Y) in one pass without creating an 
intermediate matrix as large as X or Y.

Reply via email to