Andreas, do you know offhand which matrix multiplication algorithm the OpenBLAS routine uses?

On Wednesday, July 8, 2015 at 11:37:51 AM UTC-4, Andreas Noack wrote:
>
> It can be quite large. With
>
> julia> function mymul(A,B)
>            m, n = size(A, 1), size(B, 2)
>            C = promote_type(typeof(A), typeof(B))(m,n)
>            for j = 1:n
>                for i = 1:m
>                    tmp = zero(eltype(C))
>                    for k = 1:size(A, 2)
>                        tmp += A[i,k]*B[k,j]
>                    end
>                    C[i,j] = tmp
>                end
>            end
>            return C
>        end
>
> I get that single threaded OpenBLAS speed-up of
>
> size    factor
>    2      1.16176
>    4      0.515929
>    8      1.73846
>   16      4.80873
>   32     10.4425
>   64     11.6411
>  128     20.1504
>  256     41.6211
>  512     38.4489
> 1024    136.855
>
> 2015-07-08 10:46 GMT-04:00 Josh Langsfeld <[email protected]>:
>
>> Ah, thanks, that's good to know. I was under the mistaken impression that
>> loops are always the fastest option in Julia since it's brought up pretty
>> frequently. Out of curiosity, what factor of slow-down would not using the
>> optimized routines cause?
>>
>> On Wed, Jul 8, 2015 at 10:39 AM, Andreas Noack <[email protected]> wrote:
>>
>>> You could, but unless the matrices are small, it would be slower because
>>> it wouldn't use optimized matrix multiplication.
>>>
>>> 2015-07-08 10:36 GMT-04:00 Josh Langsfeld <[email protected]>:
>>>
>>>> Maybe I'm missing something obvious, but couldn't you easily write your
>>>> own 'cross' function that uses a couple nested for-loops to do the
>>>> arithmetic without any intermediate allocations at all?
>>>>
>>>> On Tuesday, July 7, 2015 at 6:24:34 PM UTC-4, Matthieu wrote:
>>>>>
>>>>> Thanks, this is what I currently do :)
>>>>>
>>>>> However, I'd like to find a solution that is both memory efficient (X
>>>>> can be very large) and which does not modify X in place.
>>>>>
>>>>> Basically, I'm wondering whether there was a BLAS subroutine that
>>>>> would allow to compute cross(X, w, Y) in one pass without creating an
>>>>> intermediate matrix as large as X or Y.
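
For reference, the loop-based 'cross' that Josh describes in the quoted thread might look roughly like the sketch below. This is only an illustration under assumptions: the thread does not define cross(X, w, Y), so I am guessing it means X' * diag(w) * Y, and the name `weighted_cross` is made up here. It is written for current Julia, not the 2015 release used for the timings above.

```julia
# Hypothetical helper (not from the thread): assumes cross(X, w, Y) means
# X' * diag(w) * Y, where X is n-by-p, Y is n-by-q and w has length n.
function weighted_cross(X::AbstractMatrix, w::AbstractVector, Y::AbstractMatrix)
    n, p = size(X)
    q = size(Y, 2)
    size(Y, 1) == n || throw(DimensionMismatch("X and Y must have the same number of rows"))
    length(w) == n || throw(DimensionMismatch("w needs one weight per row of X"))

    T = promote_type(eltype(X), eltype(w), eltype(Y))
    C = zeros(T, p, q)              # the only allocation is the p-by-q result

    for j in 1:q
        for i in 1:p
            acc = zero(T)
            for k in 1:n            # walk down column i of X and column j of Y
                acc += X[k, i] * w[k] * Y[k, j]
            end
            C[i, j] = acc
        end
    end
    return C                        # X, w and Y are left unmodified
end
```

This never forms a weighted copy of X or Y, so it meets the memory requirement, but as Andreas's table suggests, plain loops like these will likely fall well behind the BLAS-backed product once the matrices get large.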
