Andreas, do you know offhand which matrix multiplication algorithm the OpenBLAS 
routine uses?

On Wednesday, July 8, 2015 at 11:37:51 AM UTC-4, Andreas Noack wrote:
>
> It can be quite large. With
>
> julia> function mymul(A, B)
>            m, n = size(A, 1), size(B, 2)
>            # Uninitialized result matrix with the promoted element type
>            C = Matrix{promote_type(eltype(A), eltype(B))}(undef, m, n)
>            for j = 1:n
>                for i = 1:m
>                    tmp = zero(eltype(C))
>                    for k = 1:size(A, 2)
>                        tmp += A[i,k]*B[k,j]
>                    end
>                    C[i,j] = tmp
>                end
>            end
>            return C
>        end
>
> I get the following single-threaded OpenBLAS speed-up relative to mymul:
>
> size   factor
>    2     1.16176
>    4     0.515929
>    8     1.73846
>   16     4.80873
>   32    10.4425
>   64    11.6411
>  128    20.1504
>  256    41.6211
>  512    38.4489
> 1024   136.855
>
> 2015-07-08 10:46 GMT-04:00 Josh Langsfeld:
>
>> Ah, thanks, that's good to know. I was under the mistaken impression that 
>> loops are always the fastest option in Julia since it's brought up pretty 
>> frequently. Out of curiosity, what factor of slow-down would not using the 
>> optimized routines cause?
>>
>> On Wed, Jul 8, 2015 at 10:39 AM, Andreas Noack wrote:
>>
>>> You could, but unless the matrices are small, it would be slower because 
>>> it wouldn't use optimized matrix multiplication.
>>>
>>> 2015-07-08 10:36 GMT-04:00 Josh Langsfeld:
>>>
>>>> Maybe I'm missing something obvious, but couldn't you easily write your 
>>>> own 'cross' function that uses a couple nested for-loops to do the 
>>>> arithmetic without any intermediate allocations at all?
>>>>
>>>> On Tuesday, July 7, 2015 at 6:24:34 PM UTC-4, Matthieu wrote:
>>>>>
>>>>> Thanks, this is what I currently do :)
>>>>>
>>>>> However, I'd like to find a solution that is both memory efficient (X 
>>>>> can be very large) and which does not modify X in place.
>>>>>
>>>>> Basically, I'm wondering whether there is a BLAS subroutine that 
>>>>> would compute cross(X, w, Y) in one pass without creating an 
>>>>> intermediate matrix as large as X or Y.
>>>>>
>>>>>
>>>
>>
>
