Hi Gray
thank you very much. Yes, memory allocation is astonishing. This works
great.
On Friday, September 5, 2014 6:59:53 AM UTC+2, Gray Calhoun wrote:
>
> On Wednesday, September 3, 2014 6:46:07 PM UTC-5, Giuseppe Ragusa wrote:
>>
>> I have been struggling to find a fast and elegant way to implement the
>> following algorithm.
>>
> [...]
>
>> function block_crossprod!(out, X, cl)
>> for j in distinct(cl)
>> out += X[find(cl.==j),:]'*X[find(cl.==j),:]
>> end
>> end
>>
>
> Hi Giuseppe, hope you're doing well! I think that "+=" creates a
> temporary array, even though the notation suggests that it's
> mutating. The BLAS function "syrk!" does essentially what you
> want but without creating the temporary variables[1]. i.e.:
> ```
> function block_crossprod2!(out, X, cl)
> for j in unique(cl)
> Base.BLAS.syrk!('U', 'T', 1., X[cl .== j,:], 1., out)
> end
> end
> ```
> out will only return the upper triangular part, but
> Symmetric(out) will make it symmetric.
>
> If you know more about the indexing scheme, you may be able to
> improve things more by replacing X[cl .== j,:] with a subarray:
> ```
> function block_crossprod3!(out, X, bstarts, blengths)
> for j in 1:length(bstarts)
> Base.BLAS.syrk!('U', 'T', 1.,
> sub(X, range(bstarts[j], blengths[j]),:), 1., out)
> end
> end
> ```
> Other people may have more improvements, or know
> how to use subarrays in the general case. Hope this helps!
>
> A quick profile:
> ```
> X = randn(10_000,200)
> cl=rep(1:500, 20)
> out1 = zeros(200,200)
> bstarts = 1:20:10_000
> blengths = rep(20, 500)
>
> @time block_crossprod!(out1, X, cl)
> # elapsed time: 0.611584694 seconds (358190568 bytes allocated, 41.52% gc
> time)
>
> @time block_crossprod2!(out1, X, cl)
> # elapsed time: 0.179214852 seconds (19066568 bytes allocated, 20.01% gc
> time)
>
> @time block_crossprod3!(out1, X, bstarts, blengths)
> # elapsed time: 0.095740838 seconds (390416 bytes allocated)
> ```
> The memory allocation for the last version is kind of astonishing.
>
> [1]:
> http://julia.readthedocs.org/en/latest/stdlib/linalg/#Base.LinAlg.BLAS.syrk
> !
>