On Tuesday, November 11, 2014 at 18:15 -0800, Tony Kelman wrote:
> Milan,
>
> A useful trick for linking to lines of code on GitHub is to hit the
> "y" key: GitHub will reload the page you're on, but with the
> specific SHA in the URL, like so:
> https://github.com/JuliaLang/julia/blob/d0a951ccb3a7ebae7909665f4445a019f2ee54a1/base/sparse/sparsematrix.jl#L530
>
> That way the link will still make sense in the future, even if the
> contents of that file get moved around.
Right, good trick.
Regards


On Tuesday, November 11, 2014 2:57:44 AM UTC-8, Milan Bouchet-Valat wrote:
> On Monday, November 10, 2014 at 19:06 -0800, Todd Leo wrote:
> > I did, actually, try expanding the vectorized operations into
> > explicit for loops, and computing the vector multiplication and
> > vector norm through the BLAS interfaces. The explicit loops did
> > allocate less memory, but took much more time. Meanwhile, the
> > vectorized version I've gotten used to writing runs incredibly
> > fast, as the tests below indicate:
> I don't have a typical matrix like yours to test it, but your
> cosine() function is adapted from an algorithm for dense matrices.
> If your matrix is really sparse, you can do the computations only
> for the nonzero entries when multiplying i by itself and j by
> itself. I've put together a short function to compute the sum of
> squares here:
> https://gist.github.com/nalimilan/3ab9922294663b2ec979
>
> BTW, it looks like something like this is needed in Base to
> specialize sumabs2() for sparse matrices, as the current generic
> version is terribly slow in that case.
>
> It could easily be extended to compute i .* j if the nonzero
> entries in both matrices match. If not, more work is needed, but
> you could adapt the .* function from [1] to compute the sum on the
> fly instead of allocating space for a new matrix.
>
> Regards
>
> 1: https://github.com/JuliaLang/julia/blob/master/base/sparse/sparsematrix.jl#L530
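Milan's gist isn't quoted in the thread, but a minimal sketch of the two ideas he describes might look as follows: a sum of squares that visits only the stored entries of the CSC structure, and an on-the-fly sum of i .* j that never allocates the product. The names sumabs2_sparse and sparsedot are made up for illustration (this is not the gist's code), and the merge pass assumes single-column, n-by-1 operands like mat[:,1]:

# A sketch of a nonzero-only sum of squares: zeros contribute nothing,
# so it suffices to loop over the stored values (the nzval field of
# the CSC structure). Explicitly stored zeros are harmless here.
function sumabs2_sparse(A::SparseMatrixCSC{Float64, Int64})
    s = 0.0
    nz = A.nzval
    for k in 1:length(nz)
        s += nz[k]^2
    end
    return s
end

# A sketch of summing i .* j on the fly for two sparse column vectors
# (n-by-1, like mat[:,1]): walk the two sorted rowval arrays in one
# merge pass. No temporary matrix is allocated.
function sparsedot(a::SparseMatrixCSC{Float64, Int64},
                   b::SparseMatrixCSC{Float64, Int64})
    ra, rb = a.rowval, b.rowval
    va, vb = a.nzval, b.nzval
    ia, ib = 1, 1
    s = 0.0
    while ia <= length(ra) && ib <= length(rb)
        if ra[ia] == rb[ib]        # entry stored in both vectors
            s += va[ia] * vb[ib]
            ia += 1
            ib += 1
        elseif ra[ia] < rb[ib]     # stored only in a: product is zero
            ia += 1
        else                       # stored only in b: product is zero
            ib += 1
        end
    end
    return s
end

Skipping the entries stored in only one operand is what makes the merge pass cheap: any product involving a structural zero is known to be zero without ever being computed.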
> > # Explicit for loop, slightly modified from SimilarityMetrics.jl
> > # by johnmyleswhite
> > # (https://github.com/johnmyleswhite/SimilarityMetrics.jl/blob/master/src/cosine.jl)
> > function cosine(a::SparseMatrixCSC{Float64, Int64},
> >                 b::SparseMatrixCSC{Float64, Int64})
> >     sA, sB, sI = 0.0, 0.0, 0.0
> >     for i in 1:length(a)
> >         sA += a[i]^2
> >         sI += a[i] * b[i]
> >     end
> >     for i in 1:length(b)
> >         sB += b[i]^2
> >     end
> >     return sI / sqrt(sA * sB)
> > end
> >
> > # BLAS version
> > function cosine_blas(i::SparseMatrixCSC{Float64, Int64},
> >                      j::SparseMatrixCSC{Float64, Int64})
> >     i = full(i)
> >     j = full(j)
> >     numerator = BLAS.dot(i, j)
> >     denominator = BLAS.nrm2(i) * BLAS.nrm2(j)
> >     return numerator / denominator
> > end
> >
> > # The vectorized version remains the same, as the first post shows.
> >
> > # Test functions
> > function test_explicit_loop(d)
> >     for n in 1:10000
> >         v = d[:,1]
> >         cosine(v, v)
> >     end
> > end
> >
> > function test_blas(d)
> >     for n in 1:10000
> >         v = d[:,1]
> >         cosine_blas(v, v)
> >     end
> > end
> >
> > function test_vectorized(d)
> >     for n in 1:10000
> >         v = d[:,1]
> >         cosine_vectorized(v, v)
> >     end
> > end
> >
> > test_explicit_loop(mat)
> > test_blas(mat)
> > test_vectorized(mat)
> > gc()
> > @time test_explicit_loop(mat)
> > gc()
> > @time test_blas(mat)
> > gc()
> > @time test_vectorized(mat)
> >
> > # Results
> > elapsed time: 3.772606858 seconds (6240080 bytes allocated)
> > elapsed time: 0.400972089 seconds (327520080 bytes allocated, 81.58% gc time)
> > elapsed time: 0.011236068 seconds (34560080 bytes allocated)
> >
> > On Monday, November 10, 2014 7:23:17 PM UTC+8, Milan Bouchet-Valat wrote:
> > On Sunday, November 9, 2014 at 21:17 -0800, Todd Leo wrote:
> > > Hi fellows,
> > >
> > > I'm currently working on sparse matrices and cosine similarity
> > > computation, but my routines are running very slowly, at least
> > > more slowly than I expected. So I wrote some test functions to
> > > dig out the reason for the inefficiency. To my surprise, the
> > > execution times of passing two vectors to the test function and
> > > of passing the whole sparse matrix differ greatly: the latter is
> > > 80x faster. I am wondering why extracting the two vectors from
> > > the matrix in each loop is so dramatically faster, and how to
> > > avoid the multi-GB memory allocation. Thanks, guys.
> > >
> > > --
> > > BEST REGARDS,
> > > Todd Leo
> > >
> > > # The sparse matrix
> > > mat # 2000x15037 SparseMatrixCSC{Float64, Int64}
> > >
> > > # The two vectors, prepared in advance
> > > v = mat'[:,1]
> > > w = mat'[:,2]
> > >
> > > # Cosine similarity function
> > > function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64},
> > >                            j::SparseMatrixCSC{Float64, Int64})
> > >     return sum(i .* j) / sqrt(sum(i .* i) * sum(j .* j))
> > > end
> > I think you'll experience a dramatic speed gain if you write the
> > sums as explicit loops, accessing elements one by one, taking
> > their product and adding it immediately to an accumulator. In
> > your current version, the element-wise products allocate new
> > vectors before computing the sums, which is very costly.
> >
> > This will also get rid of the difference you report between
> > passing the matrix and passing the vectors.
> >
> > Regards
> >
> > > function test1(d)
> > >     res = 0.
> > >     for i in 1:10000
> > >         res = cosine_vectorized(d[:,1], d[:,2])
> > >     end
> > > end
> > >
> > > function test2(_v, _w)
> > >     res = 0.
> > >     for i in 1:10000
> > >         res = cosine_vectorized(_v, _w)
> > >     end
> > > end
> > >
> > > test1(mat)
> > > test2(v, w)
> > > gc()
> > > @time test1(mat)
> > > gc()
> > > @time test2(v, w)
> > >
> > > # elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07% gc time)
> > > # elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51% gc time)
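Combining Milan's explicit-loop advice with the sparse-aware sketches above gives a cosine that accumulates every sum directly into a scalar, with no temporary vectors at all. Again, cosine_sparse is a hypothetical name, and it relies on the same n-by-1 assumption as the sketches it calls:

function cosine_sparse(a::SparseMatrixCSC{Float64, Int64},
                       b::SparseMatrixCSC{Float64, Int64})
    # Each term is a plain accumulation over stored entries, so no
    # intermediate vector like i .* j is ever allocated.
    return sparsedot(a, b) / sqrt(sumabs2_sparse(a) * sumabs2_sparse(b))
end

# Usage with the vectors from the first post:
#   v = mat'[:,1]
#   w = mat'[:,2]
#   cosine_sparse(v, w)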
