Yes, I should have thought of that; it's a very straightforward idea. Just
call rowvals() on each vector and intersect the two index sets to find the
entries that are nonzero in both:
function dot_sparse(v::SparseMatrixCSC{Float64, Int64},
                    w::SparseMatrixCSC{Float64, Int64})
    non_0_idx = intersect(rowvals(w), rowvals(v))
    _sum = 0.0
    for i in non_0_idx
        _sum += v[i] * w[i]
    end
    _sum
end
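For long vectors, the intersect() call allocates a new index array, and each v[i] lookup does a binary search inside the sparse structure. Here is a sketch of an allocation-free variant that instead walks the two sorted row-index lists in lockstep (assuming, as above, that both arguments are one-column sparse matrices; the function name is mine):

```julia
using SparseArrays  # needed on Julia >= 0.7; in 2014-era Julia these were in Base

# Two-pointer merge over the stored row indices of two one-column
# sparse matrices: no intersect() allocation, no scalar getindex
# on a sparse matrix inside the loop.
function dot_sparse_merge(v::SparseMatrixCSC{Float64, Int64},
                          w::SparseMatrixCSC{Float64, Int64})
    vr, vv = rowvals(v), nonzeros(v)
    wr, wv = rowvals(w), nonzeros(w)
    i, j, s = 1, 1, 0.0
    while i <= length(vr) && j <= length(wr)
        if vr[i] == wr[j]
            s += vv[i] * wv[j]
            i += 1; j += 1
        elseif vr[i] < wr[j]
            i += 1
        else
            j += 1
        end
    end
    return s
end
```

Since CSC stores the row indices of each column in ascending order, the merge visits each stored entry at most once, so the cost is O(nnz(v) + nnz(w)).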
On Wednesday, November 12, 2014 3:41:03 PM UTC+8, Milan Bouchet-Valat wrote:
>
> Le mardi 11 novembre 2014 à 19:29 -0800, Todd Leo a écrit :
> > Thanks, now the 2-norm calculation in cosine() for sparse matrices
> > can be optimized with your sumsq() function. But is the dot-product
> > part of cosine() also optimizable for sparse matrices?
> I think so. Since the product of a zero and a nonzero element is, well,
> zero, you just need to iterate over nonzero elements of one matrix and
> multiply them with the corresponding element of the other matrix (if the
> latter is zero you don't care). Not sure how to do this exactly off the
> top of my head, but shouldn't be too hard.
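A minimal sketch of that idea, assuming both arguments are one-column sparse matrices as elsewhere in this thread (the function name is mine):

```julia
using SparseArrays  # needed on Julia >= 0.7; in 2014-era Julia these were in Base

# Visit only the stored (nonzero) entries of v; wherever w is zero
# the product contributes nothing, so those rows are never touched.
function dot_over_nonzeros(v::SparseMatrixCSC{Float64, Int64},
                           w::SparseMatrixCSC{Float64, Int64})
    rows, vals = rowvals(v), nonzeros(v)
    s = 0.0
    for k in 1:length(rows)
        s += vals[k] * w[rows[k], 1]  # sparse getindex: one binary search per lookup
    end
    return s
end
```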
>
>
> Regards
>
> > On Tuesday, November 11, 2014 6:57:44 PM UTC+8, Milan Bouchet-Valat
> > wrote:
> > Le lundi 10 novembre 2014 à 19:06 -0800, Todd Leo a écrit :
> > > I did, actually, try expanding the vectorized operations into
> > > explicit for loops, and computing the vector product /
> > > vector norm through the BLAS interfaces. The explicit loops
> > > did allocate less memory, but took much more time. Meanwhile,
> > > the vectorized version I have gotten used to writing
> > > runs incredibly fast, as the following tests indicate:
> > I don't have a typical matrix like yours to test it, but your
> > cosine() function is adapted from an algorithm for dense
> > matrices. If your matrix is really sparse, you can do the
> > computations only for the nonzero entries when multiplying i
> > by itself and j by itself. I've put together a short function
> > to compute the sum of squares here:
> > https://gist.github.com/nalimilan/3ab9922294663b2ec979
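The gist's actual code may differ, but a minimal sum of squares over the stored entries could look like this (the function name is mine):

```julia
using SparseArrays  # needed on Julia >= 0.7; in 2014-era Julia these were in Base

# Only the stored values can contribute to the sum of squares,
# so the loop costs O(nnz) instead of O(m * n).
function sumsq_sparse(A::SparseMatrixCSC{Float64, Int64})
    s = 0.0
    for x in nonzeros(A)
        s += x * x
    end
    return s
end
```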
> >
> > BTW, looks like something like that is needed in Base to
> > specialize sumabs2() for sparse matrices, as the current
> > generic version is terribly slow in that case.
> >
> > It could easily be extended to compute i .* j if the nonzero
> > entries in both matrices match. If not, more work is needed,
> > but you could adapt the .* function from [1] to compute the
> > sum on the fly instead of allocating space for a new matrix.
> >
> >
> > Regards
> >
> > 1: https://github.com/JuliaLang/julia/blob/master/base/sparse/sparsematrix.jl#L530
>
> > > # Explicit for loop, slightly modified from SimilarityMetrics.jl
> > > # by johnmyleswhite:
> > > # https://github.com/johnmyleswhite/SimilarityMetrics.jl/blob/master/src/cosine.jl
>
>
> > > function cosine(a::SparseMatrixCSC{Float64, Int64},
> > >                 b::SparseMatrixCSC{Float64, Int64})
> > >     sA, sB, sI = 0.0, 0.0, 0.0
> > >     for i in 1:length(a)
> > >         sA += a[i]^2
> > >         sI += a[i] * b[i]
> > >     end
> > >     for i in 1:length(b)
> > >         sB += b[i]^2
> > >     end
> > >     return sI / sqrt(sA * sB)
> > > end
> > >
> > > # BLAS version
> > > function cosine_blas(i::SparseMatrixCSC{Float64, Int64},
> > >                      j::SparseMatrixCSC{Float64, Int64})
> > >     i = full(i)
> > >     j = full(j)
> > >     numerator = BLAS.dot(i, j)
> > >     denominator = BLAS.nrm2(i) * BLAS.nrm2(j)
> > >     return numerator / denominator
> > > end
> > >
> > > # the vectorized version remains the same, as the 1st post
> > > shows.
> > >
> > > # Test functions
> > > function test_explicit_loop(d)
> > >     for n in 1:10000
> > >         v = d[:,1]
> > >         cosine(v,v)
> > >     end
> > > end
> > >
> > > function test_blas(d)
> > >     for n in 1:10000
> > >         v = d[:,1]
> > >         cosine_blas(v,v)
> > >     end
> > > end
> > >
> > > function test_vectorized(d)
> > >     for n in 1:10000
> > >         v = d[:,1]
> > >         cosine_vectorized(v,v)
> > >     end
> > > end
> > >
> > >
> > > test_explicit_loop(mat)
> > > test_blas(mat)
> > > test_vectorized(mat)
> > > gc()
> > > @time test_explicit_loop(mat)
> > > gc()
> > > @time test_blas(mat)
> > > gc()
> > > @time test_vectorized(mat)
> > >
> > > # Results
> > > elapsed time: 3.772606858 seconds (6240080 bytes allocated)
> > > elapsed time: 0.400972089 seconds (327520080 bytes allocated, 81.58% gc time)
> > > elapsed time: 0.011236068 seconds (34560080 bytes allocated)
> > >
> > >
> > >
> > >
> > > On Monday, November 10, 2014 7:23:17 PM UTC+8, Milan
> > > Bouchet-Valat wrote:
> > > Le dimanche 09 novembre 2014 à 21:17 -0800, Todd Leo
> > > a écrit :
> > > > Hi fellows,
> > > >
> > > >
> > > > I'm currently working on sparse matrices and cosine
> > > > similarity computation, but my routine is running
> > > > very slowly, at least slower than I expected. So I
> > > > wrote some test functions to dig out the reason for
> > > > the inefficiency. To my surprise, the execution
> > > > times of passing two vectors to the test function
> > > > and of passing the whole sparse matrix differ
> > > > greatly; the latter is 80x faster. I am wondering
> > > > why extracting the two vectors from the matrix in each
> > > > loop is so dramatically faster, and how to
> > > > avoid the multi-GB memory allocation. Thanks, guys.
> > > >
> > > >
> > > > --
> > > > BEST REGARDS,
> > > > Todd Leo
> > > >
> > > >
> > > > # The sparse matrix
> > > > mat # 2000x15037 SparseMatrixCSC{Float64, Int64}
> > > >
> > > >
> > > > # The two vectors, prepared in advance
> > > > v = mat'[:,1]
> > > > w = mat'[:,2]
> > > >
> > > >
> > > > # Cosine similarity function
> > > > function cosine_vectorized(i::SparseMatrixCSC{Float64, Int64},
> > > >                            j::SparseMatrixCSC{Float64, Int64})
> > > >     return sum(i .* j) / sqrt(sum(i .* i) * sum(j .* j))
> > > > end
> > > I think you'll experience a dramatic speed gain if
> > > you write the sums in explicit loops, accessing
> > > elements one by one, taking their product and adding
> > > it immediately to a counter. In your current
> > > version, the element-wise products allocate new
> > > vectors before computing the sums, which is very
> > > costly.
> > >
> > > This will also get rid of the difference you report
> > > between passing arrays and vectors.
> > >
> > >
> > > Regards
> > >
> > > > function test1(d)
> > > >     res = 0.
> > > >     for i in 1:10000
> > > >         res = cosine_vectorized(d[:,1], d[:,2])
> > > >     end
> > > > end
> > > >
> > > >
> > > > function test2(_v,_w)
> > > >     res = 0.
> > > >     for i in 1:10000
> > > >         res = cosine_vectorized(_v, _w)
> > > >     end
> > > > end
> > > >
> > > >
> > > > test1(dtm)
> > > > test2(v,w)
> > > > gc()
> > > > @time test1(dtm)
> > > > gc()
> > > > @time test2(v,w)
> > > >
> > > >
> > > > # elapsed time: 0.054925372 seconds (59360080 bytes allocated, 59.07% gc time)
> > > >
> > > > # elapsed time: 4.204132608 seconds (3684160080 bytes allocated, 65.51% gc time)
> > > >
> > >
> > >
> >
> >
>
>