K-NN by efficient sparse matrix product

2014-05-28 Thread Christian Jauvin
Hi, I'm new to Spark and Hadoop, and I'd like to know if the following problem is solvable in terms of Spark's primitives. To compute the K-nearest neighbours of an N-dimensional dataset, I can multiply my very large normalized sparse matrix by its transpose. As this yields all pairwise distance
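
For illustration, here is a minimal single-machine sketch of the idea described in the question: rows are L2-normalized, so each entry of the product of the matrix with its transpose is a cosine similarity, and the K nearest neighbours of a row are the K largest entries in its row of the product. This is plain Python, not Spark code, and it assumes that "normalized" means unit-length rows. The names `normalize_rows` and `knn_by_sparse_product` and the sample data are hypothetical, introduced only for this example.

```python
from collections import defaultdict
from heapq import nlargest
from math import sqrt


def normalize_rows(rows):
    """L2-normalize each sparse row, given as a {column: value} dict."""
    out = []
    for row in rows:
        norm = sqrt(sum(v * v for v in row.values()))
        out.append({j: v / norm for j, v in row.items()} if norm else {})
    return out


def knn_by_sparse_product(rows, k):
    """For each row, return the indices of its k most similar other rows."""
    rows = normalize_rows(rows)

    # Column-wise view of the matrix, i.e. the rows of its transpose.
    columns = defaultdict(list)
    for i, row in enumerate(rows):
        for j, v in row.items():
            columns[j].append((i, v))

    neighbours = []
    for i, row in enumerate(rows):
        # Row i of A * A^T: only rows that share a nonzero column
        # with row i can contribute a nonzero similarity.
        sims = defaultdict(float)
        for j, v in row.items():
            for other, w in columns[j]:
                if other != i:
                    sims[other] += v * w
        top = nlargest(k, sims.items(), key=lambda item: item[1])
        neighbours.append([idx for idx, _ in top])
    return neighbours


# Hypothetical toy data: four sparse rows over three columns.
data = [
    {0: 1.0, 1: 1.0},
    {0: 1.0, 1: 0.9},
    {2: 1.0},
    {1: 0.1, 2: 1.0},
]
result = knn_by_sparse_product(data, k=1)
print(result)
```

The column index plays the role of a join key: two rows only need to be compared if they share a nonzero column, which is what would make a distributed version a join-and-aggregate over column indices rather than a dense all-pairs computation.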

Re: K-NN by efficient sparse matrix product

2014-05-28 Thread Christian Jauvin
max(nnz(L)*log p, nnz(L)*n/p). I have to warn though: when I played with matrix multiplication, I was getting nowhere near serial performance.

On Wed, May 28, 2014 at 11:00 AM, Christian Jauvin cjau...@gmail.com wrote:
Hi, I'm new to Spark and Hadoop, and I'd like to know if the following