Are there docs on RowSimilarity? Also, has anyone tried it at scale? I'm
seeing some long running times for a matrix that I don't think is huge (I'm still
waiting to hear from a colleague about the actual size). What does the distributed
vector similarity get us over just using our existing distance measures?

Also, would there be interest in a job that is basically the map side of
K-Means and simply outputs the distance between a vector and each vector in a
list of seed vectors, where the seeds fit in memory? It's similar to RowSimilarity,
but it doesn't bother with the co-occurrence calculation.
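To make the proposal concrete, here is a minimal sketch of the per-vector work such a job might do. It assumes plain `double[]` arrays instead of Mahout's vector classes, Euclidean distance as the measure, and hypothetical names (`SeedDistances`, `euclidean`, `distancesToSeeds`). It is not an existing Mahout API, just an illustration of "load the seeds once, then compute one distance per seed for each input row".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a "map side of K-Means only" step: the seed vectors
 * are small enough to hold in memory, and each input vector gets its distance
 * to every seed. Plain double[] arrays stand in for Mahout vectors (assumption).
 */
public class SeedDistances {

    /** Euclidean distance; another distance measure could be swapped in here. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Distances from one input vector to each in-memory seed, in seed order. */
    static double[] distancesToSeeds(double[] input, List<double[]> seeds) {
        double[] out = new double[seeds.size()];
        for (int j = 0; j < seeds.size(); j++) {
            out[j] = euclidean(input, seeds.get(j));
        }
        return out;
    }

    public static void main(String[] args) {
        // In a real job the seeds would be loaded once per mapper; here they are hard-coded.
        List<double[]> seeds = new ArrayList<>();
        seeds.add(new double[] {0.0, 0.0});
        seeds.add(new double[] {3.0, 4.0});

        double[] row = {3.0, 0.0};
        double[] d = distancesToSeeds(row, seeds);
        // A mapper would emit (rowId, seedIndex, distance) records instead of printing.
        for (int j = 0; j < d.length; j++) {
            System.out.println("seed " + j + " -> " + d[j]);
        }
    }
}
```

Because every seed is in memory, each input row is handled independently with no shuffle or reduce step. The cost is roughly rows × seeds × dimensions, with no pairwise co-occurrence pass over the whole matrix.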


-Grant