Here we go: https://issues.apache.org/jira/browse/SPARK-4823
On Wed, Dec 10, 2014 at 9:01 PM, Debasish Das <debasish.da...@gmail.com> wrote:

> I added code to compute topK products for each user and topK user for each
> product in SPARK-3066..
>
> That is different than row similarity calculation as we need both user and
> product factors to calculate the topK recommendations..
>
> For (1) and (2) we are trying to answer similarUsers to given a user and
> similarProducts to a given product....
>
> similarProducts to a given product is straightforward to compute through
> columnSimilarities/dimsum when products are skinny...
>
> similarUser to a given user will need a map-reduce implementation of row
> similarity since the matrix is tall...
>
> I don't see a JIRA for that yet...Are there any good reference for map
> reduce implementation of row similarity ?
>
> On Wed, Dec 10, 2014 at 2:30 PM, Reza Zadeh <r...@databricks.com> wrote:
>
>> It's not so cheap to compute row similarities when there are many rows,
>> as it amounts to computing the outer product of a matrix A (i.e. computing
>> AA^T, which is expensive).
>>
>> There is a JIRA to track handling (1) and (2) more efficiently than
>> computing all pairs: https://issues.apache.org/jira/browse/SPARK-3066
>>
>> On Wed, Dec 10, 2014 at 2:44 PM, Debasish Das <debasish.da...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> It seems there are multiple places where we would like to compute row
>>> similarity (accurate or approximate similarities)
>>>
>>> Basically through RowMatrix columnSimilarities we can compute column
>>> similarities of a tall skinny matrix
>>>
>>> Similarly we should have an API in RowMatrix called rowSimilarities where
>>> we can compute similar rows in a map-reduce fashion. It will be useful for
>>> following use-cases:
>>>
>>> 1. Generate topK users for each user from matrix factorization model
>>> 2. Generate topK products for each product from matrix factorization
>>>    model
>>> 3. Generate kernel matrix for use in spectral clustering
>>> 4. Generate kernel matrix for use in kernel regression/classification
>>>
>>> I am not sure if there are already good implementation for map-reduce row
>>> similarity that we can use (ideas like fastfood and kitchen sink felt more
>>> like for classification use-case but for recommendation also user
>>> similarities show up which is unsupervised)...
>>>
>>> Is there a JIRA tracking it ? If not I can open one and we can discuss
>>> further on it.
>>>
>>> Thanks.
>>> Deb
>>>
>>
>
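
To make the cost concern in the quoted thread concrete, here is a minimal single-machine sketch of the brute-force approach: cosine similarity between every pair of rows (the normalized AA^T), keeping only the top k neighbours per row. It is plain Python rather than Spark code, and it is not the proposed rowSimilarities API; the function name top_k_similar_rows and the sample user_factors values are hypothetical, chosen only for illustration.

```python
import heapq
import math


def top_k_similar_rows(rows, k):
    """Brute-force cosine similarity between all pairs of rows.

    This corresponds to normalizing each row of A, forming A A^T, and
    keeping the k largest off-diagonal entries per row. The cost is
    O(n^2 * d) for n rows of dimension d, which is why it does not
    scale to tall matrices.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    result = []
    for i, a in enumerate(rows):
        scored = []
        for j, b in enumerate(rows):
            # Skip the row itself and any all-zero rows (undefined cosine).
            if i == j or norms[i] == 0.0 or norms[j] == 0.0:
                continue
            dot = sum(x * y for x, y in zip(a, b))
            scored.append((dot / (norms[i] * norms[j]), j))
        # Each entry is (similarity, row index), largest similarity first.
        result.append(heapq.nlargest(k, scored))
    return result


# Hypothetical user-factor rows, e.g. from a matrix factorization model.
user_factors = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
neighbours = top_k_similar_rows(user_factors, k=1)
```

The nested loop over all row pairs is the AA^T computation described above. A distributed version would have to avoid materializing all n^2 pairs, for example by sampling or blocking, which is what a rowSimilarities JIRA would need to address.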