[ 
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352416#comment-14352416
 ] 

Joseph K. Bradley commented on SPARK-3066:
------------------------------------------

It's similar, I believe, for ALS.  The cosine similarity metric you get with
the dot product for ALS is a distance metric, right?  So finding the top K
products to recommend to a given user is essentially the same as finding the K
product feature vectors which are closest to the user's feature vector.  This
optimization could be used both for recommending for a single user and for
recommendAll.
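
To make the single-user case concrete, here is a minimal brute-force sketch in
plain Python.  It is not Spark's API: the function name top_k_products and the
dict layout of the product factors are made up for illustration.  Note that
ranking by raw dot product only matches ranking by cosine similarity when the
product vectors have equal norm; that is an assumption of the sketch.

```python
import heapq


def top_k_products(user_vec, product_vecs, k):
    """Return the k (product_id, score) pairs with the largest dot product.

    product_vecs is a hypothetical {product_id: feature_vector} mapping.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Score every product against the user, then keep only the k best.
    scored = ((pid, dot(user_vec, vec)) for pid, vec in product_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


product_vecs = {"p1": [1.0, 0.0], "p2": [0.5, 0.5], "p3": [0.0, 1.0]}
print(top_k_products([2.0, 1.0], product_vecs, 2))
```

This scans every product for every user, which is the cost a nearest neighbor
structure would try to avoid.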

I'm not sure how effective these approximate nearest neighbor methods are.
My understanding is that they work reasonably well as long as the feature
space is fairly low-dimensional, which should often be the case for ALS.

My hope is that these approximate nearest neighbor data structures can reduce 
communication.  The ones I've seen are based on feature space partitioning, 
which could potentially allow you to figure out a subset of product partitions 
to check for each user.
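
As a rough illustration of checking only a subset of product partitions, the
sketch below uses a much simpler technique than the approximate structures I
have in mind: an exact norm bound.  By Cauchy-Schwarz, no product in a
partition can score higher than |user| * (largest product norm in that
partition), so partitions whose bound cannot beat the current k-th best score
are skipped.  The function name and the list-of-dicts partition layout are
hypothetical, and this says nothing about how well it would work on real ALS
factors.

```python
import heapq
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def top_k_with_pruning(user_vec, partitions, k):
    """partitions: list of {product_id: vector} dicts (hypothetical layout).

    Returns (top-k [(product_id, score)] sorted by score, partitions scanned).
    """
    u_norm = norm(user_vec)
    # Upper bound on any score inside each non-empty partition.
    bounds = sorted(
        ((u_norm * max(norm(v) for v in part.values()), i)
         for i, part in enumerate(partitions) if part),
        reverse=True,
    )
    heap = []  # min-heap of (score, product_id), at most k entries
    scanned = 0
    for bound, i in bounds:
        if len(heap) == k and bound <= heap[0][0]:
            break  # no remaining partition can beat the current k-th score
        scanned += 1
        for pid, vec in partitions[i].items():
            item = (dot(user_vec, vec), pid)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
    top = sorted(heap, reverse=True)
    return [(pid, score) for score, pid in top], scanned


partitions = [
    {"a": [3.0, 0.0], "b": [0.0, 3.0]},
    {"c": [0.1, 0.1]},
    {"d": [0.2, 0.0]},
]
result, scanned = top_k_with_pruning([1.0, 0.0], partitions, 1)
print(result, scanned)
```

In a distributed setting the per-partition bounds are small enough to ship to
every user partition, which is where the communication savings would have to
come from.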

Using level 3 BLAS might be better; I'm really not sure.  It won't reduce
communication, though.  These two types of optimizations might be orthogonal
anyway.
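
For comparison, the blocked approach amounts to scoring a whole block of users
against all products in one matrix-matrix product and then taking the top k
per row.  The sketch below uses nested Python loops as a stand-in for the
single GEMM call a BLAS-backed implementation would make; the function names
are illustrative only.

```python
import heapq


def matmul_transposed(users, products):
    """scores[i][j] = dot(users[i], products[j]); stands in for one GEMM call."""
    return [[sum(x * y for x, y in zip(u, p)) for p in products] for u in users]


def recommend_block(users, products, k):
    """Return, for each user row, the indices of its k highest-scoring products."""
    scores = matmul_transposed(users, products)
    return [
        heapq.nlargest(k, range(len(row)), key=row.__getitem__)
        for row in scores
    ]


users = [[1.0, 0.0], [0.0, 1.0]]
products = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(recommend_block(users, products, 2))
```

Every user block still needs every product vector here, which is why this
speeds up the arithmetic without cutting communication.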

> Support recommendAll in matrix factorization model
> --------------------------------------------------
>
>                 Key: SPARK-3066
>                 URL: https://issues.apache.org/jira/browse/SPARK-3066
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Debasish Das
>
> ALS returns a matrix factorization model, which we can use to predict ratings 
> for individual queries as well as small batches. In practice, users may want 
> to compute top-k recommendations offline for all users. It is very expensive 
> but a common problem. We can do some optimization like
> 1) collect one side (either user or product) and broadcast it as a matrix
> 2) use level-3 BLAS to compute inner products
> 3) use Utils.takeOrdered to find top-k



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

