[ https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207298#comment-14207298 ]
Debasish Das commented on SPARK-3066:
-------------------------------------

[~mengxr] I am testing the recommendAllUsers and recommendAllProducts APIs, and I will add the code to the RankingMetrics PR: https://github.com/apache/spark/pull/3098. I have not used level-3 BLAS yet, since we should be able to reuse the DistributedMatrix API that's coming online (here all the matrices are dense). I used ideas 1 and 2, and I also added a skipRatings parameter to the API: with it you can skip the ratings each user has already provided (for validation, I basically skip the train set).

Example API:

{code}
def recommendAllUsers(num: Int, skipUserRatings: RDD[Rating]) = {
  val skipUsers = skipUserRatings.map { x => ((x.user, x.product), x.rating) }
  val productVectors = productFeatures.collect
  recommend(productVectors, userFeatures, num, skipUsers)
}

def recommendAllProducts(num: Int, skipProductRatings: RDD[Rating]) = {
  val skipProducts = skipProductRatings.map { x => ((x.product, x.user), x.rating) }
  val userVectors = userFeatures.collect
  recommend(userVectors, productFeatures, num, skipProducts)
}
{code}

> Support recommendAll in matrix factorization model
> --------------------------------------------------
>
>                 Key: SPARK-3066
>                 URL: https://issues.apache.org/jira/browse/SPARK-3066
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>
> ALS returns a matrix factorization model, which we can use to predict ratings for individual queries as well as small batches. In practice, users may want to compute top-k recommendations offline for all users. It is very expensive but a common problem.
> We can do some optimizations like:
> 1) collect one side (either user or product) and broadcast it as a matrix
> 2) use level-3 BLAS to compute inner products
> 3) use Utils.takeOrdered to find top-k
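The `recommend` helper called in the comment is not shown in this thread, so here is a minimal plain-Scala sketch of the per-user logic the three ideas above describe, without the RDD plumbing. The name `topKForUser` is hypothetical, the inner product loop stands in for a level-3 BLAS gemm over the stacked factor matrices, and the sort-then-take stands in for Utils.takeOrdered; the `skipProducts` set plays the role of the skipRatings parameter from the comment.

```scala
object RecommendAllSketch {
  // Idea 1: productFactors is the collected-and-broadcast side, keyed by product id.
  // Idea 2: each score is an inner product of factor vectors (a stand-in for a
  //         level-3 BLAS gemm on the dense factor matrices).
  // Idea 3: top-k selection (a stand-in for Utils.takeOrdered), skipping products
  //         the user has already rated.
  def topKForUser(
      userFactor: Array[Double],
      productFactors: Map[Int, Array[Double]],
      k: Int,
      skipProducts: Set[Int]): Seq[(Int, Double)] = {
    productFactors.iterator
      .filterNot { case (pid, _) => skipProducts.contains(pid) }
      .map { case (pid, pf) =>
        // inner product of the user and product factor vectors
        val score = userFactor.zip(pf).map { case (a, b) => a * b }.sum
        (pid, score)
      }
      .toSeq
      .sortBy(-_._2) // replace with a bounded priority queue for large catalogs
      .take(k)
  }
}
```

In the real distributed version, the collected side would be broadcast once and this function applied to each partition of the other side's factors, so the expensive all-pairs product never shuffles.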