[ https://issues.apache.org/jira/browse/SPARK-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481711#comment-14481711 ]
Xiangrui Meng commented on SPARK-6407:
--------------------------------------

Attached the comment from Chunnan Yao in SPARK-6711:

Online collaborative filtering (CF) has been widely used and studied. Re-training a CF model from scratch every time new data comes in is very inefficient (http://stackoverflow.com/questions/27734329/apache-spark-incremental-training-of-als-model). However, in the Spark community we see little discussion of collaborative filtering on streaming data. Given streaming k-means, streaming logistic regression, and the ongoing work on incremental model training for the Naive Bayes classifier (SPARK-4144), we think it is worthwhile to consider streaming collaborative filtering support in MLlib.

We have been considering this issue over the past week. We plan to refer to this paper (https://www.cs.utexas.edu/~cjohnson/ParallelCollabFilt.pdf). It is based on SGD instead of ALS, which is easier to apply to streaming data.

Fortunately, the authors of this paper have implemented their algorithm as a GitHub project, based on Storm:
https://github.com/MrChrisJohnson/CollabStream

> Streaming ALS for Collaborative Filtering
> -----------------------------------------
>
>                 Key: SPARK-6407
>                 URL: https://issues.apache.org/jira/browse/SPARK-6407
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Felix Cheung
>            Priority: Minor
>
> Like MLlib's ALS implementation for recommendation, but applied to streaming.
> Similar to streaming linear regression and logistic regression, could we apply
> gradient updates to batches of data and reuse the existing MLlib implementation?
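To make the "gradient updates to batches of data" idea concrete, here is a minimal single-machine sketch of SGD-based matrix factorization that is updated one batch at a time instead of being re-trained from scratch. It is only an illustration under stated assumptions: the class name `StreamingMF`, its methods, and the `rank`, `step`, and `reg` defaults are hypothetical, it is plain Python rather than any MLlib API, and it is not the algorithm from the cited paper or from CollabStream.

```python
import math
import random


class StreamingMF:
    """Hypothetical online matrix factorization; not an MLlib class."""

    def __init__(self, rank=4, step=0.02, reg=0.01, seed=0):
        self.rank = rank
        self.step = step      # SGD learning rate (assumed value)
        self.reg = reg        # L2 regularization weight (assumed value)
        self.rng = random.Random(seed)
        self.user = {}        # user id -> latent factor vector
        self.item = {}        # item id -> latent factor vector

    def _factors(self, table, key):
        # Unseen users/items get small random factors the first time they appear.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.rank)]
        return table[key]

    def predict(self, user, item):
        # Dot product of the two factor vectors (creates them if missing).
        p = self._factors(self.user, user)
        q = self._factors(self.item, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, batch):
        """Apply one SGD step per (user, item, rating) triple in the batch."""
        for user, item, rating in batch:
            p = self._factors(self.user, user)
            q = self._factors(self.item, item)
            err = rating - sum(a * b for a, b in zip(p, q))
            for k in range(self.rank):
                pk, qk = p[k], q[k]  # keep old values so both updates use them
                p[k] += self.step * (err * qk - self.reg * pk)
                q[k] += self.step * (err * pk - self.reg * qk)

    def rmse(self, batch):
        se = sum((r - self.predict(u, i)) ** 2 for u, i, r in batch)
        return math.sqrt(se / len(batch))
```

In a streaming setting, `update` would be called once per incoming batch of ratings, so existing factors are refined and new users or items are added as they appear. A distributed version would also have to decide how the factor tables are shared and synchronized across workers, which this sketch ignores.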