[ https://issues.apache.org/jira/browse/SPARK-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng closed SPARK-6711.
--------------------------------
    Resolution: Duplicate

> Support parallelized online matrix factorization for Collaborative Filtering
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-6711
>                 URL: https://issues.apache.org/jira/browse/SPARK-6711
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, Streaming
>            Reporter: Chunnan Yao
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> Online Collaborative Filtering (CF) has been widely used and studied.
> Re-training a CF model from scratch every time new data comes in is very
> inefficient
> (http://stackoverflow.com/questions/27734329/apache-spark-incremental-training-of-als-model).
> However, in the Spark community we see little discussion about collaborative
> filtering on streaming data. Given streaming k-means, streaming logistic
> regression, and the ongoing work on incremental model training for the Naive
> Bayes classifier (SPARK-4144), we think it is meaningful to consider streaming
> Collaborative Filtering support in MLlib.
> We have been considering this issue during the past week. We plan to refer to
> this paper (https://www.cs.utexas.edu/~cjohnson/ParallelCollabFilt.pdf). It is
> based on SGD instead of ALS, which makes it easier to apply to streaming data.
> Fortunately, the authors of this paper have implemented their algorithm as a
> GitHub project, based on Storm:
> https://github.com/MrChrisJohnson/CollabStream
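
For readers unfamiliar with why SGD suits streaming data better than ALS, here is a minimal single-machine sketch of per-rating SGD updates for matrix factorization. It is not the algorithm from the referenced paper, not the CollabStream code, and not an MLlib API. The class name `OnlineMF`, its parameters (`rank`, `lr`, `reg`), and the toy ratings in the usage note are all assumptions made for illustration. The point it shows is that each incoming rating touches only one user vector and one item vector, so the model can be updated as data arrives instead of being re-trained from scratch.

```python
import random


class OnlineMF:
    """Hypothetical online matrix factorization trained by per-rating SGD."""

    def __init__(self, rank=4, lr=0.05, reg=0.01, seed=0):
        self.rank = rank          # number of latent factors (assumed value)
        self.lr = lr              # SGD step size (assumed value)
        self.reg = reg            # L2 regularization weight (assumed value)
        self.rng = random.Random(seed)
        self.user_factors = {}    # user id -> list of floats
        self.item_factors = {}    # item id -> list of floats

    def _factors(self, table, key):
        # Create a small random vector the first time a user or item is seen.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.rank)]
        return table[key]

    def predict(self, user, item):
        # Unknown users or items get a neutral prediction of 0.0.
        p = self.user_factors.get(user)
        q = self.item_factors.get(item)
        if p is None or q is None:
            return 0.0
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Apply one SGD step for one observed rating; return the prior error."""
        p = self._factors(self.user_factors, user)
        q = self._factors(self.item_factors, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for k in range(self.rank):
            pk, qk = p[k], q[k]
            # Gradient step on squared error with L2 regularization.
            p[k] = pk + self.lr * (err * qk - self.reg * pk)
            q[k] = qk + self.lr * (err * pk - self.reg * qk)
        return err
```

A usage sketch under the same assumptions: call `model.update(user, item, rating)` for each record as it arrives (for example, inside a per-batch handler of a streaming job), and call `model.predict(user, item)` at any time in between. Distributing these updates across workers, which is the subject of the issue, requires partitioning users and items so that concurrent updates do not conflict, and that part is not shown here.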