[
https://issues.apache.org/jira/browse/SPARK-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895589#comment-15895589
]
Daniel Li commented on SPARK-6407:
----------------------------------
Reviving this thread since I'm interested in implementing streaming CF for
Spark.
bq. Using ALS for online updates is expensive.
Recomputing the factor matrices _U_ and _V_ from scratch for every update would
be terribly expensive, but what about keeping _U_ and _V_ around and running
just another round or two of ALS after each new rating comes in? The
algorithm would then continually follow a moving optimum. I can't
imagine the RMSE changing much due to small updates if we use a convergence
threshold _à la_ [Y. Zhou, et al., “Large-Scale Parallel Collaborative
Filtering for the Netflix Prize”|http://dl.acm.org/citation.cfm?id=1424269]
instead of a fixed number of iterations.
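To make the warm-start idea concrete, here is a minimal single-machine sketch in plain Python, not Spark code. All names in it ({{als_refine}}, {{half_step}}, {{solve}}, {{rmse}}) are hypothetical, the stopping rule uses training RMSE as a simplification, and the regularizer is scaled by each row's rating count in the spirit of the weighted-λ scheme from the paper. It keeps _U_ and _V_ between calls and only runs as many rounds as the threshold requires.

```python
import random

def solve(A, b):
    """Solve the small dense system A x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def half_step(fixed, ratings_by_row, k, lam):
    """Re-solve every row's factor vector while the other side stays fixed.
    ratings_by_row maps row_id -> [(col_id, rating), ...]."""
    out = {}
    for i, obs in ratings_by_row.items():
        # Normal equations: (sum v v^T + lam * n_i * I) x = sum r * v
        A = [[lam * len(obs) if a == b else 0.0 for b in range(k)] for a in range(k)]
        rhs = [0.0] * k
        for j, r in obs:
            v = fixed[j]
            for a in range(k):
                rhs[a] += r * v[a]
                for b in range(k):
                    A[a][b] += v[a] * v[b]
        out[i] = solve(A, rhs)
    return out

def rmse(ratings, U, V):
    se = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

def als_refine(ratings, U, V, k=2, lam=0.05, tol=1e-4, max_rounds=50, seed=0):
    """Continue ALS from the current U, V (dicts of factor vectors) until the
    training RMSE improves by less than tol. Returns the number of rounds run."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
        # Unseen users/items get a random start; known ones keep their factors.
        U.setdefault(u, [rng.random() for _ in range(k)])
        V.setdefault(i, [rng.random() for _ in range(k)])
    prev = rmse(ratings, U, V)
    for rounds in range(1, max_rounds + 1):
        U.update(half_step(V, by_user, k, lam))
        V.update(half_step(U, by_item, k, lam))
        cur = rmse(ratings, U, V)
        if prev - cur < tol:
            return rounds
        prev = cur
    return max_rounds

# Toy data: (user, item, rating). Fit once, then refine after one new rating.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 5.0), (2, 2, 2.0), (3, 0, 5.0), (3, 2, 1.0)]
U, V = {}, {}
cold_rounds = als_refine(ratings, U, V)
cold_rmse = rmse(ratings, U, V)
ratings.append((4, 1, 3.0))  # a new rating from a previously unseen user
warm_rounds = als_refine(ratings, U, V)
warm_rmse = rmse(ratings, U, V)
print(cold_rounds, cold_rmse, warm_rounds, warm_rmse)
```

Whether the warm refinement really needs fewer rounds than the cold fit depends on the data and on {{tol}}; the sketch only shows where the saved state and the threshold would plug in.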
(In fact, since calculating _(U^T) * V_ would probably take a nontrivial slice
of time, new updates that come in during a round of calculation could be
"batched" into the next round of calculation, increasing efficiency.)
Thoughts?
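For what it's worth, the batching idea in the parenthetical above could look roughly like the sketch below. It is plain Python rather than DStreams code, {{RatingBuffer}} and {{run_rounds}} are hypothetical names, and {{fake_round}} is only a stand-in for one real ALS round. Ratings that arrive while a round is running wait in the buffer and are folded in when the next round starts.

```python
import threading

class RatingBuffer:
    """Collects ratings that arrive while an ALS round is in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []

    def add(self, user, item, rating):
        with self._lock:
            self._pending.append((user, item, rating))

    def drain(self):
        """Atomically take everything buffered so far and reset the buffer."""
        with self._lock:
            batch, self._pending = self._pending, []
        return batch

def run_rounds(buffer, ratings, refine, rounds):
    """Before each round, fold the buffered ratings into the training set,
    then call refine(ratings). Returns the batch size seen by each round."""
    sizes = []
    for _ in range(rounds):
        batch = buffer.drain()
        ratings.extend(batch)
        sizes.append(len(batch))
        refine(ratings)
    return sizes

buf = RatingBuffer()
seen = []
buf.add(0, 0, 5.0)

def fake_round(all_ratings):
    # Stand-in for one ALS round; two ratings "arrive" while it runs.
    buf.add(1, 0, 4.0)
    buf.add(1, 2, 1.0)

sizes = run_rounds(buf, seen, fake_round, rounds=3)
print(sizes)  # [1, 2, 2]
```

The lock matters only if ratings are added from another thread while a round runs; in a single-threaded micro-batch loop a plain list would do.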
> Streaming ALS for Collaborative Filtering
> -----------------------------------------
>
> Key: SPARK-6407
> URL: https://issues.apache.org/jira/browse/SPARK-6407
> Project: Spark
> Issue Type: New Feature
> Components: DStreams
> Reporter: Felix Cheung
> Priority: Minor
>
> Like MLlib's ALS implementation for recommendation, but applied to streaming.
> Similar to streaming linear regression and logistic regression, could we apply
> gradient updates to batches of data and reuse the existing MLlib implementation?