[
https://issues.apache.org/jira/browse/MAHOUT-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701380#comment-13701380
]
Peng Cheng commented on MAHOUT-1274:
------------------------------------
BTW, may I ask (noobishly) why you have deprecated the SlopeOneRecommender
in the latest core-0.8 snapshot? I must have missed a lot in previous
mahout-development emails before I joined, so apologies if it's a stupid
question.
> SGD-based Online SVD recommender
> --------------------------------
>
> Key: MAHOUT-1274
> URL: https://issues.apache.org/jira/browse/MAHOUT-1274
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Reporter: Peng Cheng
> Assignee: Sean Owen
> Labels: collaborative-filtering, features, machine_learning, svd
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> An online SVD recommender is otherwise similar to an offline SVD recommender
> except that, upon receiving one or several new ratings, it can add them to
> the training dataModel and update the result accordingly in real time.
> An online SVD recommender should override setPreference(...) and
> removePreference(...) in AbstractRecommender such that the factorization
> result is updated in O(1) time and without retraining.
> Right now the SlopeOneRecommender is the only component possessing such a
> capability.
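
A minimal sketch of what such an online update could look like, written in
Python rather than against Mahout's Java API. Everything here is an
assumption made for illustration: the class name OnlineFactorizer, the
method names, and the hyperparameter defaults are hypothetical and do not
come from the ticket or from Mahout. Each call performs a single SGD step on
the one new rating, so its cost depends on the rank but not on the number of
stored ratings, and an unseen user or item simply gets a new factor row.

```python
import random


class OnlineFactorizer:
    """Hypothetical online matrix-factorization model (not Mahout's API)."""

    def __init__(self, rank=8, learning_rate=0.05, regularization=0.02, seed=0):
        self.rank = rank
        self.learning_rate = learning_rate
        self.regularization = regularization
        self._rng = random.Random(seed)
        self.user_factors = {}  # user id -> list of `rank` floats
        self.item_factors = {}  # item id -> list of `rank` floats

    def _row(self, table, key):
        # Adding a new user or item only appends one small random row.
        if key not in table:
            table[key] = [self._rng.gauss(0.0, 0.1) for _ in range(self.rank)]
        return table[key]

    def estimate(self, user, item):
        p = self._row(self.user_factors, user)
        q = self._row(self.item_factors, item)
        return sum(a * b for a, b in zip(p, q))

    def set_preference(self, user, item, rating):
        # One regularized SGD step on the squared error of this rating only.
        p = self._row(self.user_factors, user)
        q = self._row(self.item_factors, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for f in range(self.rank):
            pf, qf = p[f], q[f]
            p[f] += self.learning_rate * (err * qf - self.regularization * pf)
            q[f] += self.learning_rate * (err * pf - self.regularization * qf)
        return err
```

A single step like this only nudges the two affected factor rows toward the
new rating. It is not a substitute for the multiple passes that SGD needs to
reach a good fit.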
> Since SGD is intrinsically an online algorithm and its CF implementation is
> available in core-0.8 (See MAHOUT-1089, MAHOUT-1272), I presume it would be a
> good time to convert it. Such a feature could come in handy for some websites.
> Implementation: Adding new users or items, or increasing the rank of the
> rating matrix, just means increasing the size of the user and item matrices.
> Reducing the rank of the rating matrix involves just one SVD. The real
> challenge here is that SGD is NOT a one-pass algorithm: multiple passes are
> required to reach an acceptable optimum, and even more so if the
> hyperparameters are bad. But here are two possible workarounds:
> 1. Use a one-pass algorithm like averaged SGD. I'm not sure it can ever work,
> as applying a stochastic convex-optimization algorithm to a non-convex
> problem is anarchy. It may be a long shot.
> 2. Run incomplete passes in each online update, using ratings sampled randomly
> (but not uniformly) from the latest dataModel. I don't know exactly how this
> should be done, but new ratings should be sampled more frequently: uniform
> sampling will result in old ratings being used more than new ratings in
> total. If somebody has worked on this batch-to-online conversion before and
> could share their insight, that would be awesome. This seems to be the most
> viable option, provided I can get the non-uniform pseudorandom generator I
> want, one that maintains a cumulative uniform distribution.
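
One possible reading of that sampler, sketched in Python. This is not taken
from the ticket, which leaves the sampling scheme open. The class name
UsageBalancedSampler and the halving rule for the weights are assumptions
chosen for illustration. Each rating's weight halves every time it is used,
so a freshly added rating is drawn far more often until its usage count
catches up with the older ones, and cumulative usage stays roughly even.

```python
import random


class UsageBalancedSampler:
    """Hypothetical sampler that favours the least-used ratings."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.use_counts = {}  # rating key -> times sampled so far

    def add(self, key):
        # A new rating starts with zero uses, hence the highest weight.
        self.use_counts.setdefault(key, 0)

    def sample(self, k):
        keys = list(self.use_counts)
        least = min(self.use_counts.values())
        # Weights are relative to the least-used rating, so at least one
        # weight is always 1.0 and the total never underflows to zero.
        weights = [0.5 ** (self.use_counts[key] - least) for key in keys]
        chosen = self._rng.choices(keys, weights=weights, k=k)
        for key in chosen:
            self.use_counts[key] += 1
        return chosen
```

Rebuilding the weight list makes every call linear in the number of stored
ratings, so this only shows the idea. A real implementation would need a
cheaper data structure to keep online updates fast.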
> I found a very old ticket (MAHOUT-572) mentioning an online SVD recommender,
> but it didn't pay off. Hopefully it's not a bad idea to submit a new ticket
> here.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira