[
https://issues.apache.org/jira/browse/MAHOUT-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suneel Marthi resolved MAHOUT-1274.
-----------------------------------
Resolution: Won't Fix
Fix Version/s: 1.0
Assignee: Suneel Marthi
No activity for more than 6 months; resolving this as 'Won't Fix'.
> SGD-based Online SVD recommender
> --------------------------------
>
> Key: MAHOUT-1274
> URL: https://issues.apache.org/jira/browse/MAHOUT-1274
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Reporter: Peng Cheng
> Assignee: Suneel Marthi
> Labels: collaborative-filtering, features, machine_learning, svd
> Fix For: 1.0
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> An online SVD recommender is otherwise similar to an offline SVD
> recommender, except that, upon receiving one or several new ratings, it can
> add them to the training dataModel and update the result accordingly in real
> time.
> An online SVD recommender should override setPreference(...) and
> removePreference(...) in AbstractRecommender such that the factorization
> result is updated in O(1) time and without retraining.
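One way to picture the constant-time update described above is a single regularized SGD step on the two factor vectors touched by the new rating. The sketch below is only an illustration under stated assumptions: the class name OnlineSgdSketch, its fields, and the hyperparameters are hypothetical and are not Mahout's API, and removePreference(...) is not covered. Unseen users and items get a small random factor vector on first use, which is how the factor tables grow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of an online SGD factorizer; not part of Mahout. */
final class OnlineSgdSketch {
  private final int rank;            // number of latent factors (assumed fixed)
  private final double learningRate; // SGD step size (assumed value chosen by caller)
  private final double lambda;       // L2 regularization weight
  private final Random random;
  private final Map<Long, double[]> userFactors = new HashMap<>();
  private final Map<Long, double[]> itemFactors = new HashMap<>();

  OnlineSgdSketch(int rank, double learningRate, double lambda, long seed) {
    this.rank = rank;
    this.learningRate = learningRate;
    this.lambda = lambda;
    this.random = new Random(seed);
  }

  /** Returns the factor vector for an ID, creating a small random one if unseen. */
  private double[] factorsFor(Map<Long, double[]> table, long id) {
    return table.computeIfAbsent(id, key -> {
      double[] v = new double[rank];
      for (int f = 0; f < rank; f++) {
        v[f] = 0.1 * random.nextGaussian();
      }
      return v;
    });
  }

  /** Predicted rating: dot product of the user and item factor vectors. */
  double estimate(long userID, long itemID) {
    double[] p = factorsFor(userFactors, userID);
    double[] q = factorsFor(itemFactors, itemID);
    double dot = 0.0;
    for (int f = 0; f < rank; f++) {
      dot += p[f] * q[f];
    }
    return dot;
  }

  /** One SGD step on the new rating; touches only two factor vectors. */
  void setPreference(long userID, long itemID, double value) {
    double err = value - estimate(userID, itemID);
    double[] p = userFactors.get(userID);
    double[] q = itemFactors.get(itemID);
    for (int f = 0; f < rank; f++) {
      double pf = p[f];
      double qf = q[f];
      p[f] += learningRate * (err * qf - lambda * pf);
      q[f] += learningRate * (err * pf - lambda * qf);
    }
  }
}
```

Each call costs O(rank) work regardless of how many ratings are already stored, which is the sense in which the update is constant-time. A single step only nudges the prediction toward the new rating; it does not by itself reach the quality of a multi-pass fit.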
> Right now the slopeOneRecommender is the only component with this
> capability.
> Since SGD is intrinsically an online algorithm and its CF implementation is
> available in core-0.8 (see MAHOUT-1089, MAHOUT-1272), I presume it would be a
> good time to convert it. Such a feature could come in handy for some
> websites.
> Implementation: Adding new users or items, or increasing the rating matrix
> rank, just increases the size of the user and item matrices. Reducing the
> rating matrix rank involves just one SVD. The real challenge here is that SGD
> is NOT a one-pass algorithm: multiple passes are required to reach acceptable
> optimality, and even more if the hyperparameters are bad. But here are two
> possible workarounds:
> 1. Use a one-pass algorithm like averaged SGD. I am not sure it can ever
> work, as applying a stochastic convex-optimization algorithm to a non-convex
> problem is anarchy, so this may be a long shot.
> 2. Run incomplete passes in each online update using ratings sampled randomly
> (but not uniformly) from the latest dataModel. I don't know exactly how this
> should be done, but new ratings should be sampled more frequently: uniform
> sampling would result in old ratings being used more than new ratings in
> total. If somebody has worked on this batch-to-online conversion before and
> can share their insight, that would be awesome. This seems to be the most
> viable option, if I can get the non-uniform pseudorandom generator I want,
> one that maintains a cumulative uniform distribution.
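The issue leaves the sampling scheme for option 2 open. One possible reading, given purely as an assumption, is to weight each rating by how far its usage count lags behind the most-used rating, so that fresh ratings are drawn more often and cumulative usage drifts toward uniform. The class DeficitWeightedSampler and its weight formula (maxUse + 1 - useCount) are hypothetical and are not taken from the ticket or from Mahout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sampler that favors ratings used less often so far. */
final class DeficitWeightedSampler {
  private final List<Integer> useCounts = new ArrayList<>();
  private final Random random;
  private int maxUse = 0; // highest usage count seen so far

  DeficitWeightedSampler(long seed) {
    this.random = new Random(seed);
  }

  /** Registers a newly arrived rating and returns its index. */
  int add() {
    useCounts.add(0);
    return useCounts.size() - 1;
  }

  /** Draws one rating index with weight (maxUse + 1 - useCount), then counts the use. */
  int sample() {
    if (useCounts.isEmpty()) {
      throw new IllegalStateException("no ratings to sample from");
    }
    long total = 0;
    for (int count : useCounts) {
      total += maxUse + 1 - count; // every weight is at least 1
    }
    long target = (long) (random.nextDouble() * total);
    int chosen = useCounts.size() - 1;
    long cumulative = 0;
    for (int i = 0; i < useCounts.size(); i++) {
      cumulative += maxUse + 1 - useCounts.get(i);
      if (target < cumulative) {
        chosen = i;
        break;
      }
    }
    int used = useCounts.get(chosen) + 1;
    useCounts.set(chosen, used);
    maxUse = Math.max(maxUse, used);
    return chosen;
  }

  /** Number of times the rating at this index has been drawn. */
  int useCount(int index) {
    return useCounts.get(index);
  }
}
```

In an online update, each new rating would be registered with add() and a handful of indices drawn with sample() would feed the SGD steps. The linear scan makes every draw O(n); a tree of partial weight sums would be needed for large data models. Whether this scheme matches the "cumulative uniform distribution" the reporter had in mind is not established by the ticket.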
> I found a very old ticket (MAHOUT-572) mentioning an online SVD recommender,
> but it didn't pay off. Hopefully it's not a bad idea to submit a new ticket
> here.
--
This message was sent by Atlassian JIRA
(v6.2#6252)