Hi Peng,

We deprecated a lot of algorithms that we found were rarely used, in order to streamline our codebase for the upcoming 1.0 release.

On 06.07.2013 at 10:25, "Peng Cheng (JIRA)" <[email protected]> wrote:
>
> [
> https://issues.apache.org/jira/browse/MAHOUT-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701380#comment-13701380]
>
> Peng Cheng commented on MAHOUT-1274:
> ------------------------------------
>
> BTW may I ask (noobishly) why you have deprecated the
> SlopeOneRecommender in the latest core-0.8 snapshot? I must have missed a
> lot in previous mahout-development emails before I joined, so apologies if
> it's a stupid question.
>
> > SGD-based Online SVD recommender
> > --------------------------------
> >
> >                 Key: MAHOUT-1274
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1274
> >             Project: Mahout
> >          Issue Type: New Feature
> >          Components: Collaborative Filtering
> >            Reporter: Peng Cheng
> >            Assignee: Sean Owen
> >              Labels: collaborative-filtering, features,
> >                      machine_learning, svd
> >   Original Estimate: 336h
> >  Remaining Estimate: 336h
> >
> > An online SVD recommender is otherwise similar to an offline SVD
> > recommender except that, upon receiving one or several new
> > recommendations, it can add them into the training dataModel and update
> > the result accordingly in real time.
> > An online SVD recommender should override setPreference(...) and
> > removePreference(...) in AbstractRecommender such that the factorization
> > result is updated in O(1) time and without retraining.
> > Right now the SlopeOneRecommender is the only component possessing such
> > a capability.
> > Since SGD is intrinsically an online algorithm and its CF implementation
> > is available in core-0.8 (see MAHOUT-1089, MAHOUT-1272), I presume it
> > would be a good time to convert it. Such a feature could come in handy
> > for some websites.
> > Implementation: Adding new users or items, or increasing the rating
> > matrix rank, just increases the size of the user and item matrices.
> > Reducing the rating matrix rank involves just one SVD.
> > The real challenge here is that SGD is not a one-pass algorithm:
> > multiple passes are required to achieve acceptable optimality, and even
> > more so if the hyperparameters are bad. But here are two possible
> > workarounds:
> > 1. Use one-pass algorithms like averaged SGD. I am not sure if it can
> > ever work, as applying a stochastic convex-optimization algorithm to a
> > non-convex problem is anarchy. But it may be a long shot.
> > 2. Run incomplete passes in each online update using ratings randomly
> > sampled (but not uniformly sampled) from the latest dataModel. I don't
> > know exactly how this should be done, but new ratings should be sampled
> > more frequently. Uniform sampling will result in old ratings being used
> > more than new ratings in total. If somebody has worked on this
> > batch-to-online conversion before and can share their insight, that
> > would be awesome. This seems to be the most viable option, if I get the
> > non-uniform pseudorandom generator that maintains the cumulative uniform
> > distribution I want.
> > I found a very old ticket (MAHOUT-572) mentioning an online SVD
> > recommender, but it didn't pay off. Hopefully it's not a bad idea to
> > submit a new ticket here.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
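
For reference, option 2 in the quoted ticket (a few SGD steps per online update, drawn from the stored ratings with a bias toward newer ones) could be sketched roughly as follows. This is a minimal illustration in plain Python, not Mahout code: the class name `OnlineSgdFactorizer`, its parameters, and the geometric recency weighting are all assumptions made for the example. The ticket leaves the sampling scheme open, and this weighting does not attempt the "cumulative uniform" property it asks for.

```python
import random


class OnlineSgdFactorizer:
    """Toy online matrix factorization; every name here is hypothetical."""

    def __init__(self, rank=4, lr=0.05, reg=0.02, decay=0.9, replay=20, seed=0):
        self.rank, self.lr, self.reg = rank, lr, reg
        self.decay = decay    # assumed recency bias: weight = decay ** age
        self.replay = replay  # SGD steps per incoming rating (a partial pass)
        self.rng = random.Random(seed)
        self.users, self.items = {}, {}  # id -> factor vector
        self.ratings = []                # (user, item, value), oldest first

    def _factors(self, table, key):
        # A new user or item just adds one more row of small random factors.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.rank)]
        return table[key]

    def predict(self, user, item):
        p = self._factors(self.users, user)
        q = self._factors(self.items, item)
        return sum(a * b for a, b in zip(p, q))

    def _sgd_step(self, user, item, value):
        # One regularized gradient step on the squared error of one rating.
        p = self._factors(self.users, user)
        q = self._factors(self.items, item)
        err = value - sum(a * b for a, b in zip(p, q))
        for k in range(self.rank):
            pk, qk = p[k], q[k]
            p[k] += self.lr * (err * qk - self.reg * pk)
            q[k] += self.lr * (err * pk - self.reg * qk)

    def set_preference(self, user, item, value):
        """Store the rating, then run a few recency-biased SGD steps."""
        self.ratings.append((user, item, value))
        n = len(self.ratings)
        # The newest rating has weight 1; each older one is scaled by `decay`.
        # Rebuilding the weights is O(n) per update, so this is not the O(1)
        # update the ticket asks for; a cheaper sampler would be needed.
        weights = [self.decay ** (n - 1 - i) for i in range(n)]
        batch = self.rng.choices(self.ratings, weights=weights, k=self.replay)
        for u, i, v in batch:
            self._sgd_step(u, i, v)


model = OnlineSgdFactorizer(seed=1)
for _ in range(50):
    model.set_preference("u1", "i1", 4.0)
    model.set_preference("u2", "i1", 2.0)
```

The number of steps per update (`replay`) and the decay factor trade update cost against how quickly old ratings stop influencing the factors; both values above are arbitrary.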
