[ 
https://issues.apache.org/jira/browse/MAHOUT-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701380#comment-13701380
 ] 

Peng Cheng commented on MAHOUT-1274:
------------------------------------

BTW, may I ask (noobishly) why you have deprecated the SlopeOneRecommender 
in the latest core-0.8 snapshot? I must have missed a lot in previous 
mahout-development emails before I joined, so apologies if it's a stupid question.
                
> SGD-based Online SVD recommender
> --------------------------------
>
>                 Key: MAHOUT-1274
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1274
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Peng Cheng
>            Assignee: Sean Owen
>              Labels: collaborative-filtering, features, machine_learning, svd
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> An online SVD recommender is similar to an offline SVD recommender, except 
> that, upon receiving one or several new ratings, it can add them to the 
> training DataModel and update the result accordingly in real time.
> An online SVD recommender should override setPreference(...) and 
> removePreference(...) in AbstractRecommender such that the factorization 
> result is updated in O(1) time, without retraining.
> Right now the SlopeOneRecommender is the only component with this 
> capability.
> Since SGD is intrinsically an online algorithm and its CF implementation is 
> available in core-0.8 (see MAHOUT-1089, MAHOUT-1272), I presume this would be 
> a good time to convert it. Such a feature could come in handy for some websites.
> Implementation: Adding new users or items, or increasing the rating matrix 
> rank, just increases the size of the user and item matrices. Reducing the 
> rating matrix rank involves just one SVD. The real challenge here is that SGD 
> is NOT a one-pass algorithm: multiple passes are required to reach an 
> acceptable optimum, and even more so if the hyperparameters are bad. But here 
> are two possible workarounds:
> 1. Use a one-pass algorithm such as averaged SGD. I am not sure it can ever 
> work, since applying a stochastic convex-optimization algorithm to a 
> non-convex problem is anarchy, so it may be a long shot.
> 2. Run incomplete passes in each online update using ratings sampled randomly 
> (but not uniformly) from the latest DataModel. I don't know exactly how this 
> should be done, but new ratings should be sampled more frequently. Uniform 
> sampling will result in old ratings being used more than new ratings in 
> total. If somebody has worked on this batch-to-online conversion before and 
> can share their insight, that would be awesome. This seems to be the most 
> viable option, if I can get the non-uniform pseudorandom generator I want, 
> one that maintains a cumulative uniform distribution.
> I found a very old ticket (MAHOUT-572) mentioning an online SVD recommender, 
> but it didn't pay off. Hopefully it's not a bad idea to submit a new ticket here.
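
To make option 2 in the description above a bit more concrete, here is a rough, self-contained Java sketch. It is not Mahout code and does not use Mahout's API: the class OnlineSgdSketch, its methods, and all parameter values are hypothetical names chosen for illustration. It assumes plain regularized SGD on a dot-product model, grows the factor tables when an unseen user or item arrives, and runs a fixed number of SGD steps per new rating on ratings sampled with a simple recency bias (index = n - 1 - floor(n * u^2)). That bias is only one possible heuristic and does not by itself guarantee the "cumulative uniform" property asked for in the ticket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of an online SGD factorizer; not part of Mahout. */
public class OnlineSgdSketch {

  /** One observed rating, stored in arrival order. */
  private static final class Rating {
    final long userID;
    final long itemID;
    final double value;

    Rating(long userID, long itemID, double value) {
      this.userID = userID;
      this.itemID = itemID;
      this.value = value;
    }
  }

  private final int rank;
  private final double learningRate;
  private final double lambda;
  private final int stepsPerUpdate;
  private final Random random;
  private final Map<Long, double[]> userFactors = new HashMap<>();
  private final Map<Long, double[]> itemFactors = new HashMap<>();
  private final List<Rating> ratings = new ArrayList<>();

  public OnlineSgdSketch(int rank, double learningRate, double lambda,
                         int stepsPerUpdate, long seed) {
    this.rank = rank;
    this.learningRate = learningRate;
    this.lambda = lambda;
    this.stepsPerUpdate = stepsPerUpdate;
    this.random = new Random(seed);
  }

  /** Adds a rating, then runs a fixed number of SGD steps (an incomplete pass). */
  public void addRating(long userID, long itemID, double value) {
    // New users or items simply add a row to the factor tables.
    userFactors.computeIfAbsent(userID, id -> newFactor());
    itemFactors.computeIfAbsent(itemID, id -> newFactor());
    ratings.add(new Rating(userID, itemID, value));
    for (int s = 0; s < stepsPerUpdate; s++) {
      sgdStep(ratings.get(sampleRecentIndex(ratings.size())));
    }
  }

  /** Samples an index in [0, n) that favors larger (newer) indices. */
  public int sampleRecentIndex(int n) {
    double u = random.nextDouble();
    return n - 1 - (int) (n * u * u);
  }

  /** Returns the estimated rating, or NaN if the user or item is unknown. */
  public double estimate(long userID, long itemID) {
    double[] p = userFactors.get(userID);
    double[] q = itemFactors.get(itemID);
    return (p == null || q == null) ? Double.NaN : dot(p, q);
  }

  private double[] newFactor() {
    double[] f = new double[rank];
    for (int k = 0; k < rank; k++) {
      f[k] = 0.1 * random.nextGaussian();
    }
    return f;
  }

  private void sgdStep(Rating r) {
    double[] p = userFactors.get(r.userID);
    double[] q = itemFactors.get(r.itemID);
    double err = r.value - dot(p, q);
    for (int k = 0; k < rank; k++) {
      double pk = p[k]; // keep the old value so both updates use the same point
      p[k] += learningRate * (err * q[k] - lambda * pk);
      q[k] += learningRate * (err * pk - lambda * q[k]);
    }
  }

  private static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int k = 0; k < a.length; k++) {
      sum += a[k] * b[k];
    }
    return sum;
  }
}
```

Because stepsPerUpdate is a constant, the cost of each addRating call does not grow with the number of stored ratings, which is the O(1)-per-update behaviour the ticket asks for. Whether a fixed number of biased steps gives acceptable accuracy on real data is an open question that this sketch does not answer.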

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
