[ 
https://issues.apache.org/jira/browse/MAHOUT-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707087#comment-13707087
 ] 

Sebastian Schelter commented on MAHOUT-1274:
--------------------------------------------

I think the main problem here is that we need to change the API to support
online recommenders. Currently, recommenders mostly use DataModels in a
read-only fashion and batch-update themselves via refresh(). I think the
setPreference() methods should not be on the recommender, but only on the
DataModel.

I think a few API changes are necessary on the recommenders. First, we want
simpler data stores than DataModel: stores that only allow iterating over the
preferences and do not offer efficient random access. This is necessary to
allow us to process datasets with a few hundred million interactions on a
single machine. See FactorizablePreferences in the examples for a first shot.
Second, we need support for online training of recommenders.
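
To make the two proposed changes concrete, here is a minimal Java sketch. It
is not Mahout code: all names in it (Interaction, IterablePreferences,
OnlineTrainable, InMemoryPreferences) are hypothetical. It only assumes that
a store exposes sequential iteration and that a recommender can consume one
interaction at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** One (user, item, rating) interaction. Hypothetical type, not Mahout's Preference. */
final class Interaction {
  final long userID;
  final long itemID;
  final float value;

  Interaction(long userID, long itemID, float value) {
    this.userID = userID;
    this.itemID = itemID;
    this.value = value;
  }
}

/** Hypothetical store that supports sequential scans only, with no random access. */
interface IterablePreferences extends Iterable<Interaction> {
  int numPreferences();
}

/** Hypothetical contract for a recommender that learns from single interactions. */
interface OnlineTrainable {
  void train(Interaction interaction);
}

/** Simplest possible backing store for the sketch: a flat in-memory list. */
final class InMemoryPreferences implements IterablePreferences {
  private final List<Interaction> interactions = new ArrayList<Interaction>();

  void add(long userID, long itemID, float value) {
    interactions.add(new Interaction(userID, itemID, value));
  }

  @Override
  public int numPreferences() {
    return interactions.size();
  }

  @Override
  public Iterator<Interaction> iterator() {
    return interactions.iterator();
  }
}
```

A batch pass is then just a loop over the store that calls train() for each
interaction, and an online update is a single train() call, so both modes
can share the same code path.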

I think we should discuss this stuff carefully on the mailing list. We should
also keep in mind that a lot of people already use the recommenders, so we
should try to stay as backwards compatible as possible.

                
> SGD-based Online SVD recommender
> --------------------------------
>
>                 Key: MAHOUT-1274
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1274
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Peng Cheng
>            Assignee: Sean Owen
>              Labels: collaborative-filtering, features, machine_learning, svd
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> An online SVD recommender is otherwise similar to an offline SVD recommender,
> except that, upon receiving one or several new preferences, it can add them
> to the training DataModel and update the result accordingly in real time.
> An online SVD recommender should override setPreference(...) and
> removePreference(...) in AbstractRecommender such that the factorization
> result is updated in O(1) time and without retraining.
> Right now the SlopeOneRecommender is the only component possessing such a
> capability.
> Since SGD is intrinsically an online algorithm and its CF implementation is
> available in core-0.8 (see MAHOUT-1089, MAHOUT-1272), I presume it would be a
> good time to convert it. Such a feature could come in handy for some websites.
> Implementation: Adding new users or items, or increasing the rank of the
> rating matrix, just increases the size of the user and item matrices.
> Reducing the rank of the rating matrix involves just one SVD. The real
> challenge here is that SGD is NOT a one-pass algorithm: multiple passes are
> required to achieve acceptable optimality, and even more so if the
> hyperparameters are bad. But here are two possible workarounds:
> 1. Use one-pass algorithms like averaged SGD. I'm not sure this can ever work,
> as applying a stochastic convex-optimization algorithm to a non-convex
> problem is anarchy, so it may be a long shot.
> 2. Run incomplete passes in each online update using ratings randomly sampled
> (but not uniformly sampled) from the latest DataModel. I don't know exactly
> how this should be done, but new ratings should be sampled more frequently.
> Uniform sampling will result in old ratings being used more than new ratings
> in total. If somebody has worked on this batch-to-online conversion before
> and can share their insight, that would be awesome. This seems to be the most
> viable option, if I can get the non-uniform pseudorandom generator I want,
> one that maintains a cumulative uniform distribution.
> I found a very old ticket (MAHOUT-572) mentioning an online SVD recommender,
> but it didn't pay off. Hopefully it's not a bad idea to submit a new ticket
> here.
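
As a rough illustration of option 2 in the quoted description, the following
Java sketch performs one SGD step on each new rating and then a few replay
steps on ratings sampled with a bias toward recent ones. It is not the Mahout
implementation: the class OnlineSgdFactorizer, its parameters, and the
square-root sampling rule (which picks a history index with probability
roughly proportional to its position, so newer ratings are drawn more often)
are all assumptions made for this example. The description does not say which
sampling distribution is wanted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical online matrix factorizer; a sketch, not Mahout's factorizer. */
final class OnlineSgdFactorizer {

  private static final class Rating {
    final long userID;
    final long itemID;
    final double value;

    Rating(long userID, long itemID, double value) {
      this.userID = userID;
      this.itemID = itemID;
      this.value = value;
    }
  }

  private final int rank;
  private final double learningRate;
  private final double lambda; // L2 regularization strength
  private final Random random;
  private final Map<Long, double[]> userFactors = new HashMap<Long, double[]>();
  private final Map<Long, double[]> itemFactors = new HashMap<Long, double[]>();
  private final List<Rating> history = new ArrayList<Rating>();

  OnlineSgdFactorizer(int rank, double learningRate, double lambda, long seed) {
    this.rank = rank;
    this.learningRate = learningRate;
    this.lambda = lambda;
    this.random = new Random(seed);
  }

  /** Returns the factor vector for an ID, creating a small random one if unseen. */
  private double[] factorsFor(Map<Long, double[]> factors, long id) {
    double[] vector = factors.get(id);
    if (vector == null) {
      vector = new double[rank];
      for (int k = 0; k < rank; k++) {
        vector[k] = 0.1 * random.nextGaussian();
      }
      factors.put(id, vector);
    }
    return vector;
  }

  /** Dot product of user and item factors; unseen IDs get fresh vectors as a side effect. */
  double predict(long userID, long itemID) {
    double[] p = factorsFor(userFactors, userID);
    double[] q = factorsFor(itemFactors, itemID);
    double dot = 0.0;
    for (int k = 0; k < rank; k++) {
      dot += p[k] * q[k];
    }
    return dot;
  }

  /** One regularized SGD step on a single rating. */
  private void sgdStep(Rating r) {
    double[] p = factorsFor(userFactors, r.userID);
    double[] q = factorsFor(itemFactors, r.itemID);
    double err = r.value - predict(r.userID, r.itemID);
    for (int k = 0; k < rank; k++) {
      double pk = p[k]; // keep the old value so both updates use the same state
      p[k] += learningRate * (err * q[k] - lambda * pk);
      q[k] += learningRate * (err * pk - lambda * q[k]);
    }
  }

  /** Learns from the new rating, then replays a few recency-biased samples. */
  void setPreference(long userID, long itemID, double value, int replaySteps) {
    Rating newest = new Rating(userID, itemID, value);
    history.add(newest);
    sgdStep(newest);
    for (int i = 0; i < replaySteps; i++) {
      // sqrt of a uniform draw has density 2x on [0, 1), favoring high (recent) indices
      int index = (int) (history.size() * Math.sqrt(random.nextDouble()));
      sgdStep(history.get(Math.min(index, history.size() - 1)));
    }
  }
}
```

Each call costs O(rank * (1 + replaySteps)) regardless of how many ratings
have been seen. A new user or item simply gets a fresh factor vector the first
time it appears, which corresponds to growing the user and item matrices as
described above.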

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
