Peng Cheng created MAHOUT-1274:
----------------------------------
Summary: SGD-based Online SVD recommender
Key: MAHOUT-1274
URL: https://issues.apache.org/jira/browse/MAHOUT-1274
Project: Mahout
Issue Type: New Feature
Components: Collaborative Filtering
Reporter: Peng Cheng
Assignee: Sean Owen
An online SVD recommender is similar to an offline SVD recommender, except
that, upon receiving one or several new ratings, it can add them to the
training DataModel and update the result accordingly in real time.
An online SVD recommender should override setPreference(...) and
removePreference(...) in AbstractRecommender so that the factorization result
is updated in O(1) time, without retraining.
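A minimal sketch of what such a constant-time update could look like, assuming
plain regularized SGD on a single rating. The class OnlineSgdFactorizer and all
of its method and parameter names are hypothetical and not part of Mahout; an
overridden setPreference(...) could delegate to something like update(...).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch, not Mahout's API: factors are updated one rating at a time. */
final class OnlineSgdFactorizer {
  private final int rank;
  private final double learningRate;
  private final double lambda; // L2 regularization strength
  private final Random random;
  private final Map<Long, double[]> userFactors = new HashMap<>();
  private final Map<Long, double[]> itemFactors = new HashMap<>();

  OnlineSgdFactorizer(int rank, double learningRate, double lambda, long seed) {
    this.rank = rank;
    this.learningRate = learningRate;
    this.lambda = lambda;
    this.random = new Random(seed);
  }

  /** Small random start vector for a user or item seen for the first time. */
  private double[] newVector() {
    double[] v = new double[rank];
    for (int k = 0; k < rank; k++) {
      v[k] = 0.1 * random.nextGaussian();
    }
    return v;
  }

  private static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int k = 0; k < a.length; k++) {
      sum += a[k] * b[k];
    }
    return sum;
  }

  /** Predicted rating; returns 0.0 when the user or the item is unknown. */
  double predict(long userId, long itemId) {
    double[] p = userFactors.get(userId);
    double[] q = itemFactors.get(itemId);
    return (p == null || q == null) ? 0.0 : dot(p, q);
  }

  /** One SGD step on a single rating; unseen users and items get new vectors. */
  void update(long userId, long itemId, double rating) {
    double[] p = userFactors.computeIfAbsent(userId, id -> newVector());
    double[] q = itemFactors.computeIfAbsent(itemId, id -> newVector());
    double err = rating - dot(p, q);
    for (int k = 0; k < rank; k++) {
      double pk = p[k];
      double qk = q[k];
      p[k] += learningRate * (err * qk - lambda * pk);
      q[k] += learningRate * (err * pk - lambda * qk);
    }
  }
}
```

Each call touches only one user vector and one item vector, so its cost is
proportional to the rank and independent of the number of users, items and
stored ratings. A single step only nudges the factors toward the new rating.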
Right now SlopeOneRecommender is the only component that has this
capability.
Since SGD is intrinsically an online algorithm and its CF implementation is
available in core-0.8 (see MAHOUT-1089, MAHOUT-1272), I presume this would be a
good time to convert it. Such a feature could come in handy for some websites.
Implementation: Adding new users or items, or increasing the rating matrix
rank, just increases the size of the user and item matrices. Reducing the
rating matrix rank involves just one SVD. The real challenge here is that SGD
is NOT a one-pass algorithm: multiple passes are required to reach acceptable
optimality, and even more if the hyperparameters are bad. But here are two
possible workarounds:
1. Use a one-pass algorithm like averaged SGD. I am not sure it can ever work,
as applying a stochastic convex-optimization algorithm to a non-convex problem
is anarchy. It may be a long shot.
2. Run incomplete passes in each online update, using ratings randomly sampled
(but not uniformly sampled) from the latest DataModel. I don't know exactly how
this should be done, but new ratings should be sampled more frequently. Uniform
sampling will result in old ratings being used more than new ratings in total.
If somebody has worked on this batch-to-online conversion before and can share
their insight, that would be awesome. This seems to be the most viable option,
if I can get the non-uniform pseudorandom generator I want: one that maintains
a cumulative uniform distribution.
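One possible way to approximate the sampler described in option 2, as a sketch
only: track how often each rating has been used for training and always draw
the least-used ratings first, breaking ties at random. New ratings start at
zero uses, so they are drawn first until they catch up, and cumulative usage
stays close to uniform. The class LeastUsedSampler and its methods are
hypothetical and not part of Mahout, and this is a swapped-in
priority-queue scheme rather than a true weighted random generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical helper, not part of Mahout: draws the least-used ratings first. */
final class LeastUsedSampler {
  /** A rating id, how often it has been used, and a random tie-breaker. */
  private static final class Entry {
    final int ratingId;
    int uses;
    double tieBreak;

    Entry(int ratingId, double tieBreak) {
      this.ratingId = ratingId;
      this.tieBreak = tieBreak;
    }
  }

  // Ordered by use count first, then by the random tie-breaker.
  private final PriorityQueue<Entry> queue = new PriorityQueue<>(
      (a, b) -> a.uses != b.uses
          ? Integer.compare(a.uses, b.uses)
          : Double.compare(a.tieBreak, b.tieBreak));
  private final Random random;

  LeastUsedSampler(long seed) {
    this.random = new Random(seed);
  }

  /** Registers a new rating with a use count of zero. */
  void add(int ratingId) {
    queue.add(new Entry(ratingId, random.nextDouble()));
  }

  /** Returns up to m distinct rating ids, preferring the least-used ones. */
  List<Integer> sample(int m) {
    int n = Math.min(m, queue.size());
    List<Entry> picked = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      picked.add(queue.poll());
    }
    List<Integer> ids = new ArrayList<>(n);
    for (Entry e : picked) {
      e.uses++;                         // record this training use
      e.tieBreak = random.nextDouble(); // reshuffle among equally used ratings
      queue.add(e);
      ids.add(e.ratingId);
    }
    return ids;
  }
}
```

Each online update would call sample(m) and run one SGD step per returned
rating, at O(m log n) cost for n stored ratings. Whether this ordering hurts
convergence compared with a properly randomized draw is an open question.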
I found a very old ticket (MAHOUT-572) mentioning an online SVD recommender,
but it didn't pay off. Hopefully it's not a bad idea to submit a new ticket.