Hi Sebastian,

Thanks a lot for the help! Do you mean core-1.0 or bundle-1.0? I hope I can work hard enough to catch the next release. Also, what do you think about the proposed online pseudorandom sampling problem?

I was digging through old threads and found MAHOUT-1069, which already did a lot of the work I need right now and used a lot of code optimization techniques, but was eventually rejected for being too complex and drastic. :-<

I wonder if overengineering is a researcher's most dangerous bane; it has happened to a lot of people.

On 13-07-06 01:31 PM, Sebastian Schelter wrote:
Hi Peng,

We deprecated a lot of algorithms that we found to be not much used, in order
to streamline our codebase for the coming 1.0 release.
On 06.07.2013 10:25, "Peng Cheng (JIRA)" <[email protected]> wrote:

     [
https://issues.apache.org/jira/browse/MAHOUT-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701380#comment-13701380]

Peng Cheng commented on MAHOUT-1274:
------------------------------------

BTW, may I ask (noobishly) why you have deprecated the
SlopeOneRecommender in the latest core-0.8 snapshot? I must have missed a
lot in previous mahout-development emails before I joined, so apologies if
it's a stupid question.

SGD-based Online SVD recommender
--------------------------------

                 Key: MAHOUT-1274
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1274
             Project: Mahout
          Issue Type: New Feature
          Components: Collaborative Filtering
            Reporter: Peng Cheng
            Assignee: Sean Owen
              Labels: collaborative-filtering, features,
machine_learning, svd
   Original Estimate: 336h
  Remaining Estimate: 336h

An online SVD recommender is otherwise similar to an offline SVD
recommender, except that, upon receiving one or several new ratings,
it can add them to the training DataModel and update the factorization
accordingly in real time.
An online SVD recommender should override setPreference(...) and
removePreference(...) in AbstractRecommender such that the factorization
result is updated in O(1) time and without retraining.
Right now the SlopeOneRecommender is the only component possessing such a
capability.
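For concreteness, here is a rough sketch of what such a setPreference
override could look like. It is only an illustration: the factor lookups
userVector()/itemVector(), the dot() helper, and the learningRate/lambda
fields are hypothetical placeholders, not existing Mahout APIs.

@Override
public void setPreference(long userID, long itemID, float value) throws TasteException {
  getDataModel().setPreference(userID, itemID, value); // record the new rating first
  double[] p = userVector(userID);  // hypothetical: the user's latent factor vector
  double[] q = itemVector(itemID);  // hypothetical: the item's latent factor vector
  double err = value - dot(p, q);   // prediction error on the new rating
  // One SGD step on this rating only: O(k) in the factor dimension k,
  // independent of the number of users, items, or ratings.
  for (int f = 0; f < p.length; f++) {
    double pf = p[f];
    p[f] += learningRate * (err * q[f] - lambda * pf);
    q[f] += learningRate * (err * pf - lambda * q[f]);
  }
}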
Since SGD is intrinsically an online algorithm and its CF implementation
is available in core-0.8 (see MAHOUT-1089, MAHOUT-1272), I presume it would
be a good time to convert it. Such a feature could come in handy for some
websites.
Implementation: adding new users or items, or increasing the rank of the
rating matrix, just means growing the user and item matrices. Reducing the
rank of the rating matrix involves just one SVD. The real challenge here is
that SGD is NOT a one-pass algorithm: multiple passes are required to reach
acceptable optimality, and even more so if the hyperparameters are bad. But
here are two possible workarounds:
1. Use a one-pass algorithm like averaged SGD. I'm not sure it can ever
work, since applying a stochastic convex-optimization algorithm to a
non-convex problem is anarchy, so it may be a long shot. (A minimal sketch
of what I mean by averaged SGD follows after this list.)
2. Run incomplete passes in each online update, using ratings randomly
sampled (but not uniformly sampled) from the latest DataModel. I don't know
exactly how this should be done, but new ratings should be sampled more
frequently; uniform sampling would result in old ratings being used more
than new ratings in total. If somebody has worked on this batch-to-online
conversion before and could share their insight, that would be awesome. This
seems to be the most viable option, provided I can get the non-uniform
pseudorandom generator that maintains a cumulative uniform distribution I
want. (A toy sampler sketch also follows below.)
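For approach 1, here is a minimal sketch of what averaged SGD means in this
context (plain Java, my own illustration, no Mahout types): alongside the
ordinary SGD iterate, keep a running average of all iterates and use the
average for prediction.

final class AveragedSgd {

  private final double[] w;     // current SGD iterate
  private final double[] wAvg;  // running (Polyak-Ruppert) average of all iterates
  private long t = 0;

  AveragedSgd(int numFeatures) {
    this.w = new double[numFeatures];
    this.wAvg = new double[numFeatures];
  }

  // One stochastic step; the caller supplies the gradient for a single rating.
  void step(double[] gradient, double learningRate) {
    t++;
    for (int f = 0; f < w.length; f++) {
      w[f] -= learningRate * gradient[f];   // ordinary SGD update
      wAvg[f] += (w[f] - wAvg[f]) / t;      // incremental running average
    }
  }

  double[] averagedWeights() {
    return wAvg;                            // use these for prediction
  }
}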
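For approach 2, here is a toy recency-biased sampler (again my own
illustration, not Mahout code, and not by itself a solution to the
cumulative-uniformity requirement above): rating i out of n is drawn with
weight decay^(n-1-i), so the newest ratings are revisited far more often
while old ones still show up occasionally.

import java.util.Random;

final class RecencyBiasedSampler {

  private final Random random = new Random();
  private final double decay;  // in (0, 1); closer to 1 gives a flatter distribution

  RecencyBiasedSampler(double decay) {
    this.decay = decay;
  }

  // Returns an index in [0, n), biased toward the most recent entries
  // (index n-1 is assumed to be the newest rating).
  int nextIndex(int n) {
    // Inverse-CDF draw of the rating's "age" (0 = newest, n-1 = oldest)
    // from a geometric distribution truncated to [0, n-1].
    double u = random.nextDouble();
    int age = (int) Math.floor(
        Math.log(1.0 - u * (1.0 - Math.pow(decay, n))) / Math.log(decay));
    return n - 1 - Math.min(age, n - 1);
  }
}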
I found a very old ticket (MAHOUT-572) mentioning an online SVD recommender,
but it didn't pay off. Hopefully it's not a bad idea to submit a new ticket
here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA
administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


