[
https://issues.apache.org/jira/browse/MAHOUT-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119335#comment-13119335
]
Ted Dunning commented on MAHOUT-824:
------------------------------------
For any method that doesn't have good regularization, trimming helps avoid
over-training. Slope-one and all of the correlation methods have zero
regularization and are seriously susceptible to coincidence. LLR trimming is
kind of the simplest level of regularization. Methods like latent factor
log-linear have serious and real regularization and probably don't need
trimming.
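
As a rough illustration of the LLR trimming mentioned above, here is a minimal
Java sketch of the log-likelihood ratio for a 2x2 co-occurrence table, written
in the entropy form. The class name LlrSketch, the keep() helper and its
threshold parameter are hypothetical and exist only for this example; this is
not presented as the code Mahout itself runs. The counts are assumed to be
k11 (both events together), k12 and k21 (one event without the other) and k22
(neither event).

```java
final class LlrSketch {
    private LlrSketch() {}

    // x * ln(x), using the convention 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a set of counts:
    // N * ln(N) - sum(c * ln(c)), where N is the sum of the counts.
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    // G^2 statistic for the 2x2 table [[k11, k12], [k21, k22]].
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Rounding can push the result slightly below zero, so clamp it.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Hypothetical trimming rule: keep a pair only if its score reaches
    // a caller-chosen threshold (the threshold value is an assumption).
    static boolean keep(long k11, long k12, long k21, long k22, double threshold) {
        return logLikelihoodRatio(k11, k12, k21, k22) >= threshold;
    }
}
```

A pair whose co-occurrence count is exactly what independence would predict
scores close to zero and is trimmed under any positive threshold, which is the
sense in which trimming guards against coincidence.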
> FastByIDRunningAverage: Optimize SlopeOneRecommender by optimizing
> MemoryDiffStorage
> ------------------------------------------------------------------------------------
>
> Key: MAHOUT-824
> URL: https://issues.apache.org/jira/browse/MAHOUT-824
> Project: Mahout
> Issue Type: Improvement
> Reporter: Lance Norskog
> Assignee: Sean Owen
> Priority: Trivial
> Fix For: 0.6
>
> Attachments: MAHOUT-824.patch, MAHOUT-824.short.patch
>
>
> The SlopeOneRecommender has by far the best RMS of all of the online
> recommenders in Mahout (that I've found). Unfortunately the implementation
> also uses much more memory and is unusable on my laptop.
> This patch optimizes memory (and speed) by folding
> FastByIDMap<RunningAverage> into one class: FastByIDRunningAverage. This is
> what it sounds like: a Long-addressable array of running averages (and
> optionally standard deviation).
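
To make the idea in the description concrete, here is a minimal Java sketch of
a long-keyed table that stores each running average in parallel primitive
arrays instead of one object per entry. This is not the code in the attached
patch: the class name RunningAverageTableSketch, the EMPTY sentinel, the
linear-probing layout and the load factor are all assumptions made for
illustration, and the optional standard deviation is left out.

```java
final class RunningAverageTableSketch {
    // Hypothetical sentinel marking an unused slot; this key cannot be stored.
    private static final long EMPTY = Long.MIN_VALUE;

    private long[] keys;
    private int[] counts;
    private double[] averages;
    private int size;

    RunningAverageTableSketch(int expectedSize) {
        int n = 16;
        while (n < 2L * expectedSize) {
            n <<= 1; // keep the table length a power of two
        }
        allocate(n);
    }

    private void allocate(int n) {
        keys = new long[n];
        java.util.Arrays.fill(keys, EMPTY);
        counts = new int[n];
        averages = new double[n];
        size = 0;
    }

    // Linear probing: returns the slot holding the key, or the first empty slot.
    private int slot(long key) {
        int mask = keys.length - 1;
        int i = (int) (key ^ (key >>> 32)) & mask;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    void addDatum(long key, double value) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("reserved key");
        }
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the load factor at or below 0.5
        }
        int i = slot(key);
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        counts[i]++;
        // Incremental mean update: no per-entry object, no stored sum.
        averages[i] += (value - averages[i]) / counts[i];
    }

    int getCount(long key) {
        return counts[slot(key)]; // an empty slot has count 0
    }

    double getAverage(long key) {
        int i = slot(key);
        return counts[i] > 0 ? averages[i] : Double.NaN;
    }

    private void grow() {
        long[] oldKeys = keys;
        int[] oldCounts = counts;
        double[] oldAverages = averages;
        allocate(oldKeys.length * 2);
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) {
                int i = slot(oldKeys[j]);
                keys[i] = oldKeys[j];
                counts[i] = oldCounts[j];
                averages[i] = oldAverages[j];
                size++;
            }
        }
    }
}
```

In this layout each entry costs one long, one int and one double, with no
per-entry object header or reference, which is the kind of saving the
description is after.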