[
https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994289#comment-12994289
]
Sean Owen commented on MAHOUT-593:
----------------------------------
This is looking good and so I want to get this on into the codebase. I have
some broad suggestions from a first glance.
* You may need to regenerate this patch as I think a few recent changes will
conflict
* I think it's a decent convention to not have Mappers and Reducers as inner
classes. They can be top-level classes in the same package. It may be personal
preference but I think much of our MapReduce code does this. In general I think
lots of inner classes that don't need to be inner just makes things hard to
find.
* It's also preferable, I think to try to standardize how the Job is
implemented. I believe we're trying to use this AbstractJob class as the
template for Job implementations. Can this be put into that mold? I see at
least one implementation does this.
* Spacing looks uneven -- should be 2 spaces per indent
* Might look at the coding conventions in surrounding code -- there are a fair
number of differences. For example we don't use the "m_" convention for
members. You might run "mvn checkstyle:checkstyle" for a complete list of
what's at odds with the project style.
* Apache code doesn't usually have @author tags
* Don't throw RuntimeException -- prefer subclasses
* Use RandomUtils.getRandom(), not new Random(), for test repeatability
* We already have an IOUtils class that does (or can be augmented to do) what
your IOUtil class does. Best to combine them.
* Can you comment on how this replaces, extends, complements the 2+ existing
SVD implementation we have already? (In the javadoc ideally)
* extend MahoutTestCase, not TestCase
> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure
> compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-593
> URL: https://issues.apache.org/jira/browse/MAHOUT-593
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.4
> Reporter: Dmitriy Lyubimov
> Fix For: 0.5
>
> Attachments: MAHOUT-593.patch.gz
>
>
> Current Mahout-376 patch requries 'new' hadoop API. Certain elements of that
> API (namely, multiple outputs) are not available in standard hadoop 0.20.2
> release. As such, that may work only with either CDH or 0.21 distributions.
> In order to bring it into sync with current Mahout dependencies, a backport
> of the patch to 'old' API is needed.
> Also, some work is needed to resolve math dependencies. Existing patch relies
> on apache commons-math 2.1 for eigen decomposition of small matrices. This
> dependency is not currently set up in the mahout core. So, certain snippets
> of code are either required to go to mahout-math or use Colt eigen
> decompositon (last time i tried, my results were mixed with that one. It
> seems to produce results inconsistent with those from mahout-math
> eigensolver, at the very least, it doesn't produce singular values in sorted
> order).
> So this patch is mainly moing some Mahout-376 code around.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira