[ 
https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994289#comment-12994289
 ] 

Sean Owen commented on MAHOUT-593:
----------------------------------

This is looking good and so I want to get this on into the codebase. I have 
some broad suggestions from a first glance.

* You may need to regenerate this patch as I think a few recent changes will 
conflict
* I think it's a decent convention to not have Mappers and Reducers as inner 
classes. They can be top-level classes in the same package. It may be personal 
preference but I think much of our MapReduce code does this. In general I think 
lots of inner classes that don't need to be inner just makes things hard to 
find.
* It's also preferable, I think to try to standardize how the Job is 
implemented. I believe we're trying to use this AbstractJob class as the 
template for Job implementations. Can this be put into that mold? I see at 
least one implementation does this.
* Spacing looks uneven -- should be 2 spaces per indent
* Might look at the coding conventions in surrounding code -- there are a fair 
number of differences. For example we don't use the "m_" convention for 
members. You might run "mvn checkstyle:checkstyle" for a complete list of 
what's at odds with the project style.
* Apache code doesn't usually have @author tags
* Don't throw RuntimeException -- prefer subclasses
* Use RandomUtils.getRandom(), not new Random(), for test repeatability
* We already have an IOUtils class that does (or can be augmented to do) what 
your IOUtil class does. Best to combine them.
* Can you comment on how this replaces, extends, complements the 2+ existing 
SVD implementation we have already? (In the javadoc ideally)
* extend MahoutTestCase, not TestCase

> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure 
> compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-593
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-593
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Dmitriy Lyubimov
>             Fix For: 0.5
>
>         Attachments: MAHOUT-593.patch.gz
>
>
> Current Mahout-376 patch requries 'new' hadoop API.  Certain elements of that 
> API (namely, multiple outputs) are not available in standard hadoop 0.20.2 
> release. As such, that may work only with either CDH or 0.21 distributions. 
>  In order to bring it into sync with current Mahout dependencies, a backport 
> of the patch to 'old' API is needed. 
> Also, some work is needed to resolve math dependencies. Existing patch relies 
> on apache commons-math 2.1 for eigen decomposition of small matrices. This 
> dependency is not currently set up in the mahout core. So, certain snippets 
> of code are either required to go to mahout-math or use Colt eigen 
> decompositon (last time i tried, my results were mixed with that one. It 
> seems to produce results inconsistent with those from mahout-math 
> eigensolver, at the very least, it doesn't produce singular values in sorted 
> order).
> So this patch is mainly moing some Mahout-376 code around.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to