[ https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000355#comment-13000355 ]

Dmitriy Lyubimov commented on MAHOUT-593:
-----------------------------------------

* Re: applying the patch. As I mentioned above, it is a git patch ('patch -p1'
worked the last time I tested it; let me know if you still have problems). But
once it is voted on, I can commit it myself. I work in Git following the
suggested Apache Git workflow (there's a help page somewhere), maintaining a
separate branch for Mahout. I can merge-squash it directly onto the git-svn
Mahout branch and then push it as one incremental commit.
* I am not sure what IOUtils provides. It's quite likely I can reuse it; the
functionality here is very basic (make sure all Closeables in a collection are
closed in the proper order). I'll take a look.
* We are not bringing commons-math into Mahout. It already uses it: version 1.2
in mahout-core and version 2.1 in mahout-math. (Of course, you can't really
have both versions in your final assembly even if you declare both, so this is
just fixing that inconsistency.) commons-math 1.2 is badly outdated (so
outdated, in fact, that it uses a different artifact id while still using the
same Java packages). Because it shares the package tree but has a different
artifact id, both trees actually end up on the final classpath, causing
linking errors. I ran into this because I use Eclipse: unlike Maven, it can't
tell test scope from runtime scope, so it included both, since mahout-math uses
Apache commons-math in test scope only.
* The Colt eigensolver in Mahout is marked "@Deprecated do not use". Even so, I
tried it as a first option before falling back on commons-math. Its results are
inconsistent when checked against Apache commons-math. Also, it doesn't seem to
sort eigenvectors and eigenvalues in descending order, which adds work.
I understand Mahout is in the process of adopting the Colt solvers. When that is
done, we can easily migrate to them (all eigensolver dependencies are strictly
isolated in a single class called "EigenSolverWrapper", so whatever solver is
used can easily be substituted there for a very small in-RAM matrix such as
B*B').
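A minimal Java sketch of the "close all Closeables in a collection in the
proper order" helper described above. It assumes "proper order" means reverse
order of opening (last opened, first closed), which is the usual convention.
The class name CloseAll and the method closeAll are hypothetical; this is not
the patch's actual code and not the IOUtils API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not taken from the MAHOUT-593 patch or from IOUtils.
public final class CloseAll {

    private CloseAll() {
    }

    /**
     * Closes every Closeable in reverse order of registration (last opened,
     * first closed). Every element is attempted even if some fail; the first
     * IOException is rethrown with later failures attached as suppressed.
     */
    public static void closeAll(List<? extends Closeable> closeables) throws IOException {
        IOException first = null;
        // Walk a reversed copy so the caller's list is left untouched.
        List<Closeable> reversed = new ArrayList<>(closeables);
        Collections.reverse(reversed);
        for (Closeable c : reversed) {
            if (c == null) {
                continue; // tolerate slots that were never opened
            }
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> log = new ArrayList<>();
        List<Closeable> resources = new ArrayList<>();
        for (String name : new String[] {"input", "temp", "output"}) {
            resources.add(() -> log.add(name)); // record the close order
        }
        closeAll(resources);
        System.out.println(log); // prints [output, temp, input]
    }
}
```

Collecting failures instead of stopping at the first one ensures a broken
stream does not leave the remaining resources open.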
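For illustration, the extra sorting work mentioned in the Colt bullet could look
roughly like the following Java sketch: a post-processing step that a wrapper
such as "EigenSolverWrapper" might apply to an unsorted solver's output. The
class EigenOrder, its method, and the layout (eigenvectors stored as columns,
column j paired with values[j]) are assumptions, not the patch's actual code.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper; not the actual EigenSolverWrapper implementation.
public final class EigenOrder {

    private EigenOrder() {
    }

    /**
     * Reorders eigenpairs so eigenvalues are in descending order. Eigenvectors
     * are the columns of vectors (vectors[row][col]); column j pairs with
     * values[j]. Both arrays are permuted in place.
     */
    public static void sortDescending(double[] values, double[][] vectors) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Indices of eigenvalues from largest to smallest.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> values[i]).reversed());

        double[] sortedValues = new double[n];
        double[][] sortedVectors = new double[vectors.length][n];
        for (int j = 0; j < n; j++) {
            int src = order[j];
            sortedValues[j] = values[src];
            // Move the whole eigenvector (column src) along with its eigenvalue.
            for (int row = 0; row < vectors.length; row++) {
                sortedVectors[row][j] = vectors[row][src];
            }
        }
        System.arraycopy(sortedValues, 0, values, 0, n);
        for (int row = 0; row < vectors.length; row++) {
            System.arraycopy(sortedVectors[row], 0, vectors[row], 0, n);
        }
    }

    public static void main(String[] args) {
        double[] values = {1.0, 3.0, 2.0};
        double[][] vectors = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        sortDescending(values, vectors);
        System.out.println(Arrays.toString(values)); // prints [3.0, 2.0, 1.0]
        // prints [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
        System.out.println(Arrays.deepToString(vectors));
    }
}
```

Keeping this step inside the one wrapper class means a replacement solver that
already returns sorted eigenpairs could simply skip it.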


> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure 
> compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-593
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-593
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Dmitriy Lyubimov
>             Fix For: 0.5
>
>         Attachments: MAHOUT-593.patch.gz, MAHOUT-593.patch.gz, 
> MAHOUT-593.patch.gz, SSVD-givens-CLI.pdf
>
>
> The current Mahout-376 patch requires the 'new' hadoop API. Certain elements
> of that API (namely, multiple outputs) are not available in the standard
> hadoop 0.20.2 release. As such, it may work only with either CDH or 0.21
> distributions. In order to bring it into sync with current Mahout
> dependencies, a backport of the patch to the 'old' API is needed.
> Also, some work is needed to resolve math dependencies. The existing patch
> relies on Apache commons-math 2.1 for eigen decomposition of small matrices.
> This dependency is not currently set up in mahout-core. So certain snippets
> of code must either go to mahout-math or use the Colt eigen decomposition
> (last time I tried, my results with that one were mixed. It seems to produce
> results inconsistent with those from the mahout-math eigensolver; at the very
> least, it doesn't produce singular values in sorted order).
> So this patch is mainly moving some Mahout-376 code around.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
