[
https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitriy Lyubimov reopened MAHOUT-593:
-------------------------------------
Assignee: Dmitriy Lyubimov (was: Ted Dunning)
Reopening.
One user reports problems with the Mahout version of the patch. As I mentioned
before, we actually run CDH3 at the company, and I don't have access to 0.20.2
to test it with (other than the local unit test that runs there).
The CDH3 version that we use is available at
https://github.com/dlyubimov/ssvd-lsi/tree/ssvd-preprocessing
and it has actually been tested on problems of around 100 M non-zero elements
as well as on the Reuters dataset Mahout example.
So the Mahout patch needs to be tested more thoroughly as well. One of the
current problems is that I don't have easy access to a real 0.20.2 cluster to
test it on. But at least one user reported that this patch got stuck for him.
> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure
> compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-593
> URL: https://issues.apache.org/jira/browse/MAHOUT-593
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.4
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Fix For: 0.5
>
> Attachments: MAHOUT-593.patch.gz, MAHOUT-593.patch.gz,
> MAHOUT-593.patch.gz, MAHOUT-593.patch.gz, SSVD-givens-CLI.pdf,
> ssvdclassdiag.png
>
>
> The current Mahout-376 patch requires the 'new' Hadoop API. Certain elements
> of that API (namely, multiple outputs) are not available in the standard
> Hadoop 0.20.2 release. As such, it may work only with either the CDH or 0.21
> distributions. In order to bring it into sync with current Mahout
> dependencies, a backport of the patch to the 'old' API is needed.
> Also, some work is needed to resolve math dependencies. The existing patch
> relies on Apache commons-math 2.1 for eigendecomposition of small matrices.
> This dependency is not currently set up in the Mahout core. So certain
> snippets of code either have to move to mahout-math or use the Colt
> eigendecomposition (last time I tried, my results with that one were mixed.
> It seems to produce results inconsistent with those from the mahout-math
> eigensolver; at the very least, it doesn't produce singular values in sorted
> order).
> So this patch is mainly moving some Mahout-376 code around.
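
The unsorted-output issue noted in the description can be worked around by sorting after the solve. Below is a minimal sketch, not the code from the patch. It assumes the eigensolver hands back the eigenvalues of a small symmetric positive semi-definite matrix as a plain array and the eigenvectors as columns of a 2-D array, and that singular values are taken as the square roots of those eigenvalues. The class and method names (SortedEigen, descendingOrder, singularValues, permuteColumns) are hypothetical and do not belong to Mahout, Colt, or commons-math.

```java
import java.util.Arrays;

// Hypothetical helper: reorders an eigendecomposition so that the derived
// singular values come out largest first.
public class SortedEigen {

  /** Returns the indices that order the eigenvalues from largest to smallest. */
  public static int[] descendingOrder(double[] eigenvalues) {
    Integer[] boxed = new Integer[eigenvalues.length];
    for (int i = 0; i < boxed.length; i++) {
      boxed[i] = i;
    }
    // Compare b against a to get descending order.
    Arrays.sort(boxed, (a, b) -> Double.compare(eigenvalues[b], eigenvalues[a]));
    int[] order = new int[boxed.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = boxed[i];
    }
    return order;
  }

  /** Singular values as square roots of the eigenvalues, in the given order. */
  public static double[] singularValues(double[] eigenvalues, int[] order) {
    double[] s = new double[order.length];
    for (int k = 0; k < order.length; k++) {
      // Clamp tiny negative values caused by round-off before taking the root.
      s[k] = Math.sqrt(Math.max(0.0, eigenvalues[order[k]]));
    }
    return s;
  }

  /** Reorders eigenvector columns so column k matches singular value k. */
  public static double[][] permuteColumns(double[][] vectors, int[] order) {
    double[][] out = new double[vectors.length][order.length];
    for (int r = 0; r < vectors.length; r++) {
      for (int k = 0; k < order.length; k++) {
        out[r][k] = vectors[r][order[k]];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    double[] eigenvalues = {4.0, 25.0, 9.0};            // unsorted solver output
    double[][] eigenvectors = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
    int[] order = descendingOrder(eigenvalues);
    System.out.println(Arrays.toString(singularValues(eigenvalues, order)));
    // prints [5.0, 3.0, 2.0]
    System.out.println(Arrays.deepToString(permuteColumns(eigenvectors, order)));
  }
}
```

Applying the same permutation to the eigenvector columns keeps each vector paired with its singular value, which matters when the vectors are later used to form U and V.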
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira