[
https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitriy Lyubimov updated MAHOUT-593:
------------------------------------
Status: Patch Available (was: Open)
Backport of basic algorithm is done.
Notes:
* git patch, requires 'patch -p1'
* I only ran the local unit test, which runs MR in local mode, not a
distributed run, since I don't have a 0.20.2 cluster anywhere (I work with CDH
releases). So I verified the MAHOUT-376 patches in distributed mode, but not
this one.
* The Apache commons-math dependencies are a mess. The math module depends on
2.1 but the core module depends on 1.2, so at run time there are all sorts of
linkage errors because classes are occasionally picked up from either 1.2 or
2.1. I switched both modules to 2.1. Actually, the 2.1 dependency in math was
test scope only, but I removed that restriction, as I assumed the math module
is intended to hold math dependencies and the hadoop module is intended to hold
the hadoop (MR) stuff. But it turns out the core module has a commons-math
dependency anyway... so please review the commons-math dependency. I really
need it working in either core or math, and I have only ever run the SSVD
tests with 2.1 (fyi).
On a side note, the dependency issue above is caused in part by inconsistent
use of the <dependencyManagement> tag: some dependencies take a single version
through the parent's <dependencyManagement>, and some (most, actually) declare
their own versions in subprojects. I think Mahout really needs a dependency
housecleaning, with all versions moved under dependency management in the
parent pom.
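
For anyone chasing the linkage errors described above, a small diagnostic along
these lines can show which jar a given class was actually loaded from. This is
only a sketch using standard JDK calls; the class name WhichJar and the helper
locationOf are made up for illustration and are not part of the patch.

```java
import java.security.CodeSource;

public class WhichJar {

    /**
     * Returns the location (jar or directory) a class was loaded from.
     * Classes loaded by the bootstrap loader have no code source, so a
     * marker string is returned for them instead.
     */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Each argument is a fully qualified class name to look up
        // on the current classpath.
        for (String name : args) {
            System.out.println(name + " -> " + locationOf(Class.forName(name)));
        }
    }
}
```

Run with the same classpath as the failing job and pass the name of any
commons-math class of interest. If the printed location is the 1.2 jar where
2.1 was expected (or the other way around), the version conflict is confirmed.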
> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure
> compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-593
> URL: https://issues.apache.org/jira/browse/MAHOUT-593
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.4
> Reporter: Dmitriy Lyubimov
> Fix For: 0.5
>
> Attachments: MAHOUT-593.patch.gz
>
>
> Current Mahout-376 patch requires 'new' hadoop API. Certain elements of that
> API (namely, multiple outputs) are not available in standard hadoop 0.20.2
> release. As such, that may work only with either CDH or 0.21 distributions.
> In order to bring it into sync with current Mahout dependencies, a backport
> of the patch to 'old' API is needed.
> Also, some work is needed to resolve math dependencies. Existing patch relies
> on apache commons-math 2.1 for eigen decomposition of small matrices. This
> dependency is not currently set up in the mahout core. So, certain snippets
> of code are either required to go to mahout-math or use Colt eigen
> decomposition (last time I tried, my results with that one were mixed: it
> seems to produce results inconsistent with those from the mahout-math
> eigensolver; at the very least, it doesn't produce singular values in sorted
> order).
> So this patch is mainly moving some Mahout-376 code around.
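
Regarding the unsorted output mentioned in the description: if a solver returns
eigenvalues in arbitrary order, one possible workaround is to reorder the
values and the matching eigenvector columns afterwards. The sketch below uses
plain arrays only; SortedSpectrum and its methods are hypothetical names, not
Mahout, Colt or commons-math API, and this is not what the patch itself does.

```java
import java.util.Arrays;

public class SortedSpectrum {

    /** Returns the index permutation that orders values from largest to smallest. */
    static Integer[] descendingOrder(double[] values) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Compare b against a so that larger values come first.
        Arrays.sort(order, (a, b) -> Double.compare(values[b], values[a]));
        return order;
    }

    /** Applies the permutation to the values. */
    static double[] reorder(double[] values, Integer[] order) {
        double[] sorted = new double[values.length];
        for (int i = 0; i < order.length; i++) {
            sorted[i] = values[order[i]];
        }
        return sorted;
    }

    /** Applies the same permutation to the columns of a row-major matrix. */
    static double[][] reorderColumns(double[][] vectors, Integer[] order) {
        double[][] sorted = new double[vectors.length][order.length];
        for (int row = 0; row < vectors.length; row++) {
            for (int col = 0; col < order.length; col++) {
                sorted[row][col] = vectors[row][order[col]];
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up values standing in for an unsorted solver result.
        double[] values = {1.5, 4.0, 0.25};
        double[][] vectors = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

        Integer[] order = descendingOrder(values);
        System.out.println(Arrays.toString(reorder(values, order)));
        System.out.println(Arrays.deepToString(reorderColumns(vectors, order)));
    }
}
```

Sorting an index permutation rather than the values themselves keeps each
eigenvector column paired with its eigenvalue.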