[
https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000355#comment-13000355
]
Dmitriy Lyubimov edited comment on MAHOUT-593 at 2/28/11 6:26 PM:
------------------------------------------------------------------
* re: applying the patch. As I mentioned above, it is a git patch (patch -p 1
worked the last time I tested it; let me know if you still have problems).
But, once voted, I can commit it myself. I work in Git per the suggested
Apache git workflow (there's a help page somewhere), maintaining a separate
branch for MAHOUT-593 (or any other issue I work on) on GitHub. I can
squash-merge it directly onto the git-svn Mahout branch and then push it as one
incremental commit. Also, git patches are _always_ branch patches (i.e. you
need to apply them on top of trunk). The git branch for this patch can be viewed
and checked out here: https://github.com/dlyubimov/ssvd-lsi/tree/MAHOUT-593 (if
it helps to get a better idea of what it looks like in the Mahout trunk tree
once applied).
* I am not sure what IOUtils provides. It's quite likely I can reuse it, sure;
the functionality is very basic here (make sure all closeables in a
collection are closed in the proper order). I'll take a look.
* We are not bringing commons-math to Mahout. Mahout already uses it, as 1.2 in
mahout-core and as 2.1 in mahout-math. (Surely, you can't have both versions in
your final assembly even if you declare them, so this is just fixing that
inconsistency.) commons-math 1.2 is way outdated (in fact, so outdated that it
uses a different artifact id but still the same package/class tree
structure). Because it has the same package tree under a different artifact
id, both trees end up on the final classpath, causing linking errors. (I
actually ran into this because I use Eclipse; unlike Maven, it can't tell test
scope from runtime, so it included both, because mahout-math uses Apache
commons-math in test context only.)
* The Colt eigensolver in Mahout is marked "@Deprecated do not use". Even so, I
tried it as a first option before falling back on commons-math. Its
results are inconsistent when checked against Apache commons-math. Also, it
doesn't seem to sort eigenvectors and eigenvalues in descending order, which
adds work. I understand Mahout is in the process of adopting Colt solvers. When
that is done, we can easily migrate to it (all eigensolver dependencies are
strictly isolated in a single class called "EigenSolverWrapper", so whatever
solver is used can be substituted there easily for a very small in-RAM matrix
such as B*B'). The commons-math results feed into the SSVD solution, which is
then asserted against Colt SVD results with epsilon 1e-10 in the unit test. That
makes me think the commons-math results are consistent with Colt SVD and the
Colt eigensolver's are not.
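
For illustration, the "close all closeables in a collection in the proper
order" helper mentioned above could look roughly like the sketch below. This is
not the code from the patch and not Mahout's IOUtils; the class name
CloseAllSketch and the method closeAll are hypothetical, and it assumes Java 8
or later and that resources are pushed onto the front of a Deque as they are
opened, so the most recently opened one is closed first.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Deque;

// Hypothetical helper, for illustration only.
final class CloseAllSketch {

    /**
     * Closes every resource in the deque, head first, and keeps going even if
     * some close() calls fail. The first IOException is rethrown after all
     * resources have been attempted; later ones are attached as suppressed.
     */
    public static void closeAll(Deque<? extends Closeable> closeables) throws IOException {
        IOException first = null;
        while (!closeables.isEmpty()) {
            Closeable c = closeables.removeFirst();
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Callers would register each resource with addFirst() right after opening it and
call closeAll() from a finally block, which gives reverse-of-open ordering
without tracking it by hand.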
> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure
> compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-593
> URL: https://issues.apache.org/jira/browse/MAHOUT-593
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.4
> Reporter: Dmitriy Lyubimov
> Fix For: 0.5
>
> Attachments: MAHOUT-593.patch.gz, MAHOUT-593.patch.gz,
> MAHOUT-593.patch.gz, SSVD-givens-CLI.pdf
>
>
> Current Mahout-376 patch requires 'new' hadoop API. Certain elements of that
> API (namely, multiple outputs) are not available in standard hadoop 0.20.2
> release. As such, that may work only with either CDH or 0.21 distributions.
> In order to bring it into sync with current Mahout dependencies, a backport
> of the patch to 'old' API is needed.
> Also, some work is needed to resolve math dependencies. Existing patch relies
> on apache commons-math 2.1 for eigen decomposition of small matrices. This
> dependency is not currently set up in the mahout core. So, certain snippets
> of code are either required to go to mahout-math or use Colt eigen
> decomposition (last time I tried, my results were mixed with that one. It
> seems to produce results inconsistent with those from mahout-math
> eigensolver; at the very least, it doesn't produce singular values in sorted
> order).
> So this patch is mainly moving some Mahout-376 code around.
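
The extra work caused by a solver that returns unsorted eigenvalues can be
sketched as follows. This is a plain-Java illustration under assumptions, not
the EigenSolverWrapper from the patch: the class name EigenSortSketch and both
methods are hypothetical, and eigenvectors are assumed to be stored as the
columns of a dense row-major double[][] array, with column j belonging to
eigenvalues[j].

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, for illustration only.
final class EigenSortSketch {

    /** Returns indices ordered so that the eigenvalues they point at are descending. */
    public static int[] descendingOrder(double[] eigenvalues) {
        Integer[] idx = new Integer[eigenvalues.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> eigenvalues[i]).reversed());
        int[] order = new int[idx.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = idx[i];
        }
        return order;
    }

    /** Reorders eigenvalues and the matching eigenvector columns in place. */
    public static void sortDescending(double[] eigenvalues, double[][] eigenvectors) {
        int[] order = descendingOrder(eigenvalues);
        double[] oldValues = eigenvalues.clone();
        for (int j = 0; j < order.length; j++) {
            eigenvalues[j] = oldValues[order[j]];
        }
        // Apply the same column permutation to every row of the eigenvector matrix.
        for (double[] row : eigenvectors) {
            double[] oldRow = row.clone();
            for (int j = 0; j < order.length; j++) {
                row[j] = oldRow[order[j]];
            }
        }
    }
}
```

For a matrix as small as B*B' the cost of this permutation is negligible; the
point is only that values and vectors must be permuted together.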