Ted,

>
> One way around all of this is for you to post a test case for in-memory SVD
> based on the test case from commons-math.  Then all of us can together tweak
> mahout-math eigen and svd decompositions to match the desired behavior.
>
> That gets rid of the entire dependency.

* There's already an "in-memory" test (actually it runs Hadoop in local
mode, since there's no non-Hadoop method for SSVD here) that compares
the output to that of Colt's SVD and asserts the results and
orthogonality with an epsilon of 1e-10. The test is called
LocalSSVDSolverTest; if you apply the patch, it runs during the normal
build along with all the other Mahout tests (I usually run it directly
from Eclipse, though).
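The orthogonality part of such an assertion amounts to checking that
U^T U (and likewise V^T V) equals the identity within the epsilon. A
minimal sketch over plain Java arrays; the class and method names are
hypothetical, and this is not the actual LocalSSVDSolverTest code:

```java
/** Hypothetical helper: true if the columns of q are orthonormal within eps. */
final class OrthogonalityCheck {
    private OrthogonalityCheck() {}

    static boolean isOrthonormal(double[][] q, double eps) {
        int rows = q.length;
        int cols = q[0].length;
        // Compare every entry of Q^T Q (upper triangle suffices) against I.
        for (int i = 0; i < cols; i++) {
            for (int j = i; j < cols; j++) {
                double dot = 0.0;
                for (int r = 0; r < rows; r++) {
                    dot += q[r][i] * q[r][j];
                }
                double expected = (i == j) ? 1.0 : 0.0;
                if (Math.abs(dot - expected) > eps) {
                    return false;
                }
            }
        }
        return true;
    }
}
```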

All dependencies on an eigensolver are isolated in a class called
EigenSolverWrapper in the math module. It only needs to solve a small
symmetric (k+p)x(k+p) matrix. Whether it does so with Colt's solver or
commons-math's doesn't matter, as long as LocalSSVDSolverTest passes
and the results are stable rather than flaky. If you could manage it
with Colt, there would be no other commons-math (or even Colt, for that
matter) dependency anywhere in the code, only mahout-math's.
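EigenSolverWrapper's actual signature isn't shown in this thread. As a
hypothetical sketch of what such isolation can look like, the SSVD code
would depend only on a one-method interface, and any solver can sit
behind it. The implementation below is a swapped-in textbook cyclic
Jacobi method, not Colt's, commons-math's, or mahout-math's solver; all
names (SymmetricEigenSolver, JacobiEigenSolver, Eigen) are illustrative.

```java
import java.util.Arrays;

/** Hypothetical result holder: A = vectors * diag(values) * vectors^T. */
final class Eigen {
    final double[] values;    // eigenvalues, sorted in descending order
    final double[][] vectors; // column i is the eigenvector for values[i]

    Eigen(double[] values, double[][] vectors) {
        this.values = values;
        this.vectors = vectors;
    }
}

/** Hypothetical seam: the only type the SSVD code would depend on. */
interface SymmetricEigenSolver {
    Eigen decompose(double[][] symmetric);
}

/** Textbook cyclic Jacobi method; adequate for small (k+p)x(k+p) inputs. */
final class JacobiEigenSolver implements SymmetricEigenSolver {
    private static final int MAX_SWEEPS = 100;

    @Override
    public Eigen decompose(double[][] symmetric) {
        int n = symmetric.length;
        double[][] a = new double[n][];
        double normSq = 0.0;
        for (int i = 0; i < n; i++) {
            a[i] = symmetric[i].clone(); // work on a copy; the input is untouched
            for (double x : a[i]) normSq += x * x;
        }
        double[][] v = new double[n][n];
        for (int i = 0; i < n; i++) v[i][i] = 1.0;

        for (int sweep = 0; sweep < MAX_SWEEPS; sweep++) {
            double off = 0.0;
            for (int p = 0; p < n; p++)
                for (int q = p + 1; q < n; q++) off += a[p][q] * a[p][q];
            if (off <= 1e-24 * normSq) break; // off-diagonal part is negligible
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (a[p][q] == 0.0) continue;
                    // Rotation that zeroes a[p][q]; smaller root of t for stability.
                    double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                    double t = (theta >= 0 ? 1.0 : -1.0)
                            / (Math.abs(theta) + Math.sqrt(theta * theta + 1.0));
                    double c = 1.0 / Math.sqrt(t * t + 1.0);
                    double s = t * c;
                    for (int k = 0; k < n; k++) { // A <- A * J (columns p, q)
                        double akp = a[k][p], akq = a[k][q];
                        a[k][p] = c * akp - s * akq;
                        a[k][q] = s * akp + c * akq;
                    }
                    for (int k = 0; k < n; k++) { // A <- J^T * A (rows p, q)
                        double apk = a[p][k], aqk = a[q][k];
                        a[p][k] = c * apk - s * aqk;
                        a[q][k] = s * apk + c * aqk;
                    }
                    for (int k = 0; k < n; k++) { // V <- V * J accumulates eigenvectors
                        double vkp = v[k][p], vkq = v[k][q];
                        v[k][p] = c * vkp - s * vkq;
                        v[k][q] = s * vkp + c * vkq;
                    }
                }
            }
        }

        // Sort eigenpairs by descending eigenvalue, as SVD consumers usually expect.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> Double.compare(a[y][y], a[x][x]));
        double[] values = new double[n];
        double[][] vectors = new double[n][n];
        for (int j = 0; j < n; j++) {
            values[j] = a[order[j]][order[j]];
            for (int i = 0; i < n; i++) vectors[i][j] = v[i][order[j]];
        }
        return new Eigen(values, vectors);
    }
}
```

With a seam like this, trying a different solver means writing one
more implementation of the interface and rerunning LocalSSVDSolverTest.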

* I actually started by testing that step with Colt's solver, but I
never got it to work 100% reliably. It had trouble with matrices as
small as 600x600, throwing internal errors now and then. In addition,
its results never matched mahout-math's. I couldn't allow that in my
production, so I switched to the mahout-math solver, which worked
admirably. My approach is pragmatic: don't try to fix something that
isn't broken, or replace it with something that isn't working at the
moment. But that was some time ago; maybe something has changed since
then. Either way, replacing the eigensolver implementation and
verifying it is very simple, and nobody would care (as long as it
works).

* Let me note that I understand the desire to keep the matrix library
APIs consistent. However, in this case it is a matter of internal
tooling. The public API is always Mahout's DRM, for both input and
output (or VectorWritable in the case of singular values). So which
eigensolver implementation is used to solve BB^t = U Gamma U^t has no
effect whatsoever on the overall Mahout architecture, which is why I
don't view it as an architectural issue. Aside from that concern, I am
not sure what's so bad about commons-math. We use it, and it seems to
work fine for what we use it for (which is mostly decompositions).

* Finally, let me note that we are talking about a pre-existing Mahout
dependency, not a newly introduced one. So getting rid of it in SSVD
will not get rid of it in Mahout, per your stated goal, since
presumably some methods already use it; otherwise it wouldn't be among
the dependencies in the first place. I am just pointing out that this
dependency is outdated and inconsistent across the modules, that's all.
But I'm not the one trying to introduce it; I'm just giving a heads-up.
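For reference, the small eigenproblem BB^t = U Gamma U^t mentioned
above gives the SVD of B directly, which is why only a (k+p)x(k+p)
solve is needed. This is standard linear algebra, assuming B = U Sigma
V^t is a thin SVD of B with nonzero singular values; it is not a claim
about how the patch orders the computation:

```latex
\begin{aligned}
B &= U \Sigma V^{T}, \qquad U^{T}U = V^{T}V = I \\
BB^{T} &= U \Sigma V^{T} V \Sigma U^{T} = U \Sigma^{2} U^{T} \equiv U \Gamma U^{T} \\
\sigma_i &= \sqrt{\gamma_i} \\
V &= B^{T} U \Sigma^{-1}
\end{aligned}
```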

> >
>
> That would be an excellent idea.
>
> Can you suggest a patch leading in that direction?  It doesn't have to be
> everything, but an example would allow lots of people who don't quite know
> how to do this to also jump in to help.

This is as simple as declaring all dependency versions in the parent
pom under <dependencyManagement> instead of in the individual
projects' (modules') poms. I can certainly help; however, the issue is
not technical but rather logistical. Take commons-math, for example: we
have different versions in different modules. It is easy to move the
version declaration under the <dependencyManagement> tag; in fact, I
believe anyone could do it without any specific Maven knowledge. The
problem is deciding which version to use there, 2.1 or 1.2. Perhaps
there's a reason why 1.2 is used in core, and updating it may cause
some things to fail. If there is indeed such a conflict between
different features in the project, the feature authors need to find a
way to reconcile it. I don't see why anyone would keep such an
outdated dependency around, unless there's something specific in that
old version that some feature relies on.

Either way, I can certainly help out with this, time permitting, but it
has no bearing on MAHOUT-593, which was taking priority for our
production needs; and it's more of an administrative issue than a
technical one.

-Dmitriy
