Ted,

> One way around all of this is for you to post a test case for in-memory SVD
> based on the test case from commons-math. Then all of us can together tweak
> mahout-math eigen and svd decompositions to match the desired behavior.
>
> That gets rid of the entire dependency.

* There is already an "in-memory" test (actually Hadoop in local mode, since there is no non-Hadoop method for SSVD here) that compares the output to that of Colt's SVD and asserts the results and orthogonality with epsilon 1e-10. The test is named LocalSSVDSolverTest, and if you apply the patch it runs during the normal build along with all other Mahout tests. (I usually run it directly from Eclipse, though.) All dependencies on an eigensolver are isolated in a class called EigenSolverWrapper in the math module. It just needs to solve a small symmetric (k+p)x(k+p) matrix. Whether it does that with Colt's solver or with commons-math doesn't matter, as long as LocalSSVDSolverTest passes and isn't flaky. If you could manage to do it with Colt, there would be no other commons-math (or even Colt, for that matter) dependency in the whole code, only the mahout-math one.

* I actually started by testing that step with Colt's solver, but I was never 100% successful with it. It had trouble with matrices as small as 600x600, throwing internal errors now and then. In addition, its results never matched those of commons-math. I couldn't allow that to happen in production, so I switched to the commons-math solver, which worked admirably. My approach is pragmatic: don't try to fix something that is not broken, or replace it with something that is not working at the moment. But that was some time ago; maybe something has changed since then. Either way, replacing the eigensolver implementation and verifying it is very simple, and nobody would care (as long as it works).

* Let me note that I understand the desire to keep the matrix library APIs consistent. However, in this case it is a matter of internal tooling. The public API is always Mahout's DRM for both input and output (or VectorWritable in the case of singular values). So which eigensolver implementation is used to solve BB^t=UGammaU^t has no effect whatsoever on the overall Mahout architecture.
So that's why I don't view it as an architectural issue. Aside from that concern, I am not sure what's so bad about commons-math. We use it, and it seems to work fine for what we use it for (which is mostly decompositions).

* Finally, let me note that we are talking about a pre-existing Mahout dependency, not a newly introduced one. So getting rid of it in SSVD will not get rid of it in Mahout, per your stated goal: presumably some other methods use it, otherwise it wouldn't be in the dependencies in the first place. I am just pointing out that this dependency is outdated and inconsistent across the modules, that's it. It's not me who is trying to introduce it; I am just giving a heads-up.

> That would be an excellent idea.
>
> Can you suggest a patch leading in that direction? It doesn't have to be
> everything, but an example would allow lots of people who don't quite know
> how to do this to also jump in to help.

This is as simple as declaring all dependency versions in the parent pom under <dependencyManagement> instead of in the individual modules' poms. I can certainly help; however, the issue is not technical but logistical. Take commons-math, for example: we have different versions in different modules. It is easy to move the version declaration under the <dependencyManagement> tag; in fact, I believe anyone could do it without specific Maven knowledge. The problem is deciding which version to use there, 2.1 or 1.2. Perhaps there's a reason why 1.2 is used in core, and updating it may cause some things to fail. If there is indeed such a conflict between different features in the project, the feature authors need to find a way to reconcile it. I don't see any reason why anyone would keep such an outdated dependency around, unless something specific in that old version makes some feature happy.

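To illustrate, here is a minimal sketch of that change (not an actual patch). The commons-math coordinates and the 2.1 version below are only assumptions for the sake of the example; which version to pin is exactly the open question. Also, as far as I recall, the 1.x releases were published under a different groupId, so the module declarations may need to be aligned first, since the managed entry only applies to matching groupId/artifactId pairs.

```xml
<!-- parent pom.xml: the single place where the version is pinned -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-math</artifactId>
      <!-- 2.1 is a placeholder; the version to standardize on is still undecided -->
      <version>2.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- module pom.xml (e.g. core or math): no <version> element,
     the version is inherited from the parent's dependencyManagement -->
<dependencies>
  <dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-math</artifactId>
  </dependency>
</dependencies>
```

With this layout, changing the version later is a one-line edit in the parent pom, and no module can silently drift to a different version.
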
Either way, I can certainly help out with this, time permitting, but it has no bearing on MAHOUT-593, which took priority because of our production needs; it is more of an administrative issue than a technical one.

-Dmitriy
