I'm sorry, I wasn't keeping up with the changes, and I see the new implementations now... apologies for the confusion -- I didn't notice the new matrix data structures infrastructure because they are not related (the new Matrix classes vs. Colt's *Matrix2D* stuff).
> - delete
> - add tests, restructure to use tested API, rename to new style
> - completely re-implement in a maintainable style

Fair enough. I don't think Colt's API was all that great either. We are coupled to it pretty closely in the Lingo algorithm (the open source one), but a cleaner API may well be worth the switch.

> My initial preference was for the second fate as much as possible. That
> preference has changed a bit to prefer the first option with the third as a
> backup. Partly this is because we are scraping down to the less-used, lower
> quality parts of Colt.

+1 for reimplementing Colt's functionality from scratch, without caring much about backwards compatibility.

To be honest, we don't use much of Colt's math: basically matrix decompositions (SVD), some sorting routines from the Sorting class (removed now, but these can be replaced), and a lot of multiplication and other basic operations on vectors and matrices. One thing we DO use heavily is the two-dimensional matrix representation backed by a flat double[] array, because it lets us plug in BLAS to work directly on Java data, without copying or other manipulations... but then we don't have any newer native BLAS build, and it has been a pain to compile and link it with Java. We care mostly about native LAPACK's gesdd (SVD) and BLAS's gemm (general matrix multiplication); these provide significant speedups when clustering larger data sets with Lingo. But I can imagine hardware-accelerated implementations will eventually surface inside mahout-math anyway, so we could switch to those instead of all the trickery we currently do with Colt.

So, to summarize: don't worry about us much, really. For now we will stick to the mahout-math release that we know works for us. I will try to switch to the trunk of mahout-math as a proof of concept (without native matrix computation support) and will let you know if I run into any problems. It is a much larger refactoring than I initially thought, though.

Dawid
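P.S. For anyone unfamiliar with the flat-array layout mentioned above, here is a minimal sketch of the idea. The class and method names are purely illustrative (this is not Mahout's or Colt's actual API): a dense 2-D matrix is backed by a single row-major double[], so a native BLAS gemm could in principle operate on that array directly, without copying; the inner loop below is the plain-Java fallback for what gemm accelerates.

```java
// Illustrative sketch only -- not the Mahout/Colt API.
// A dense 2-D matrix backed by one row-major double[]: element (i, j)
// lives at data[i * cols + j], so the whole matrix is a single
// contiguous array that native BLAS routines could work on in place.
public class FlatMatrix {
    final int rows, cols;
    final double[] data;

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int i, int j) { return data[i * cols + j]; }

    void set(int i, int j, double v) { data[i * cols + j] = v; }

    // Plain-Java C = this * other, the operation BLAS gemm accelerates.
    // Loop order (i, k, j) keeps the inner loop走 contiguous in memory.
    FlatMatrix times(FlatMatrix other) {
        FlatMatrix c = new FlatMatrix(rows, other.cols);
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < cols; k++) {
                double a = data[i * cols + k];
                for (int j = 0; j < other.cols; j++) {
                    c.data[i * other.cols + j] += a * other.data[k * other.cols + j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        FlatMatrix a = new FlatMatrix(2, 2);
        a.set(0, 0, 1); a.set(0, 1, 2);
        a.set(1, 0, 3); a.set(1, 1, 4);
        FlatMatrix b = a.times(a); // [[7, 10], [15, 22]]
        System.out.println(b.get(0, 0) + " " + b.get(1, 1));
    }
}
```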
