On Mon, Jan 25, 2010 at 10:55 AM, Sean Owen <sro...@gmail.com> wrote:
> Agree that we should start planning 0.3, as it will take over a month
> I bet to actually be ready.

+1 to releasing within a month or so.

> How about everyone take a moment to focus on what's marked for 0.3?
> for any issue that concerns you:

Some thoughts inline, before we start reclassifying JIRA tickets all over the place and I lose track of them:

> Key Summary
> MAHOUT-221 Implementation of FP-Bonsai Pruning for fast pattern mining
> MAHOUT-227 Parallel SVM
> MAHOUT-240 Parallel version of Perceptron
> MAHOUT-241 Example for perceptron

I don't know about any of these really.

> MAHOUT-185 Add mahout shell script for easy launching of various
> algorithms

This is pretty key, I can add in some Properties file based ways of doing this as well, so that not everything is on the CLI. We don't need a perfect patch here, but a good start would be nice to commit.

> MAHOUT-153 Implement kmeans++ for initial cluster selection in kmeans

Is there progress on this one?

> MAHOUT-232 Implementation of sequential SVM solver based on Pegasos

This patch looks to be progressing - it would be really nice to get it in.

> MAHOUT-228 Need sequential logistic regression implementation using
> SGD techniques

This is looking great so far and should make it in for this release.

> MAHOUT-263 Matrix interface should extend Iterable<Vector> for better
> integration with distributed storage

I've got a patch with this already, but I need to integrate the usage of this with the o.a.m.math.decomposer impls properly. Unit tests aren't succeeding with this patch yet. But it should be in for this release.

> MAHOUT-237 Map/Reduce Implementation of Document Vectorizer

Basically done, right? Should be in 0.3

> MAHOUT-220 Mahout Bayes Code cleanup

Ditto for this one.

> MAHOUT-265 Error with creating MVC from Lucene Index or Arff

One-line fix for me, I'll get to this shortly.

> MAHOUT-215 Provide jars with mahout release.

++1 on this one getting in. "showstopper" I'd say.

> MAHOUT-209 Add aggregate() methods for Vector

It would be really nice to stop monkeying around with the basic linear primitive interfaces, because even though we have AbstractXYZ base classes which can implement most of this stuff... we just should. So that's my way of saying I should either code this up, or close it as Won't Fix. Should not be postponed to 0.4.

> MAHOUT-231 Upgrade QM reports to use Clover 2.6

No idea on this one.

> MAHOUT-106 PLSI/EM in pig based on hofmann's ACM 04 paper.

Has anyone looked at this in a million years?

> MAHOUT-155 ARFF VectorIterable

We already can convert ARFF to our Vector, do we also need an iterable? Should this just be folded into some kind of "Vectorizer", the output being the usual SequenceFile<Integer, VectorWritable> which will be a basic input into HDFS-backed matrices?

> MAHOUT-214 Implement Stacked RBM

This needs to go to 0.4, no progress has been made on this, but I don't want to see it disappear from view into the black hole of "someday" just yet.

Them's my $0.03 (inflation).

-jake