On Mon, Jan 25, 2010 at 10:55 AM, Sean Owen <sro...@gmail.com> wrote:

> Agree that we should start planning 0.3, as it will take over a month
> I bet to actually be ready.
>

+1 to releasing within a month or so.


> How about everyone take a moment to focus on what's marked for 0.3?
> for any issue that concerns you:
>

Some thoughts inline, before we start reclassifying JIRA tickets
all over the place and I lose track of them:


> Key     Summary
> MAHOUT-221      Implementation of FP-Bonsai Pruning for fast pattern mining
> MAHOUT-227      Parallel SVM
> MAHOUT-240      Parallel version of Perceptron
> MAHOUT-241      Example for perceptron
>

I don't know about any of these really.


> MAHOUT-185      Add mahout shell script for easy launching of various
> algorithms
>

This is pretty key. I can add some Properties-file-based ways of doing
this as well, so that not everything is on the CLI.  We don't need a
perfect patch here, but a good start would be nice to commit.
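
To make the Properties idea concrete, here is a minimal sketch of one way
it could work: defaults come from a properties file, and anything given on
the command line overrides them.  The class name LauncherOptions, the
"--key value" argument convention, and the option keys are all made up for
illustration; none of this is taken from the MAHOUT-185 patch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of Mahout: merges file defaults with CLI overrides.
final class LauncherOptions {

  // propertiesText: contents of a .properties file (passed as text to keep the sketch self-contained).
  // args: command-line arguments, assumed to come in "--key value" pairs.
  static Properties resolve(String propertiesText, String[] args) {
    Properties options = new Properties();
    try {
      options.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new IllegalStateException("Could not read properties", e);
    }
    // Command-line pairs win over whatever the file said.
    for (int i = 0; i + 1 < args.length; i += 2) {
      if (args[i].startsWith("--")) {
        options.setProperty(args[i].substring(2), args[i + 1]);
      }
    }
    return options;
  }
}
```

With file contents "input=/tmp/in" and "k=10" plus the arguments
"--k 20", the resolved options keep the input path from the file and take
k from the command line.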


> MAHOUT-153      Implement kmeans++ for initial cluster selection in kmeans
>

Is there progress on this one?


> MAHOUT-232      Implementation of sequential SVM solver based on Pegasos
>

This patch looks to be progressing - it would be really nice to get it in.


> MAHOUT-228      Need sequential logistic regression implementation using
> SGD techniques
>

This is looking great so far and should make it in for this release.


> MAHOUT-263      Matrix interface should extend Iterable<Vector> for better
> integration with distributed storage
>

I've got a patch for this already, but I need to properly integrate its
usage with the o.a.m.math.decomposer impls.  Unit tests aren't succeeding
with this patch yet, but it should be in for this release.
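
For anyone wondering why Iterable<Vector> helps here, a rough sketch of the
idea: if a matrix can be walked row by row, then something like a
matrix-vector product only needs one pass over the rows and doesn't care
whether they live in memory or are streamed from somewhere else.  The types
below (SketchVector, SketchMatrix, and so on) are hypothetical stand-ins I
made up for the example, not the actual o.a.m.math interfaces or the
contents of the MAHOUT-263 patch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration only, not the Mahout interfaces.
interface SketchVector {
  int size();
  double get(int index);
}

// The point of the ticket: a matrix is something you can iterate over, row by row.
interface SketchMatrix extends Iterable<SketchVector> {
  int numRows();
}

final class DenseSketchVector implements SketchVector {
  private final double[] values;
  DenseSketchVector(double... values) { this.values = values.clone(); }
  public int size() { return values.length; }
  public double get(int index) { return values[index]; }
}

// In-memory matrix; a distributed one would return an iterator over stored rows instead.
final class RowListMatrix implements SketchMatrix {
  private final List<SketchVector> rows;
  RowListMatrix(SketchVector... rows) { this.rows = Arrays.asList(rows); }
  public int numRows() { return rows.size(); }
  public Iterator<SketchVector> iterator() { return rows.iterator(); }
}

final class MatrixTimes {
  // Computes y = A x in a single pass over the rows of A.
  static double[] times(SketchMatrix a, SketchVector x) {
    double[] result = new double[a.numRows()];
    int row = 0;
    for (SketchVector v : a) {
      double dot = 0.0;
      for (int i = 0; i < x.size(); i++) {
        dot += v.get(i) * x.get(i);
      }
      result[row++] = dot;
    }
    return result;
  }
}
```

The times() method never asks how the rows are stored, which is the
property a decomposer running over distributed storage would rely on.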


> MAHOUT-237      Map/Reduce Implementation of Document Vectorizer
>

Basically done, right?  Should be in 0.3


> MAHOUT-220      Mahout Bayes Code cleanup
>

Ditto for this one.


> MAHOUT-265      Error with creating MVC from Lucene Index or Arff
>

One-line fix for me, I'll get to this shortly.


> MAHOUT-215      Provide jars with mahout release.
>

++1 on this one getting in.  "showstopper" I'd say.


> MAHOUT-209      Add aggregate() methods for Vector
>

It would be really nice to stop monkeying around with the basic linear
primitive interfaces.  Yes, we have AbstractXYZ base classes which can
implement most of this stuff, but we just should stop.  So that's my way
of saying I should either code this up, or close it as Won't Fix.  It
should not be postponed to 0.4.
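
For reference, the kind of thing an aggregate() method would cover, as a
sketch under my own assumptions: map each element, then fold the mapped
values together with a combiner.  The class VectorAggregates, the
signature, the use of a plain double[] in place of Vector, and the choice
to return 0.0 for an empty vector are all invented for illustration (and it
leans on the java.util.function interfaces, so it needs Java 8); this is
not the signature proposed in MAHOUT-209.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper for illustration; a double[] stands in for Vector.
final class VectorAggregates {

  // Applies mapper to every element and folds the results with combiner, left to right.
  static double aggregate(double[] values,
                          DoubleBinaryOperator combiner,
                          DoubleUnaryOperator mapper) {
    if (values.length == 0) {
      return 0.0; // assumption: an empty vector aggregates to zero
    }
    double result = mapper.applyAsDouble(values[0]);
    for (int i = 1; i < values.length; i++) {
      result = combiner.applyAsDouble(result, mapper.applyAsDouble(values[i]));
    }
    return result;
  }
}
```

With this, a squared norm is aggregate(v, Double::sum, x -> x * x) and a
max-norm is aggregate(v, Math::max, Math::abs), which is why one method on
the interface could replace several special-purpose ones.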


> MAHOUT-231      Upgrade QM reports to use Clover 2.6
>

No idea on this one.


> MAHOUT-106      PLSI/EM in pig based on hofmann's ACM 04 paper.
>

Has anyone looked at this in a million years?


> MAHOUT-155      ARFF VectorIterable
>

We can already convert ARFF to our Vector; do we also need an iterable?
Should this just be folded into some kind of "Vectorizer", the output being
the usual SequenceFile<Integer, VectorWritable> which will be a basic input
into HDFS-backed matrices?


> MAHOUT-214      Implement Stacked RBM
>

This needs to go to 0.4; no progress has been made on it, but I don't
want to see it disappear from view into the black hole of "someday" just
yet.

Them's my $0.03 (inflation).

  -jake
