Re: [jira] Commented: (MAHOUT-479) Streamline classification/ clustering data structures

Ted Dunning Tue, 17 Aug 2010 09:32:02 -0700

Jeff,

You asked what there is to do on the clustering side.


In my mind, there are two clustering issues.  One is unification at the
command level where clusters are learned.  The other is unification in
subsequent steps where somebody might want to use a clustering.  The second
issue actually seems a bit more pressing to me.

That second issue concerns the ability to have a model that is the output of
the clustering.  That model should support:

- reading the model from persistent storage

- classifying new vectors to get either a single best-fit cluster or a score
vector.


In my view, this should apply equally to all classifiers: the models
produced by classifier learning algorithms should look the same at the
interface level as the models produced by cluster learning algorithms.
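
To make the two requirements concrete, here is a rough sketch of what such a
shared interface might look like. Every name in it (VectorModel,
CentroidModel, classifyBest, the on-disk layout) is hypothetical and made up
for illustration; none of it is existing Mahout API, and plain double[]
arrays stand in for real vector classes. The scoring rule (normalized inverse
distance to each centroid) is also just an assumption, chosen to keep the
example short.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical common interface for clustering and classifier models.
interface VectorModel {
    // Returns one score per cluster/category for the input vector.
    double[] classify(double[] input);

    // Returns the index of the single best-fit cluster/category.
    default int classifyBest(double[] input) {
        double[] scores = classify(input);
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }
}

// Toy model: one centroid per cluster, scored by inverse Euclidean distance.
final class CentroidModel implements VectorModel {
    private final double[][] centroids;

    CentroidModel(double[][] centroids) {
        if (centroids.length == 0) {
            throw new IllegalArgumentException("need at least one centroid");
        }
        this.centroids = centroids;
    }

    // Writes the model to persistent storage (assumed layout: k, dim, values).
    void write(DataOutput out) throws IOException {
        out.writeInt(centroids.length);
        out.writeInt(centroids[0].length);
        for (double[] c : centroids) {
            for (double v : c) {
                out.writeDouble(v);
            }
        }
    }

    // Reads a model back using the same assumed layout as write().
    static CentroidModel read(DataInput in) throws IOException {
        int k = in.readInt();
        int dim = in.readInt();
        double[][] centroids = new double[k][dim];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < dim; j++) {
                centroids[i][j] = in.readDouble();
            }
        }
        return new CentroidModel(centroids);
    }

    @Override
    public double[] classify(double[] input) {
        double[] scores = new double[centroids.length];
        double total = 0.0;
        for (int i = 0; i < centroids.length; i++) {
            double sumSq = 0.0;
            for (int j = 0; j < input.length; j++) {
                double diff = input[j] - centroids[i][j];
                sumSq += diff * diff;
            }
            scores[i] = 1.0 / (1.0 + Math.sqrt(sumSq));
            total += scores[i];
        }
        // Normalize so the scores sum to 1.
        for (int i = 0; i < scores.length; i++) {
            scores[i] /= total;
        }
        return scores;
    }
}
```

The point of the sketch is that downstream code would only need to hold a
VectorModel; whether it came from a clustering run or from classifier
training would not matter to the caller.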


On Tue, Aug 17, 2010 at 9:26 AM, Ted Dunning (JIRA) <j...@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/MAHOUT-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899452#action_12899452]
>
> Ted Dunning commented on MAHOUT-479:
> ------------------------------------
>
> I just moved the encoding objects associated with MAHOUT-228 to
> org.apache.mahout.vectors to provide a nucleus for feature encoding.
>
> There are also a fair number of things in oam.text and oam.utils that are
> related.  Since those are in the utils module, however, I couldn't leverage
> them.  We may want to consider moving some of them to core to allow wider
> use.
>
> > Streamline classification/ clustering data structures
> > -----------------------------------------------------
> >
> >                 Key: MAHOUT-479
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-479
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Classification, Clustering
> >    Affects Versions: 0.1, 0.2, 0.3, 0.4
> >            Reporter: Isabel Drost
> >
> > Opening this JIRA issue to collect ideas on how to streamline our
> classification and clustering algorithms to make integration for users
> easier as per mailing list thread
> http://markmail.org/message/pnzvrqpv5226twfs
> > {quote}
> > Jake and Robin and I were talking the other evening and a common lament
> was that our classification (and clustering) stuff was all over the map in
> terms of data structures.  Driving that to rest and getting those comments
> even vaguely as plug and play as our much more advanced recommendation
> components would be very, very helpful.
> > {quote}
> > This issue probably also relates to MAHOUT-287 (intention there is to
> make naive bayes run on vectors as input).
> > Ted, Jake, Robin: Would be great if one of you could add a comment on
> some of the issues you discussed "the other evening" and (if applicable) any
> minor or major changes you think could help solve this issue.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
