[jira] [Commented] (MAHOUT-843) Top Down Clustering

Jeff Eastman (Commented) (JIRA) Thu, 03 Nov 2011 11:43:56 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143430#comment-13143430 ]


Jeff Eastman commented on MAHOUT-843:
-------------------------------------

Well, passing off the hardest part of the problem to "someone more familiar" is 
guaranteed to make this patch sit in limbo. Right now the Java class approach 
is an interesting experiment. To complete the feature submission, you really 
need to address the CLI too. Or at least find that somebody to help you get it 
done. A half-done feature won't get committed; we are actually moving to remove 
such features from trunk to get ready for a 1.0 release next year.

You still have not said why you believe the Java approach is better than using 
the existing CLIs in a script with your postprocessor. 

FWIW, I think the postprocessor, by itself, with a CLI, JavaDocs, an example 
script and unit tests is your best path to a submission which will pass muster.
                
> Top Down Clustering
> -------------------
>
>                 Key: MAHOUT-843
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-843
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>              Labels: clustering, patch
>             Fix For: 0.6
>
>         Attachments: MAHOUT-843-patch, Top-Down-Clustering-patch
>
>
> Top Down Clustering works in multiple steps. The first step is to find 
> comparatively bigger clusters. The second step is to cluster the bigger chunks 
> into meaningful clusters. This can improve performance while clustering large 
> amounts of data. It also removes the dependency on providing input 
> clusters/numbers to the clustering algorithm.
> "Big" is a relative term, as is the smaller "meaningful". So the sizes of 
> the "bigger" and "smaller/meaningful" clusters will be controlled by the 
> user.
> The user can also select which clustering algorithm to use at the top level 
> and which at the bottom level. Initially, this can be done for only one or a 
> few clustering algorithms; later, an option can be provided to use all the 
> algorithms that suit the case.
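
For readers new to the idea, the two-level flow in the description above can be sketched roughly as follows. This is only an illustration in plain Java and is not taken from the attached patch: it uses no Mahout APIs, it assumes k-means at both levels (the issue leaves the algorithm choice to the user), and the names TopDownSketch, kMeans, topDown, topK and bottomK are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the MAHOUT-843 patch and not Mahout API.
class TopDownSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return s;
    }

    // Plain Lloyd's k-means with deterministic farthest-point seeding.
    // Returns the cluster index assigned to each point.
    static int[] kMeans(double[][] pts, int k, int iterations) {
        int n = pts.length, dim = pts[0].length;
        k = Math.min(k, n);
        double[][] centers = new double[k][];
        centers[0] = pts[0].clone();
        for (int c = 1; c < k; c++) {
            int best = 0;
            double bestDist = -1;
            for (int i = 0; i < n; i++) {
                double d = Double.MAX_VALUE;
                for (int j = 0; j < c; j++) d = Math.min(d, dist2(pts[i], centers[j]));
                if (d > bestDist) { bestDist = d; best = i; }
            }
            centers[c] = pts[best].clone();
        }
        int[] assign = new int[n];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each point goes to its nearest center.
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (dist2(pts[i], centers[c]) < dist2(pts[i], centers[best])) best = c;
                assign[i] = best;
            }
            // Update step: move each non-empty center to the mean of its points.
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                count[assign[i]]++;
                for (int d = 0; d < dim; d++) sum[assign[i]][d] += pts[i][d];
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0)
                    for (int d = 0; d < dim; d++) centers[c][d] = sum[c][d] / count[c];
        }
        return assign;
    }

    // Step 1: split all points into topK big clusters.
    // Step 2: re-cluster each big cluster on its own into bottomK smaller ones.
    // Returns {topLabel, bottomLabel} for every input point.
    static int[][] topDown(double[][] pts, int topK, int bottomK, int iterations) {
        int[] top = kMeans(pts, topK, iterations);
        int[][] labels = new int[pts.length][2];
        for (int c = 0; c < topK; c++) {
            List<Integer> members = new ArrayList<>();
            for (int i = 0; i < pts.length; i++) if (top[i] == c) members.add(i);
            if (members.isEmpty()) continue;
            double[][] subset = new double[members.size()][];
            for (int m = 0; m < subset.length; m++) subset[m] = pts[members.get(m)];
            int[] bottom = kMeans(subset, bottomK, iterations);
            for (int m = 0; m < subset.length; m++) {
                labels[members.get(m)][0] = c;
                labels[members.get(m)][1] = bottom[m];
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] pts = {
            {0, 0}, {0, 1}, {10, 0}, {10, 1},
            {100, 0}, {100, 1}, {110, 0}, {110, 1}
        };
        for (int[] label : topDown(pts, 2, 2, 10)) System.out.println(Arrays.toString(label));
    }
}
```

In this sketch the user-controlled "bigger" and "smaller/meaningful" sizes are reduced to the two counts topK and bottomK; a real submission would expose whatever parameters the chosen algorithms need.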

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
