[ 
https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146065#comment-13146065
 ] 

Paritosh Ranjan commented on MAHOUT-843:
----------------------------------------

OK. I agree that implementing the post processor is the smallest step that 
will make top down clustering work, though the user will still have to code 
some part of it manually. 
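
To make the discussion concrete, here is a minimal sketch of the kind of 
grouping a post processor could do: collect the points by the top level 
cluster they were assigned to, so that each group can be clustered again 
separately. This is plain Java and does not use the Mahout API. The class 
name PostProcessSketch, the method groupByCluster and the int[] of cluster 
ids are all assumptions made for illustration, not what the patch contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the code from the attached patch.
final class PostProcessSketch {

  // Groups points by the id of the top level cluster each one was assigned to.
  // clusterIds[i] is assumed to be the cluster id of points.get(i).
  static Map<Integer, List<double[]>> groupByCluster(List<double[]> points, int[] clusterIds) {
    if (points.size() != clusterIds.length) {
      throw new IllegalArgumentException("one cluster id is required per point");
    }
    // TreeMap keeps the groups ordered by cluster id.
    Map<Integer, List<double[]>> groups = new TreeMap<>();
    for (int i = 0; i < clusterIds.length; i++) {
      groups.computeIfAbsent(clusterIds[i], id -> new ArrayList<>()).add(points.get(i));
    }
    return groups;
  }

  public static void main(String[] args) {
    List<double[]> points = Arrays.asList(
        new double[] {1.0}, new double[] {2.0}, new double[] {3.0});
    Map<Integer, List<double[]>> groups = groupByCluster(points, new int[] {7, 3, 7});
    // Each entry is the input for one second level clustering run.
    for (Map.Entry<Integer, List<double[]>> e : groups.entrySet()) {
      System.out.println("cluster " + e.getKey() + ": " + e.getValue().size() + " point(s)");
    }
  }
}
```

The part the user would still have to code manually, in this picture, is 
running the second clustering on each of the groups.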

If we see this post processor as the smallest step towards top down 
clustering, and considering that we are following incremental development 
(which I have guessed from your comments), can you tell me what else we would 
need for full-fledged top down clustering, in incremental order?

I have added a CLI to the post processor. The CLI asks for the output path 
that was given to the cluster driver and then post-processes that output. Is 
that OK?

I will add some JUnit tests and submit the patch.


                
> Top Down Clustering
> -------------------
>
>                 Key: MAHOUT-843
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-843
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>              Labels: clustering, patch
>             Fix For: 0.6
>
>         Attachments: MAHOUT-843-patch, Top-Down-Clustering-patch
>
>
> Top Down Clustering works in multiple steps. The first step is to find 
> comparatively bigger clusters. The second step is to cluster the bigger 
> chunks into meaningful clusters. This can improve performance while 
> clustering large amounts of data. It also removes the dependency of 
> providing input clusters/numbers to the clustering algorithm.
> "Big" is a relative term, and so is the smaller "meaningful". So the sizes 
> of the "bigger" and the "smaller/meaningful" clusters will be controlled by 
> the user.
> Which clustering algorithm to use at the top level and which to use at the 
> bottom level can also be selected by the user. Initially, this can be done 
> for only one or a few clustering algorithms; later, an option can be 
> provided to use all the algorithms (that suit the case). 
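
The two steps described above can be sketched as follows. This is only an 
illustration under stated assumptions: it is plain Java and not the Mahout 
API, and it uses a simplified greedy threshold pass (a point joins the first 
cluster whose first member lies within the threshold) as a stand-in for a 
real clustering algorithm. The names TopDownSketch, cluster and topDown, and 
the coarse and fine thresholds that stand for the user's notion of "bigger" 
and "smaller/meaningful", are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of two level clustering; not Mahout code.
final class TopDownSketch {

  // Simplified greedy pass: a point joins the first cluster whose first member
  // is within `threshold` of it, otherwise it starts a new cluster.
  // No number of clusters has to be supplied up front.
  static List<List<double[]>> cluster(List<double[]> points, double threshold) {
    List<List<double[]>> clusters = new ArrayList<>();
    for (double[] p : points) {
      List<double[]> home = null;
      for (List<double[]> c : clusters) {
        if (distance(c.get(0), p) <= threshold) {
          home = c;
          break;
        }
      }
      if (home == null) {
        home = new ArrayList<>();
        clusters.add(home);
      }
      home.add(p);
    }
    return clusters;
  }

  // Euclidean distance between two points of equal dimension.
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Step 1: find the bigger clusters with the coarse threshold.
  // Step 2: cluster each bigger chunk again with the fine threshold.
  static List<List<double[]>> topDown(List<double[]> points, double coarse, double fine) {
    List<List<double[]>> result = new ArrayList<>();
    for (List<double[]> big : cluster(points, coarse)) {
      result.addAll(cluster(big, fine));
    }
    return result;
  }

  public static void main(String[] args) {
    List<double[]> points = Arrays.asList(
        new double[] {0.0}, new double[] {1.0}, new double[] {10.0},
        new double[] {11.0}, new double[] {0.2}, new double[] {10.1});
    System.out.println("top level clusters: " + cluster(points, 5.0).size());
    System.out.println("final clusters: " + topDown(points, 5.0, 0.5).size());
  }
}
```

In this sketch the second pass only compares points within the same bigger 
chunk, which is where the performance gain on large data would be expected 
to come from. Using a different algorithm at each level would mean replacing 
one of the two cluster(...) calls in topDown.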
