[ https://issues.apache.org/jira/browse/MAHOUT-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Eastman resolved MAHOUT-887.
---------------------------------

    Resolution: Invalid
      Assignee: Jeff Eastman

In general, top-down clustering begins with all points assigned to a single 
cluster and then iteratively uses some algorithm to split them. Bottom-up 
clustering starts with one cluster for each point and then uses some algorithm 
to iteratively merge them. Both approaches have scalability challenges due to 
all the bookkeeping required, and they really break down if a probabilistic 
cluster assignment (e.g. fuzzy k-means or Dirichlet) is needed.

You can search the mail archive and JIRAs for MSC to find these discussions. 
The scalability issues are the requirement to use a single reducer (for the 
last iteration at least) and the growth of each cluster as it retains the ids 
of all the clusters that have merged with it.
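
As a rough, single-machine sketch of the bottom-up case and its bookkeeping 
(this is not Mahout code; the class and method names are made up for 
illustration, and merging the closest pair of centroids under a distance 
threshold is only one possible merge rule, assumed here):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Mahout.
final class BottomUpSketch {

    // A cluster keeps a centroid plus the ids of every point merged into it.
    // This id list is the bookkeeping that grows as merging proceeds.
    static final class Cluster {
        final List<Integer> memberIds = new ArrayList<>();
        final double[] centroid;

        Cluster(int id, double[] point) {
            memberIds.add(id);
            centroid = point.clone();
        }

        // Absorb another cluster: size-weighted centroid, concatenated id lists.
        void merge(Cluster other) {
            int n = memberIds.size();
            int m = other.memberIds.size();
            for (int i = 0; i < centroid.length; i++) {
                centroid[i] = (centroid[i] * n + other.centroid[i] * m) / (n + m);
            }
            memberIds.addAll(other.memberIds);
        }
    }

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Start with one cluster per point, then repeatedly merge the closest pair
    // for as long as that pair is closer than the threshold.
    static List<Cluster> cluster(double[][] points, double threshold) {
        List<Cluster> clusters = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            clusters.add(new Cluster(i, points[i]));
        }
        while (clusters.size() > 1) {
            int bestA = -1;
            int bestB = -1;
            double best = threshold;
            // Every pair is compared, so all clusters must be visible at once.
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = distance(clusters.get(a).centroid, clusters.get(b).centroid);
                    if (d < best) {
                        best = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            if (bestA < 0) {
                break; // no remaining pair is within the threshold
            }
            // bestB > bestA, so removing bestB leaves index bestA valid.
            clusters.get(bestA).merge(clusters.remove(bestB));
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {0.0, 1.0}, {10.0, 10.0}, {10.0, 11.0}};
        for (Cluster c : cluster(points, 2.0)) {
            System.out.println(c.memberIds + " centroid=" + Arrays.toString(c.centroid));
        }
    }
}
```

In this sketch the pairwise comparison needs every cluster in one place, which 
is roughly why a distributed version ends up funnelling through a single 
reducer, and each surviving cluster's memberIds list keeps growing with every 
merge.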

MAHOUT-843 is aimed at supporting heterogeneous, top-down, hierarchical 
clustering where the choice of algorithm at every level is up to the user and 
where each algorithm may itself be iterative. That's a bit different from the 
homogeneous, top-down clustering I described above. As clustering algorithms 
cannot be used to merge clusters, there is no way to use them to build 
heterogeneous, bottom-up clusterers, which would be the opposite of 843.

I agree this issue can be closed.
                
> Bottom Up Clustering
> --------------------
>
>                 Key: MAHOUT-887
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-887
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: Linux Windows
>            Reporter: Paritosh Ranjan
>            Assignee: Jeff Eastman
>              Labels: features
>             Fix For: 0.6
>
>
> Bottom up clustering is achieved by starting with small clusters/single 
> points and then recursively merging clusters that are closer than a 
> specified control constraint.


        
