[ 
https://issues.apache.org/jira/browse/SPARK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213945#comment-14213945
 ] 

RJ Nowling commented on SPARK-2429:
-----------------------------------

Hi Yu,

I'm having trouble finding the function to cut a dendrogram -- I see the tests 
but not the implementation.

I feel that you should be able to assign values in O(log N) time with the 
hierarchical method vs O(N) with the standard kmeans.  So, say you train a 
model (this may be slower than kmeans) then assign additional points to 
clusters after training.  If clusters at the same levels in the hierarchy do 
not overlap, you should be able to choose the closest cluster at each level 
until you find a leaf.  I'm assuming that the children of a given cluster are 
contained within that cluster (spacially) -- can you show this or find a 
reference for this?  If so, then assignment should be faster for a larger 
number of clusters as Jun was saying above.

Do you agree with this?  Or is there something I am misunderstanding!

Thanks!

> Hierarchical Implementation of KMeans
> -------------------------------------
>
>                 Key: SPARK-2429
>                 URL: https://issues.apache.org/jira/browse/SPARK-2429
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: RJ Nowling
>            Assignee: Yu Ishikawa
>            Priority: Minor
>         Attachments: 2014-10-20_divisive-hierarchical-clustering.pdf, The 
> Result of Benchmarking a Hierarchical Clustering.pdf, 
> benchmark-result.2014-10-29.html, benchmark2.html
>
>
> Hierarchical clustering algorithms are widely used and would make a nice 
> addition to MLlib.  Clustering algorithms are useful for determining 
> relationships between clusters as well as offering faster assignment. 
> Discussion on the dev list suggested the following possible approaches:
> * Top down, recursive application of KMeans
> * Reuse DecisionTree implementation with different objective function
> * Hierarchical SVD
> It was also suggested that support for distance metrics other than Euclidean 
> such as negative dot or cosine are necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to