Derrick Burns created SPARK-3424:
------------------------------------

             Summary: KMeans Plus Plus is too slow
                 Key: SPARK-3424
                 URL: https://issues.apache.org/jira/browse/SPARK-3424
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.0.2
            Reporter: Derrick Burns


The KMeansPlusPlus algorithm is implemented in O(m k^2) time, where m is the 
number of rounds of the KMeansParallel algorithm and k is the number of clusters.

This can be dramatically improved by maintaining, from round to round, the 
distance from each point to its closest cluster center and then incrementally 
updating that value for each point. Each incremental update takes O(1) time, 
which reduces the running time of KMeansPlusPlus to O(m k). For large k, this 
is significant.
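
To illustrate the idea, here is a minimal single-machine sketch in Python. It 
is not the MLlib code and does not model the KMeansParallel rounds; the names 
kmeans_pp_seed, squared_distance and closest_sq are hypothetical and chosen 
only for this example. It shows the proposed bookkeeping: each point caches the 
squared distance to its nearest chosen center, and adding a center costs one 
distance computation and one comparison per point instead of a fresh scan over 
all centers chosen so far.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points, k, rng=None):
    """Pick k initial centers with k-means++ style sampling.

    Returns (centers, closest_sq), where closest_sq[i] is the cached squared
    distance from points[i] to its nearest chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = rng or random.Random()

    # First center: uniform at random. Initialise the per-point cache once.
    centers = [points[rng.randrange(len(points))]]
    closest_sq = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        total = sum(closest_sq)
        if total == 0.0:
            # Every point coincides with a chosen center; pick uniformly.
            chosen = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to the cached
            # squared distance. No distances are recomputed here.
            threshold = rng.random() * total
            cumulative = 0.0
            chosen = 0
            for i, d in enumerate(closest_sq):
                if d > 0.0:
                    chosen = i  # last positive-weight index seen so far
                cumulative += d
                if cumulative > threshold:
                    break
        new_center = points[chosen]
        centers.append(new_center)

        # Incremental update: compare the cached value against the distance
        # to the new center only, not against all earlier centers.
        for i, p in enumerate(points):
            d = squared_distance(p, new_center)
            if d < closest_sq[i]:
                closest_sq[i] = d

    return centers, closest_sq
```

Without the cache, the sampling step would need the distance from every point 
to every center chosen so far each time a new center is added, which is where 
the extra factor of k comes from.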



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
