Derrick Burns created SPARK-3424: ------------------------------------ Summary: KMeans Plus Plus is too slow Key: SPARK-3424 URL: https://issues.apache.org/jira/browse/SPARK-3424 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.0.2 Reporter: Derrick Burns
The KMeansPlusPlus algorithm is implemented in time O( m k^2), where m is the rounds of the KMeansParallel algorithm and k is the number of clusters. This can be dramatically improved by maintaining the distance the closest cluster center from round to round and then incrementally updating that value for each point. This incremental update is O(1) time, this reduces the running time for K Means Plus Plus to O( m k ). For large k, this is significant. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org