[ 
https://issues.apache.org/jira/browse/SPARK-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014046#comment-16014046
 ] 

Nick Pentreath commented on SPARK-6000:
---------------------------------------

Even though SPARK-14174 was filed later, it seems to have more discussion and a 
related PR. Shall I close this one as a duplicate?

> Batch K-Means clusters should support "mini-batch" updates
> ----------------------------------------------------------
>
>                 Key: SPARK-6000
>                 URL: https://issues.apache.org/jira/browse/SPARK-6000
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.2.1
>            Reporter: Derrick Burns
>            Priority: Minor
>
> One way to improve the performance of the K-means clustering algorithm is to 
> sample the points on each round of Lloyd's algorithm and to use only those 
> samples to update the cluster centers.  (Note that this is similar to the 
> update algorithm of streaming K-means.)  The Spark K-Means clusterer should 
> support the mini-batch algorithm for large data sets. 
> The K-Means implementation at 
> https://github.com/derrickburns/generalized-kmeans-clustering supports the 
> mini-batch algorithm.
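
For readers unfamiliar with the idea in the description above, here is a
minimal sketch of a mini-batch update in plain Python. It is not Spark's API
and not the code from the linked repository. The function name
mini_batch_kmeans and the parameters batch_size, rounds and seed are
hypothetical, and the per-center running-average update is one common
formulation, assumed here for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_center(point, centers):
    """Index of the center nearest to point (ties go to the lowest index)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mini_batch_kmeans(points, k, batch_size, rounds, seed=0):
    """Illustrative mini-batch K-means (hypothetical helper, not Spark's).

    Each round draws a random sample of the points and uses only that
    sample to update the centers, instead of the full data set as in
    plain Lloyd's algorithm.
    """
    rng = random.Random(seed)
    # Assumption: initialize the centers from k randomly chosen points.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far

    for _ in range(rounds):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch against the centers as they stand now.
        assignments = [closest_center(p, centers) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            rate = 1.0 / counts[j]  # per-center step size shrinks over time
            centers[j] = [(1.0 - rate) * c + rate * x
                          for c, x in zip(centers[j], p)]
    return centers
```

With k = 1 and batch_size equal to the number of points, every round sees the
full data set and the single center settles on the overall mean, which is a
quick sanity check. Smaller batches trade some accuracy per round for less
work per round.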



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)