Derrick Burns created SPARK-6000:
------------------------------------

             Summary: Batch K-Means clusters should support "mini-batch" updates
                 Key: SPARK-6000
                 URL: https://issues.apache.org/jira/browse/SPARK-6000
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.2.1
            Reporter: Derrick Burns
            Priority: Minor


One way to improve the performance of K-means clustering is to sample the 
points on each round of Lloyd's algorithm and to use only those samples to 
update the cluster centers.  (This is similar to the update step of streaming 
K-means.)  The Spark K-Means clusterer should support this mini-batch 
algorithm for large data sets. 
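
As a rough illustration of the idea described above, here is a minimal 
single-machine sketch in plain Python.  It is not the Spark API and not the 
implementation in the linked repository; the function names, the batch_size 
and rounds parameters, and the rule of leaving a center unchanged when no 
sampled point is assigned to it are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mini_batch_kmeans(points, initial_centers, batch_size, rounds, seed=0):
    """Lloyd-style rounds that use only a random sample of the points.

    Hypothetical helper for illustration: on each round a batch is sampled
    without replacement, the sampled points are assigned to their nearest
    center, and each center is replaced by the mean of its sampled points.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in initial_centers]
    dim = len(centers[0])
    for _ in range(rounds):
        batch = rng.sample(points, min(batch_size, len(points)))
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        for p in batch:
            i = nearest_center(p, centers)
            counts[i] += 1
            for d in range(dim):
                sums[i][d] += p[d]
        for i, n in enumerate(counts):
            # Assumption: a center with no sampled points keeps its old value.
            if n > 0:
                centers[i] = [s / n for s in sums[i]]
    return centers
```

Each round costs time proportional to batch_size rather than to the full 
data set, which is the source of the speed-up; the price is noisier center 
estimates, so more rounds may be needed than with full-batch updates.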

The K-Means implementation at 
https://github.com/derrickburns/generalized-kmeans-clustering supports the 
mini-batch algorithm.



