Derrick Burns created SPARK-6000:
------------------------------------
Summary: Batch K-Means clusterer should support "mini-batch" updates
Key: SPARK-6000
URL: https://issues.apache.org/jira/browse/SPARK-6000
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.2.1
Reporter: Derrick Burns
Priority: Minor
One way to improve the performance of K-means clustering on large data sets is
to sample the points on each iteration of Lloyd's algorithm and to use only
that sample to update the cluster centers. (This is similar to the update
step of streaming K-means.) The Spark K-Means clusterer should support this
mini-batch algorithm.
The K-Means implementation at
https://github.com/derrickburns/generalized-kmeans-clustering supports the
mini-batch algorithm.
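
For illustration, the sampling idea described above can be sketched in plain
Python. This is only a minimal sketch and not the code of Spark or of the
repository linked above. The function name mini_batch_lloyd and its parameters
(initial_centers, batch_size, rounds, seed) are hypothetical. It assumes a
plain Lloyd update computed from the sample alone; a streaming-style variant
would instead blend each sample mean into the previous center.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest_center(point, centers):
    """Index of the center nearest to point (ties go to the lowest index)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mini_batch_lloyd(points, initial_centers, batch_size, rounds, seed=0):
    """Run Lloyd-style rounds, each using only a random sample of points.

    All names and parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in initial_centers]
    k = len(centers)
    dim = len(centers[0])
    batch_size = min(batch_size, len(points))

    for _ in range(rounds):
        # Sample without replacement; only these points are used this round.
        batch = rng.sample(points, batch_size)

        # Accumulate per-cluster coordinate sums and counts over the sample.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in batch:
            i = closest_center(p, centers)
            counts[i] += 1
            for d, x in enumerate(p):
                sums[i][d] += x

        # Move each center to the mean of its sampled points.
        # A center that received no sampled points is left unchanged.
        for i in range(k):
            if counts[i] > 0:
                centers[i] = [s / counts[i] for s in sums[i]]

    return centers


if __name__ == "__main__":
    # Two well-separated synthetic clusters on a small grid.
    cluster_a = [(i * 0.1, j * 0.1) for i in range(10) for j in range(10)]
    cluster_b = [(10 + i * 0.1, 10 + j * 0.1)
                 for i in range(10) for j in range(10)]
    result = mini_batch_lloyd(cluster_a + cluster_b,
                              [(0.0, 0.0), (10.0, 10.0)],
                              batch_size=50, rounds=20)
    print(result)
```

Each round costs time proportional to batch_size times the number of centers
rather than to the full data set, which is where the speedup comes from. The
trade-off is that the centers keep fluctuating with the sample instead of
settling at a fixed point.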