[
https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-14174:
---------------------------------
Summary: Implement the Mini-Batch KMeans (was: Accelerate KMeans via
Mini-Batch EM)
> Implement the Mini-Batch KMeans
> -------------------------------
>
> Key: SPARK-14174
> URL: https://issues.apache.org/jira/browse/SPARK-14174
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: zhengruifeng
> Attachments: MBKM.xlsx
>
>
> The MiniBatchKMeans is a variant of the KMeans algorithm which uses
> mini-batches to reduce the computation time, while still attempting to
> optimise the same objective function. Mini-batches are subsets of the input
> data, randomly sampled in each training iteration. These mini-batches
> drastically reduce the amount of computation required to converge to a local
> solution. In contrast to other algorithms that reduce the convergence time of
> k-means, mini-batch k-means produces results that are generally only slightly
> worse than the standard algorithm.
> Comparison of KMeans and MiniBatchKMeans in scikit-learn:
> http://scikit-learn.org/stable/auto_examples/cluster/plot_mini_batch_kmeans.html#example-cluster-plot-mini-batch-kmeans-py
> Because MiniBatch-KMeans with fraction=1.0 is not equivalent to KMeans, I
> implement it as a new estimator.
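
For readers unfamiliar with the technique, the description above can be
illustrated with a minimal pure-Python sketch. It is not the code proposed for
Spark in this issue. It assumes the commonly described per-center
learning-rate update (each center moves toward a sampled point by
1 / number of points it has absorbed so far), and every name in it
(mini_batch_kmeans, nearest, cost, iterations, seed) is hypothetical. Only the
fraction parameter is taken from the description.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, fraction=0.1, iterations=100, seed=0):
    """Illustrative mini-batch k-means (assumed update rule, not Spark's).

    points   : list of equal-length numeric tuples or lists
    fraction : share of the data sampled in each iteration, in (0, 1]
    """
    rng = random.Random(seed)
    # Initialise centers from k distinct input points (copied, not aliased).
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # points absorbed by each center so far
    batch_size = max(1, int(len(points) * fraction))

    for _ in range(iterations):
        # Draw a fresh random mini-batch in every training iteration.
        batch = rng.sample(points, batch_size)
        # Assign the whole batch while the centers are still fixed.
        assignments = [nearest(p, centers) for p in batch]
        # Move each center toward its points with a decaying step size.
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers


def cost(points, centers):
    """The k-means objective: sum of squared distances to nearest center."""
    return sum(squared_distance(p, centers[nearest(p, centers)])
               for p in points)
```

Under this assumed update rule, each center is a running average of every
point it has absorbed across all iterations, not a mean recomputed from the
current assignment. That is one plausible reading of why fraction=1.0 does not
reproduce standard KMeans, though the issue itself does not state the reason.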
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)