[ https://issues.apache.org/jira/browse/SPARK-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512214#comment-16512214 ]
zhengruifeng commented on SPARK-14174:
--------------------------------------

[~mlnick] [~mengxr] [~josephkb] Mini-Batch KMeans is much faster than KMeans. Do you have any plans to include it in MLlib? Thanks

> Implement the Mini-Batch KMeans
> -------------------------------
>
>                 Key: SPARK-14174
>                 URL: https://issues.apache.org/jira/browse/SPARK-14174
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: zhengruifeng
>            Priority: Major
>         Attachments: MBKM.xlsx
>
>
> The MiniBatchKMeans is a variant of the KMeans algorithm which uses
> mini-batches to reduce the computation time, while still attempting to
> optimise the same objective function. Mini-batches are subsets of the input
> data, randomly sampled in each training iteration. These mini-batches
> drastically reduce the amount of computation required to converge to a local
> solution. In contrast to other algorithms that reduce the convergence time of
> k-means, mini-batch k-means produces results that are generally only slightly
> worse than the standard algorithm.
> Comparison of the K-Means and MiniBatchKMeans on sklearn :
> http://scikit-learn.org/stable/auto_examples/cluster/plot_mini_batch_kmeans.html#example-cluster-plot-mini-batch-kmeans-py
> Since MiniBatch-KMeans with fraction=1.0 is not equal to KMeans, I made it
> a new estimator.
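
The mini-batch procedure described in the issue can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code proposed for Spark and not scikit-learn's implementation: the function names and the `iterations` and `seed` parameters are hypothetical, `fraction` mirrors the parameter mentioned in the issue, and the per-center running-average update (step size 1/count) is one common formulation of mini-batch k-means that is assumed here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (lowest index wins ties)."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, fraction=0.1, iterations=100, seed=0):
    """Illustrative mini-batch k-means; all names here are hypothetical.

    points:   list of equal-length numeric tuples
    fraction: share of the data sampled in each iteration (assumed meaning)
    """
    rng = random.Random(seed)
    # Initialise centers from k distinct positions in the input list.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far
    batch_size = max(1, int(len(points) * fraction))

    for _ in range(iterations):
        # Randomly sample a mini-batch (without replacement) each iteration.
        batch = rng.sample(points, batch_size)
        # Assign the whole batch first, with the centers held fixed.
        assignments = [nearest_center(p, centers) for p in batch]
        # Then move each center toward its assigned points, using a
        # per-center step size that shrinks as the center sees more data.
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers


def cost(points, centers):
    """K-means objective: sum of squared distances to the nearest center."""
    return sum(squared_distance(p, centers[nearest_center(p, centers)])
               for p in points)
```

In this sketch, even with `fraction=1.0` the centers are updated incrementally as running averages across iterations rather than being recomputed as the mean of their assigned points, so the result generally differs from standard KMeans. That is consistent with the issue's remark that the two are not equal, although the actual patch may differ for other reasons.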