Dan/Ted:

I like that you are implementing streaming k-means.

Are there any results comparing it to mini-batch k-means ([1] and the
paper cited therein)?

In the distributed implementation, you independently compute an
O(k)-means clustering on each partition, then combine them into a
final k-means. Are there any guarantees/results about the accuracy of
this? Clearly this sort of design also favours a Storm/Spark
implementation - have you considered that?
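
For concreteness, here is a rough sketch of the scheme as I understand
it; it is not your actual code. The names weighted_kmeans,
distributed_kmeans and sketch_factor are made up for illustration, and
plain Lloyd's iterations stand in for whatever clusterer you really run
on each partition.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd's iterations on weighted points (assumed stand-in clusterer).

    Returns (centroids, totals), where totals[j] is the weight assigned
    to centroid j during the last assignment pass.
    """
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest current centroid.
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        for j in range(k):
            if totals[j] > 0:  # an empty cluster keeps its old centroid
                centroids[j] = [s / totals[j] for s in sums[j]]
    return centroids, totals


def distributed_kmeans(partitions, k, sketch_factor=4):
    """Cluster each partition into sketch_factor * k weighted centroids,
    then cluster the union of those centroids down to k.

    sketch_factor is a hypothetical knob standing in for the constant
    hidden in "O(k)".
    """
    sketch_points, sketch_weights = [], []
    for i, part in enumerate(partitions):
        # Each partition is handled independently, so this loop is the
        # part that could run in parallel.
        cents, wts = weighted_kmeans(
            part, [1.0] * len(part), sketch_factor * k, seed=i
        )
        for c, w in zip(cents, wts):
            if w > 0:  # drop centroids that ended up with no points
                sketch_points.append(c)
                sketch_weights.append(w)
    # Combine step: weighted k-means over the per-partition centroids.
    return weighted_kmeans(sketch_points, sketch_weights, k, seed=0)
```

The accuracy question above is about the combine step: the final
centroids only ever see the weighted per-partition centroids, never the
raw points.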

-Andy



[1] http://scikit-learn.org/dev/modules/generated/sklearn.cluster.MiniBatchKMeans.html
