On Tue, May 31, 2016 at 4:04 PM, Artem Barger <ar...@bargr.net> wrote:
> Hi,
>
> The current implementation of k-means within the CM framework inherently
> uses the algorithm published by Arthur, David, and Sergei Vassilvitskii,
> "k-means++: The advantages of careful seeding." *Proceedings of the
> eighteenth annual ACM-SIAM symposium on Discrete algorithms*. Society for
> Industrial and Applied Mathematics, 2007. However, other algorithms for
> initial seeding are available, for instance:
>
> 1. Random initialization (each center picked uniformly at random).
> 2. Canopy: https://en.wikipedia.org/wiki/Canopy_clustering_algorithm
> 3. Bicriteria: Feldman, Dan, et al. "Bi-criteria linear-time
> approximations for generalized k-mean/median/center." *Proceedings of the
> twenty-third annual symposium on Computational geometry*. ACM, 2007.
>
> While I understand that k-means++ is the preferable option, the others
> could also be used for testing, trials, and evaluation.
>
> I'd like to propose separating the logic of seeding and clustering to
> increase the flexibility of the k-means implementation. I would be glad
> to hear your comments, pros/cons, or rejections...

I've found "Scalable K-Means" (kmeans||), as described in
http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf, which provides a
parallelizable seeding procedure. I guess this might serve as an additional
+1 vote for separating seeding from Lloyd's iterations in the current
k-means implementation.

Best,
Artem Barger.
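
The proposed separation could be sketched roughly as follows (hypothetical names, not the actual Commons Math API): a pluggable seeding-strategy interface that the Lloyd's-iteration loop consumes, with random initialization shown as one interchangeable implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical interface (illustration only, not the CM API): a seeding
 * strategy that is independent of the Lloyd's-iteration loop, so k-means++,
 * canopy, bicriteria, or kmeans|| seeding could be swapped in freely.
 */
interface SeedingStrategy {
    /** Pick k initial centers from the given points. */
    List<double[]> chooseInitialCenters(List<double[]> points, int k);
}

/** Option 1 from the list above: each center picked uniformly at random. */
class RandomSeeding implements SeedingStrategy {
    private final Random rng;

    RandomSeeding(long seed) {
        this.rng = new Random(seed);
    }

    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k) {
        // Sample k distinct points without replacement.
        List<double[]> pool = new ArrayList<>(points);
        List<double[]> centers = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            centers.add(pool.remove(rng.nextInt(pool.size())));
        }
        return centers;
    }
}
```

A clusterer would then accept a SeedingStrategy at construction time, so the seeding procedure can be chosen (or tested in isolation) without touching the iteration code.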