KMeans|| opinions

Dmitriy Lyubimov Wed, 02 Apr 2014 00:04:23 -0700

I am considering porting the implementation [1] and the paper for KMeans||
to the Bindings.


This seems like another method that would map fairly nicely onto the
Bindings.

The problem I am contemplating is the || initialization, and in particular
centroid storage. That implementation assumes the centroids can be kept in
memory on the front end.
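For concreteness, here is a minimal single-machine sketch of k-means|| seeding. It is written in plain Python rather than Scala, is not taken from [1], and is not distributed. It assumes points are tuples of floats, and all names (`kmeans_parallel_init`, `weighted_kmeanspp`, the oversampling factor `l`, `rounds`) are hypothetical. Only the candidate list `centers` has to live in front-end memory; it grows by at most `l` points per round in expectation, so it stays on the order of `l * rounds` points regardless of how many data points there are.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost_to(p, centers):
    """Squared distance from point p to its nearest center."""
    return min(sq_dist(p, c) for c in centers)


def kmeans_parallel_init(points, k, l, rounds, rng):
    """Hypothetical sketch of k-means|| seeding.

    Each round samples every point independently with probability
    min(1, l * d^2(x, C) / phi(C)), where C is the candidate set at the
    start of the round. The candidates are then weighted by how many
    points they attract and reduced to k seeds with weighted k-means++.
    """
    centers = [rng.choice(points)]
    for _ in range(rounds):
        costs = [cost_to(p, centers) for p in points]
        phi = sum(costs)
        if phi == 0:
            break  # every point already coincides with a candidate
        # Points with zero cost (already candidates) are never re-sampled.
        centers.extend(p for p, c in zip(points, costs) if rng.random() < l * c / phi)
    # Weight each candidate by the number of points closest to it.
    weights = [0] * len(centers)
    for p in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        weights[j] += 1
    return weighted_kmeanspp(centers, weights, k, rng)


def weighted_kmeanspp(cands, weights, k, rng):
    """Pick up to k seeds from weighted candidates by D^2 sampling."""
    seeds = [rng.choices(cands, weights=weights)[0]]
    while len(seeds) < min(k, len(cands)):
        scores = [w * cost_to(c, seeds) for c, w in zip(cands, weights)]
        if sum(scores) == 0:
            break  # no remaining candidate is distinct from the seeds
        seeds.append(rng.choices(cands, weights=scores)[0])
    return seeds
```

On a cluster, the per-point cost and sampling loops are the parts that would run against the distributed data; the final weighted k-means++ step touches only the small candidate set.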

(1) The question is whether this is a dangerous idea. It doesn't seem
particularly so, since people are unlikely to want k > 1e+6. Another
thing: the centers seem to be passed in via a closure attribute (i.e. a
Java-serialized, array-backed matrix). With the Bindings, however, it is
quite possible to keep the centers on the back end as a matrix.
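A back-of-envelope check on the k > 1e+6 bound, assuming dense centroids stored as 8-byte doubles (sparse centroids would change the picture): the footprint is k × d × 8 bytes, so it is the dimension d together with k that decides whether front-end storage and closure serialization become a problem. The helper name is hypothetical.

```python
def dense_centroid_bytes(k: int, d: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for k dense centroids of dimension d (8-byte doubles by default)."""
    return k * d * bytes_per_value


if __name__ == "__main__":
    # Compare a very large k at moderate d with a moderate k at high d.
    for k, d in [(10**6, 100), (1000, 1000)]:
        print(f"k={k}, d={d}: {dense_centroid_bytes(k, d):,} bytes")
```

For example, 1e+6 centroids of dimension 100 already take 800,000,000 bytes (about 0.8 GB), while 1,000 centroids of dimension 1,000 take only 8,000,000 bytes.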

(2) Obviously, Lloyd iterations are not terribly accurate; the || and ++
versions mostly speed things up. Is there any preference for a method with
better-than-Lloyd accuracy?
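To illustrate the accuracy concern in (2), here is a minimal plain-Python sketch of Lloyd iterations (helper names are hypothetical; points and centers are tuples of floats). Each step cannot increase the within-cluster sum of squares, but iteration stops at whichever fixed point the starting centers lead to, and that fixed point can be far from optimal.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the center closest to p (first one wins on ties)."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def wcss(points, centers):
    """Within-cluster sum of squared distances for the given centers."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)


def lloyd_step(points, centers):
    """One Lloyd iteration: assign points, then move centers to their means.

    A center that attracts no points is left where it is.
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in points:
        j = nearest(p, centers)
        counts[j] += 1
        for t in range(dim):
            sums[j][t] += p[t]
    return [tuple(s / n for s in sums[j]) if n else centers[j]
            for j, n in enumerate(counts)]


def lloyd(points, centers, max_iter=100):
    """Iterate Lloyd steps until the centers stop moving or max_iter is hit."""
    for _ in range(max_iter):
        new = lloyd_step(points, centers)
        if new == centers:
            break
        centers = new
    return centers
```

For the four corners of a 4 × 1 rectangle, (0,0), (0,1), (4,0), (4,1), with k = 2, starting from (2,0) and (2,1) is already a fixed point with cost 16, while starting from (0,0) and (4,0) converges to (0,0.5) and (4,0.5) with cost 1.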


[1]
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
