|| initialization seems promising along with perhaps ball k-means iterations


On Wed, Apr 2, 2014 at 12:03 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Considering porting implementation [1] and paper for KMeans || for
> Bindings.
>
> This seems like another method to map fairly nicely.
>
> The problem I am contemplating is ||-initialization, and in particular,
> centroid storage. That particular implementation assumes centroids could be
> kept in memory in front.
>
> (1) Question is, is it a dangerous idea. It doesn't seem like it
> particularly is, since unlikely people would want more k>1e+6. Another
> thing, centers seem to be passed in via closure attribute (i.e.
> java-serialized array-backed matrix).However, with Bindings it is quite
> possible to keep centers at the back as a matrix.
>
> (2) obviously, LLoyd iterations are not terribly accurate. || and ++
> versions mostly speed things up. Is there any better-than-LLoyd accuracy
> preference?
>
>
> [1]
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
>
>
>

Reply via email to