[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568392#comment-14568392 ]
Sachin Goel commented on FLINK-1731:
------------------------------------
I'm creating a separate issue for initialization schemes. It will cover the
Random, kmeans++ and kmeans|| initialization methods. Since any initialization
is itself a solution to the kmeans problem, each scheme would also be an
instance of Predictor. A user can access the learned centroids via
instance.centroids and pass them to the KMeans algorithm that has already been
implemented.
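
To make the first option concrete, here is a minimal sketch in plain Python rather than the FlinkML Scala API. The names RandomInit, KMeans, sq_dist and cost are hypothetical and chosen only for illustration. It assumes points are tuples of floats and that an initializer exposes fit, predict and a centroids attribute, mirroring the Predictor idea described above.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def cost(points, centroids):
    """k-means objective: sum of squared distances to the nearest centroid."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)


class RandomInit:
    """Hypothetical initializer: picks k input points as centroids."""

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed
        self.centroids = None

    def fit(self, points):
        rng = random.Random(self.seed)
        self.centroids = rng.sample(points, self.k)
        return self

    def predict(self, point):
        # An initialization is already a (rough) clustering, so it can predict.
        return nearest(point, self.centroids)


class KMeans:
    """Hypothetical Lloyd-style k-means that takes explicit initial centroids."""

    def __init__(self, initial_centroids, iterations=10):
        self.centroids = [tuple(c) for c in initial_centroids]
        self.iterations = iterations

    def fit(self, points):
        for _ in range(self.iterations):
            clusters = [[] for _ in self.centroids]
            for p in points:
                clusters[nearest(p, self.centroids)].append(p)
            # Move each centroid to the mean of its members; keep it if empty.
            self.centroids = [
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old
                for members, old in zip(clusters, self.centroids)
            ]
        return self

    def predict(self, point):
        return nearest(point, self.centroids)
```

With this shape, the user-side wiring would look like `init = RandomInit(k=2).fit(points)` followed by `KMeans(init.centroids).fit(points)`. The user has to know about both objects, which is the burden the alternative described next tries to remove.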
There is another possible way, which takes the burden off the user of figuring
out how to pass the initial centroids to KMeans. We can have a parameter that
specifies which initialization scheme to use. The KMeans algorithm would then
call the appropriate initialization scheme in its fit function and use the
centroids it returns as its initial centroids.
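
A rough sketch of this parameter-driven variant follows, again in plain Python and not the FlinkML API. The parameter name init, the INIT_SCHEMES registry and the function names are assumptions made for illustration. Only random and kmeans++ seeding are shown; kmeans|| is left out to keep the example short.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Pick k input points uniformly at random as initial centroids."""
    return rng.sample(points, k)


def kmeans_pp_init(points, k, rng):
    """kmeans++ seeding: each new centroid is sampled with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # All points coincide with chosen centroids; fall back to uniform.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


# Hypothetical registry mapping the parameter value to a seeding function.
INIT_SCHEMES = {"random": random_init, "kmeans++": kmeans_pp_init}


class KMeans:
    """Hypothetical k-means whose fit() runs the selected initialization."""

    def __init__(self, k, init="kmeans++", iterations=10, seed=0):
        self.k = k
        self.init = init
        self.iterations = iterations
        self.seed = seed
        self.centroids = None

    def fit(self, points):
        if self.init not in INIT_SCHEMES:
            raise ValueError("unknown initialization scheme: %r" % (self.init,))
        rng = random.Random(self.seed)
        centroids = [tuple(c) for c in INIT_SCHEMES[self.init](points, self.k, rng)]
        for _ in range(self.iterations):
            clusters = [[] for _ in centroids]
            for p in points:
                idx = min(range(self.k), key=lambda i: sq_dist(p, centroids[i]))
                clusters[idx].append(p)
            # Move each centroid to the mean of its members; keep it if empty.
            centroids = [
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old
                for members, old in zip(clusters, centroids)
            ]
        self.centroids = centroids
        return self
```

Here the caller only writes something like `KMeans(k=2, init="kmeans++").fit(points)`. The trade-off is that KMeans now has to know about every initialization scheme, whereas in the first option the schemes stay independent of it.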
> Add kMeans clustering algorithm to machine learning library
> -----------------------------------------------------------
>
> Key: FLINK-1731
> URL: https://issues.apache.org/jira/browse/FLINK-1731
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Peter Schrott
> Labels: ML
>
> The Flink repository already contains a kMeans implementation but it is not
> yet ported to the machine learning library. I assume that only the used data
> types have to be adapted and then it can be more or less directly moved to
> flink-ml.
> The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better
> implementation because they improve the initial seeding phase to achieve
> near-optimal clustering. It might be worthwhile to implement kMeans||.
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf