GitHub user sachingoel0101 reopened a pull request:
https://github.com/apache/flink/pull/757
[FLINK-2131][ml]: Initialization schemes for k-means clustering
This adds the two most common initialization strategies for the k-means
clustering algorithm: random initialization and kmeans++ initialization.
Further details are at https://issues.apache.org/jira/browse/FLINK-2131
[Edit]: Work on kmeans|| has been started and just needs to be finalized.
[Edit]: kmeans|| implementation finished.
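
For readers unfamiliar with the two schemes named above, here is a minimal single-machine sketch in plain Python. It is not the code from this pull request, which targets Flink's distributed ML library. The function names (`random_init`, `kmeans_pp_init`, `cost`) are hypothetical, and points are assumed to be tuples of floats. Random initialization picks k input points uniformly. kmeans++ picks the first center uniformly and each later center with probability proportional to its squared distance from the nearest center chosen so far.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Random initialization: k distinct input points chosen uniformly."""
    return rng.sample(points, k)


def kmeans_pp_init(points, k, rng):
    """kmeans++ seeding (illustrative sketch, not the PR's implementation)."""
    # First center: uniform over the input points.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample proportionally to squared distance to the nearest center.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Only the newly added center can shrink a point's nearest distance.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


def cost(points, centers):
    """Initialization cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
    print("random  :", cost(pts, random_init(pts, 3, rng)))
    print("kmeans++:", cost(pts, kmeans_pp_init(pts, 3, rng)))
```

The `cost` helper computes the quantity usually used to compare initializations: a lower value means the initial centers already sit closer to the data.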
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sachingoel0101/flink clustering_initializations
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/757.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #757
----
commit dc2de88bf5e3148bb116cad607fc3c61d9dceac6
Author: Sachin Goel <[email protected]>
Date: 2015-06-02T06:44:30Z
Random and kmeans++ initialization methods added
commit 4a39a19c1425259c71ac6d922b4d9a9f2e7d1c6e
Author: Sachin Goel <[email protected]>
Date: 2015-06-02T15:42:58Z
Merge https://github.com/apache/flink into clustering_initializations
commit cdbb3a0801d364935d455798c695f4615ae74e76
Author: Sachin Goel <[email protected]>
Date: 2015-06-02T19:49:24Z
Merge https://github.com/apache/flink into clustering_initializations
commit 7496e21462e4efc0813450971ae6cbc94d2b2c15
Author: Sachin Goel <[email protected]>
Date: 2015-06-02T22:41:20Z
Initialization costs of random and kmeans++ added
commit 8033c87b71686bd3955281db12583592549406cb
Author: Sachin Goel <[email protected]>
Date: 2015-06-05T21:54:10Z
Merge https://github.com/apache/flink into clustering_initializations
commit 29ed1d3fb31aa038d6ed1a5bf16d58f19565cdf8
Author: Sachin Goel <[email protected]>
Date: 2015-06-05T22:52:02Z
Removed cost parameter from Algorithm itself. Leaving it to the user for
now. Also added support for weighted input data sets
commit 5286c3c21d5019f6ba8ab67c2074570087bc1b3a
Author: Sachin Goel <[email protected]>
Date: 2015-06-06T05:04:55Z
An initial draft of kmeans-par method
commit f3bfad4fc0c6576af14f1e981f8e778445856355
Author: Sachin Goel <[email protected]>
Date: 2015-06-08T10:36:32Z
All three initialization schemes implemented and tested
commit 8496b8fd627ade8dbe7b92949d35d3cce704f1cc
Author: Sachin Goel <[email protected]>
Date: 2015-06-08T10:36:58Z
Merge https://github.com/apache/flink into clustering_initializations
commit 3765a3e6a77a8bdbac21d03be1c43263925b1495
Author: Sachin Goel <[email protected]>
Date: 2015-06-30T08:57:41Z
Merge remote-tracking branch 'upstream/master' into clustering_initializations
----