[ https://issues.apache.org/jira/browse/FLINK-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103785#comment-15103785 ]
Fabian Hueske commented on FLINK-3245:
--------------------------------------
The purpose of the k-means algorithm is to identify the "hidden" centers of a
data set. These centers are not known in advance, so the algorithm is usually
initialized with randomly chosen centers. (Some variants of k-means pre-compute
approximate initial centers, e.g., based on samples of the data.)
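For illustration, here is a minimal sketch of that usual setup, assuming 2-D points and plain Lloyd iterations: the initial centers are drawn uniformly at random from the bounding box of the data and then refined by alternating assignment and update steps. The class and method names (`RandomInitKMeans`, `randomCenters`, `lloyd`) are hypothetical and are not part of Flink's KMeans example.

```java
import java.util.Arrays;
import java.util.Random;

public class RandomInitKMeans {

    /** Draws k initial centers uniformly at random inside the bounding box of the points. */
    static double[][] randomCenters(double[][] points, int k, Random rnd) {
        double minX = Double.MAX_VALUE, maxX = -Double.MAX_VALUE;
        double minY = Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (double[] p : points) {
            minX = Math.min(minX, p[0]);
            maxX = Math.max(maxX, p[0]);
            minY = Math.min(minY, p[1]);
            maxY = Math.max(maxY, p[1]);
        }
        double[][] centers = new double[k][2];
        for (int i = 0; i < k; i++) {
            centers[i][0] = minX + rnd.nextDouble() * (maxX - minX);
            centers[i][1] = minY + rnd.nextDouble() * (maxY - minY);
        }
        return centers;
    }

    /**
     * Runs Lloyd iterations, updating centers in place. Returns the iteration in which
     * no point changed its cluster, or maxIter if that never happened.
     */
    static int lloyd(double[][] points, double[][] centers, int maxIter) {
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1);
        for (int iter = 1; iter <= maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest center (squared Euclidean distance).
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestD = Double.MAX_VALUE;
                for (int c = 0; c < centers.length; c++) {
                    double dx = points[i][0] - centers[c][0];
                    double dy = points[i][1] - centers[c][1];
                    double d = dx * dx + dy * dy;
                    if (d < bestD) { bestD = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) return iter;
            // Update step: move each center to the mean of its points; empty clusters keep their center.
            double[][] sum = new double[centers.length][2];
            int[] count = new int[centers.length];
            for (int i = 0; i < points.length; i++) {
                sum[assign[i]][0] += points[i][0];
                sum[assign[i]][1] += points[i][1];
                count[assign[i]]++;
            }
            for (int c = 0; c < centers.length; c++) {
                if (count[c] > 0) {
                    centers[c][0] = sum[c][0] / count[c];
                    centers[c][1] = sum[c][1] / count[c];
                }
            }
        }
        return maxIter;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        // Two synthetic clusters around (0, 0) and (20, 20).
        double[][] points = new double[200][2];
        for (int i = 0; i < points.length; i++) {
            double base = (i % 2 == 0) ? 0.0 : 20.0;
            points[i][0] = base + rnd.nextGaussian();
            points[i][1] = base + rnd.nextGaussian();
        }
        double[][] centers = randomCenters(points, 2, rnd);
        int iterations = lloyd(points, centers, 100);
        System.out.println("stopped after " + iterations + " iterations");
        for (double[] c : centers) System.out.println(Arrays.toString(c));
    }
}
```

Drawing from the bounding box is only one simple initialization choice; the sample-based variants mentioned above would replace `randomCenters` with something smarter.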
Initializing the algorithm with centers that are already very close to the
final centers somewhat defeats the purpose of the algorithm and does not make
for a good example, IMO.
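To make the distinction concrete, here is a hypothetical sketch of a generator that keeps the "hidden" centers (used only to place the points) separate from the random starting centers it writes out. It is not the actual `KMeansDataGenerator` code; the names, the uniform value range, the Gaussian noise, and the round-robin assignment of points to hidden centers are simplifying assumptions.

```java
import java.util.Locale;
import java.util.Random;

public class HiddenCenterGenerator {

    /** Generated data: the points, the hidden centers, and the independent starting centers. */
    static final class Generated {
        final double[][] hiddenCenters;  // used only to place the points; not written out
        final double[][] points;
        final double[][] initialCenters; // random starting guesses for k-means

        Generated(double[][] hidden, double[][] points, double[][] initial) {
            this.hiddenCenters = hidden;
            this.points = points;
            this.initialCenters = initial;
        }
    }

    static Generated generate(int numPoints, int k, double range, double stdDev, long seed) {
        Random rnd = new Random(seed);
        double[][] hidden = uniform(k, range, rnd);
        double[][] points = new double[numPoints][2];
        for (int i = 0; i < numPoints; i++) {
            double[] c = hidden[i % k]; // round-robin so every hidden center gets points
            points[i][0] = c[0] + rnd.nextGaussian() * stdDev;
            points[i][1] = c[1] + rnd.nextGaussian() * stdDev;
        }
        // Drawn independently of the hidden centers, so k-means still has to find them.
        double[][] initial = uniform(k, range, rnd);
        return new Generated(hidden, points, initial);
    }

    /** k centers with coordinates uniform in [0, range). */
    static double[][] uniform(int k, double range, Random rnd) {
        double[][] c = new double[k][2];
        for (int i = 0; i < k; i++) {
            c[i][0] = rnd.nextDouble() * range;
            c[i][1] = rnd.nextDouble() * range;
        }
        return c;
    }

    public static void main(String[] args) {
        Generated g = generate(1000, 3, 100.0, 5.0, 42L);
        // Only the starting centers are printed; the output format here is arbitrary.
        for (int i = 0; i < g.initialCenters.length; i++) {
            System.out.printf(Locale.US, "%d %.2f %.2f%n",
                    i + 1, g.initialCenters[i][0], g.initialCenters[i][1]);
        }
    }
}
```

Writing `hiddenCenters` instead of `initialCenters` would let k-means stop almost immediately, which is the situation the comment argues against.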
I would close this issue as "Not a Problem", unless you disagree.
Best, Fabian
> KMeans Data Generator writes not the same centroids as it was used for the
> dataset
> ----------------------------------------------------------------------------------
>
> Key: FLINK-3245
> URL: https://issues.apache.org/jira/browse/FLINK-3245
> Project: Flink
> Issue Type: Bug
> Reporter: Kay
> Priority: Trivial
>
> Hey guys.
> I am using your really nice KMeans dataset generator. I am wondering why the
> centers you write out are not the same as the ones the data generator used
> to generate the dataset.
> org.apache.flink.examples.java.clustering.util.KMeansDataGenerator
> LINE 126
> Cheers
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)