[jira] [Commented] (FLINK-3245) KMeans Data Generator writes not the same centroids as it was used for the dataset

Fabian Hueske (JIRA) Sun, 17 Jan 2016 08:49:42 -0800

    [ https://issues.apache.org/jira/browse/FLINK-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103785#comment-15103785 ]


Fabian Hueske commented on FLINK-3245:
--------------------------------------

The purpose of the k-means algorithm is to identify the "hidden" centers of a 
data set. These centers are not known in advance, and the algorithm is usually 
initialized with random centers. (There are variants of the k-means algorithm 
that pre-compute approximate centers, e.g., based on samples.) 

Initializing the algorithm with centers that are already very close to the 
final centers somewhat defeats the purpose of the algorithm and does not serve 
as a good example, IMO.
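
To illustrate the point, here is a minimal sketch of k-means started from 
randomly chosen input points rather than from the generating centers. It is 
not the Flink example code: the class name KMeansInitSketch, its helper 
methods, the two "hidden" centers, and the seed are all assumptions made for 
illustration only.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; not part of the Flink examples.
class KMeansInitSketch {

    /** Index of the center closest to point p (squared Euclidean distance). */
    public static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dx = p[0] - centers[c][0];
            double dy = p[1] - centers[c][1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** One iteration: assign each point to its nearest center, then recompute the means. */
    public static double[][] iterate(double[][] points, double[][] centers) {
        int k = centers.length;
        double[][] sums = new double[k][2];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centers);
            sums[c][0] += p[0];
            sums[c][1] += p[1];
            counts[c]++;
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            // Keep the old center if no point was assigned to it.
            next[c] = counts[c] == 0
                    ? centers[c].clone()
                    : new double[] {sums[c][0] / counts[c], sums[c][1] / counts[c]};
        }
        return next;
    }

    /** Pick k random input points as initial centers (one common choice). */
    public static double[][] randomInit(double[][] points, int k, Random rnd) {
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[rnd.nextInt(points.length)].clone();
        }
        return centers;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // assumed seed, only for repeatability
        // Assumed "hidden" centers used to generate the data; the algorithm never sees them.
        double[][] hidden = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] points = new double[200][];
        for (int i = 0; i < points.length; i++) {
            double[] h = hidden[i % hidden.length];
            points[i] = new double[] {h[0] + rnd.nextGaussian(), h[1] + rnd.nextGaussian()};
        }
        double[][] centers = randomInit(points, 2, rnd);
        for (int it = 0; it < 10; it++) {
            centers = iterate(points, centers);
        }
        System.out.println(Arrays.deepToString(centers));
    }
}
```

Passing `hidden` directly as the initial centers would leave the iterations 
with almost nothing to do, which is why such a setup makes a weak example.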

I would close this issue as "Not a Problem", unless you disagree.

Best, Fabian

> KMeans Data Generator writes not the same centroids as it was used for the 
> dataset
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-3245
>                 URL: https://issues.apache.org/jira/browse/FLINK-3245
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Kay
>            Priority: Trivial
>
> Hey guys.
> I am using your really nice KMeans dataset generator. I am wondering why 
> the centers you write out are not the same as the ones the data generator 
> used for the generated dataset.
> org.apache.flink.examples.java.clustering.util.KMeansDataGenerator
> LINE 126
> Cheers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
