[ 
https://issues.apache.org/jira/browse/MATH-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057095#comment-17057095
 ] 

Chen Tao commented on MATH-1524:
--------------------------------

{quote}Or not... See e.g. issue MATH-1330.{quote}
In my tests, the Commons Math implementation of k-means is as fast as I 
need: on about 500,000 points with 542 dimensions and k=50, it takes 1m30s.
For comparison: 
sklearn takes 1m50s (its code is optimized for sparse data).
Deeplearning4j takes about 3 days...

{quote}the addPoint method makes it certain that consistency can be 
broken{quote}
I don't think so. addPoint should not change the center during either training 
or prediction; in actual use, the center should only be changed explicitly by 
the training logic (the same as in Python's sklearn).
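
A minimal sketch of this idea, assuming a hypothetical class FixedCenterCluster and a hypothetical recenter() method (neither is part of Commons Math): addPoint only records membership, and a new center only appears when the training logic asks for one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only; not the Commons Math CentroidCluster.
final class FixedCenterCluster {
    private final double[] center;
    private final List<double[]> points = new ArrayList<>();

    FixedCenterCluster(double[] center) {
        this.center = center.clone();
    }

    // Records membership only; the center is never touched here.
    void addPoint(double[] point) {
        points.add(point.clone());
    }

    double[] getCenter() {
        return center.clone();
    }

    // Explicit training step: returns a new cluster centered on the mean
    // of the assigned points (or on the old center if none were assigned).
    FixedCenterCluster recenter() {
        if (points.isEmpty()) {
            return new FixedCenterCluster(center);
        }
        double[] mean = new double[center.length];
        for (double[] p : points) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += p[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= points.size();
        }
        return new FixedCenterCluster(mean);
    }
}
```

With this shape, calling addPoint during prediction cannot break consistency, because the only way to obtain a different center is the explicit recenter() call.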

{quote}I don't understand why the "misuse" would come from the type of the 
return value being the same.{quote}
The names and return values of getCenter and centroid are easy to confuse.
But centroid is used only in algorithm implementations, such as k-means++ and 
the evaluators, where a raw double[] is good enough.
getCenter will be used for prediction and faces the ML user, so we should 
encourage the user not to use the raw double[].
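
As a rough illustration of that split, assuming hypothetical types CenterPoint and TypedCluster (these are not Commons Math classes): the user-facing getCenter returns a typed value, while centroid hands a raw double[] to algorithm code.

```java
// Hypothetical typed wrapper handed to library users instead of a raw array.
final class CenterPoint {
    private final double[] coordinates;

    CenterPoint(double[] coordinates) {
        this.coordinates = coordinates.clone();
    }

    double[] toArray() {
        return coordinates.clone();
    }

    // Euclidean distance from this center to a query point.
    double distanceTo(double[] x) {
        double sum = 0.0;
        for (int i = 0; i < coordinates.length; i++) {
            double d = coordinates[i] - x[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

// Hypothetical cluster exposing both views.
final class TypedCluster {
    private final CenterPoint center;
    private final double[] sum;
    private int count;

    TypedCluster(CenterPoint center) {
        this.center = center;
        this.sum = new double[center.toArray().length];
    }

    void addPoint(double[] point) {
        for (int i = 0; i < sum.length; i++) {
            sum[i] += point[i];
        }
        count++;
    }

    // User-facing: typed, immutable center used for prediction.
    CenterPoint getCenter() {
        return center;
    }

    // Internal: raw mean of the assigned points, for algorithm code.
    double[] centroid() {
        if (count == 0) {
            return center.toArray();
        }
        double[] mean = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            mean[i] = sum[i] / count;
        }
        return mean;
    }
}
```

Because the two methods return different types, code that mixes them up fails at compile time instead of silently using the wrong values.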

> "chooseInitialCenters" should move out from KMeansPlusPlusClusterer
> -------------------------------------------------------------------
>
>                 Key: MATH-1524
>                 URL: https://issues.apache.org/jira/browse/MATH-1524
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Chen Tao
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are two reasons why "chooseInitialCenters" should be moved out of 
> "KMeansPlusPlusClusterer":
> # The k-means++ clusterer is a special case of the k-means clusterer, one that 
> initializes the cluster centers with the k-means++ algorithm. Another case 
> initializes the cluster centers with random points.
> # Mini-batch k-means will reuse "chooseInitialCenters". 
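
One possible shape for the refactoring described in the issue, as a sketch only: the interface CenterInitializer and both implementing classes are hypothetical names, not existing Commons Math API. Any k-means variant, including a mini-batch one, could then take an initializer as a parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical strategy interface; the name is an assumption.
interface CenterInitializer {
    List<double[]> chooseInitialCenters(List<double[]> points, int k, Random rng);
}

// Picks k distinct input points uniformly at random.
final class RandomInitializer implements CenterInitializer {
    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k, Random rng) {
        if (k > points.size()) {
            throw new IllegalArgumentException("k exceeds the number of points");
        }
        List<double[]> copy = new ArrayList<>(points);
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, k));
    }
}

// k-means++ seeding: each new center is sampled with probability
// proportional to its squared distance from the nearest chosen center.
final class KMeansPlusPlusInitializer implements CenterInitializer {
    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k, Random rng) {
        List<double[]> centers = new ArrayList<>();
        centers.add(points.get(rng.nextInt(points.size())));
        double[] minDistSq = new double[points.size()];
        Arrays.fill(minDistSq, Double.POSITIVE_INFINITY);
        while (centers.size() < k) {
            double[] last = centers.get(centers.size() - 1);
            double total = 0.0;
            for (int i = 0; i < points.size(); i++) {
                minDistSq[i] = Math.min(minDistSq[i], distSq(points.get(i), last));
                total += minDistSq[i];
            }
            double r = rng.nextDouble() * total;
            double acc = 0.0;
            int chosen = points.size() - 1; // fallback if all distances are zero
            for (int i = 0; i < points.size(); i++) {
                acc += minDistSq[i];
                if (acc > r) {
                    chosen = i;
                    break;
                }
            }
            centers.add(points.get(chosen));
        }
        return centers;
    }

    private static double distSq(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```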



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
