[ 
https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Tao updated MATH-1509:
---------------------------
    Description: 
MiniBatchKMeans is a fast clustering algorithm

that uses a subset of the points to initialize the cluster centers, and 
mini-batches in the training iterations.
 It can cluster millions of points in a few seconds, and its results differ 
little from those of KMeans.
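
To make the idea concrete, here is a minimal Java sketch of the mini-batch update loop. It is an illustration under stated assumptions, not the code in the pull request: the class name MiniBatchKMeansSketch, the method names and all parameters are hypothetical, the uniform random choice of starting centers stands in for whatever seeding the real implementation uses, and the per-center step size of 1/count follows the commonly described mini-batch k-means formulation.

```java
import java.util.Random;

/** Hypothetical illustration only; not the code from the pull request. */
class MiniBatchKMeansSketch {

    /** Fits k centers using mini-batch updates; all parameter names are assumptions. */
    static double[][] fit(double[][] points, int k, int batchSize, int iterations, long seed) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        Random rnd = new Random(seed);
        int dim = points[0].length;

        // Initialization (assumed): draw k distinct points uniformly at random
        // with a partial Fisher-Yates shuffle of the index array.
        int[] idx = new int[points.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + rnd.nextInt(idx.length - c);
            int tmp = idx[c];
            idx[c] = idx[j];
            idx[j] = tmp;
            centers[c] = points[idx[c]].clone();
        }

        long[] counts = new long[k];          // points seen so far per center
        int[] batch = new int[batchSize];     // indices of the sampled points
        int[] assigned = new int[batchSize];  // nearest center for each sampled point
        for (int it = 0; it < iterations; it++) {
            // Sample a mini-batch and assign it using the centers as they stand.
            for (int b = 0; b < batchSize; b++) {
                batch[b] = rnd.nextInt(points.length);
                assigned[b] = nearest(centers, points[batch[b]]);
            }
            // Move each center toward its assigned points with step 1/count.
            for (int b = 0; b < batchSize; b++) {
                int c = assigned[b];
                double[] p = points[batch[b]];
                counts[c]++;
                double eta = 1.0 / counts[c];
                for (int d = 0; d < dim; d++) {
                    centers[c][d] += eta * (p[d] - centers[c][d]);
                }
            }
        }
        return centers;
    }

    /** Index of the center with the smallest squared Euclidean distance to p. */
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

For example, MiniBatchKMeansSketch.fit(points, 3, 100, 50, 42L) would return three centers after 50 batches of 100 sampled points each. Each iteration touches only batchSize points instead of the whole data set, which is where the speed-up over a full KMeans pass comes from.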

I have implemented it in Kotlin in my own project, and I'd like to contribute 
the code to Apache Commons Math, in Java of course.

My implementation is based on Apache Commons Math3 and modeled on Python's 
sklearn.cluster.MiniBatchKMeans.

Through testing I found that it works well on dense data: performance 
improves significantly and the results differ little from KMeans++. On sparse 
data, however, the results differ considerably.

 

Below is a comparison of my implementation with KMeansPlusPlusClusterer:

  !compare.png!

 

I have created a pull request on 
[https://github.com/apache/commons-math/pull/117], for reference only.

  was:
MiniBatchKMeans is a fast clustering algorithm

that uses a subset of the points to initialize the cluster centers, and 
mini-batches in the training iterations.
 It can cluster millions of points in a few seconds, and its results differ 
little from those of KMeans.

I have implemented it in Kotlin in my own project, and I'd like to contribute 
the code to Apache Commons Math, in Java of course.

My implementation is based on Apache Commons Math3 and modeled on Python's 
sklearn.cluster.MiniBatchKMeans.

Through testing I found that it works well on dense data: performance 
improves significantly and the results differ little from KMeans++. On sparse 
data, however, the results differ considerably.

 

Below is a comparison of my implementation with KMeansPlusPlusClusterer:

 

 

I have created a pull request on 
[https://github.com/apache/commons-math/pull/117], for reference only.


> Implement the MiniBatchKMeansClusterer
> --------------------------------------
>
>                 Key: MATH-1509
>                 URL: https://issues.apache.org/jira/browse/MATH-1509
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Chen Tao
>            Priority: Major
>         Attachments: compare.png
>
>
> MiniBatchKMeans is a fast clustering algorithm
> that uses a subset of the points to initialize the cluster centers, and 
> mini-batches in the training iterations.
>  It can cluster millions of points in a few seconds, and its results differ 
> little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute 
> the code to Apache Commons Math, in Java of course.
> My implementation is based on Apache Commons Math3 and modeled on Python's 
> sklearn.cluster.MiniBatchKMeans.
> Through testing I found that it works well on dense data: performance 
> improves significantly and the results differ little from KMeans++. On sparse 
> data, however, the results differ considerably.
>  
> Below is a comparison of my implementation with KMeansPlusPlusClusterer:
>   !compare.png!
>  
> I have created a pull request on 
> [https://github.com/apache/commons-math/pull/117], for reference only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
