Chen Tao created MATH-1509:
------------------------------

             Summary: Implement the MiniBatchKMeansClusterer
                 Key: MATH-1509
                 URL: https://issues.apache.org/jira/browse/MATH-1509
             Project: Commons Math
          Issue Type: New Feature
            Reporter: Chen Tao


MiniBatchKMeans is a fast clustering algorithm that uses a subset of the 
points to initialize the cluster centers and a mini-batch of points in each 
training iteration.
It can cluster millions of data points in a few seconds, and its results 
differ little from those of KMeans.
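
As a rough illustration of the idea (not the code from my project or the 
pull request), a minimal sketch in plain Java could look like the block below. 
The class name MiniBatchKMeansSketch and its methods are hypothetical. It 
assumes dense double[] points, a fixed number of iterations, and a simplified 
seeding step that picks k random points instead of running k-means++ on a 
subsample. Each center moves toward the points assigned to it with a 
per-center rate of 1/count.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative mini-batch k-means sketch; all names here are hypothetical. */
final class MiniBatchKMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the center closest to the given point (ties go to the lower index). */
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(point, centers[c]) < squaredDistance(point, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Simplified seeding: copies k distinct random points (assumes k <= points.length). */
    static double[][] initCenters(double[][] points, int k, Random rng) {
        int[] idx = new int[points.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        double[][] centers = new double[k][];
        for (int i = 0; i < k; i++) {
            // Partial Fisher-Yates shuffle so that no index is chosen twice.
            int j = i + rng.nextInt(idx.length - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
            centers[i] = points[idx[i]].clone();
        }
        return centers;
    }

    /** Runs a fixed number of mini-batch updates and returns the k centers. */
    static double[][] fit(double[][] points, int k, int batchSize, int iterations, long seed) {
        Random rng = new Random(seed);
        double[][] centers = initCenters(points, k, rng);
        long[] counts = new long[k];
        int[] batch = new int[batchSize];
        int[] assigned = new int[batchSize];
        for (int it = 0; it < iterations; it++) {
            // Draw a mini-batch (with replacement) and assign each point
            // using the centers as they are at the start of the batch.
            for (int b = 0; b < batchSize; b++) {
                batch[b] = rng.nextInt(points.length);
                assigned[b] = nearest(points[batch[b]], centers);
            }
            // Move each center toward its assigned points; the rate 1/count
            // makes the center the running mean of the points it has received.
            for (int b = 0; b < batchSize; b++) {
                double[] p = points[batch[b]];
                double[] c = centers[assigned[b]];
                double rate = 1.0 / ++counts[assigned[b]];
                for (int d = 0; d < c.length; d++) {
                    c[d] += rate * (p[d] - c[d]);
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        // Two tight groups of points, one near (0, 0) and one near (10, 10).
        double[][] points = new double[100][];
        for (int i = 0; i < 50; i++) {
            points[i] = new double[] {0.01 * i, 0.01 * i};
            points[50 + i] = new double[] {10 + 0.01 * i, 10 + 0.01 * i};
        }
        double[][] centers = fit(points, 2, 20, 200, 42L);
        System.out.println(Arrays.deepToString(centers));
    }
}
```

Only batchSize points are touched per iteration, which is where the speed-up 
over a full KMeans pass comes from. A real clusterer would also need a 
convergence check and handling of empty clusters, which the sketch leaves out.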

I have implemented it in Kotlin in my own project, and I'd like to contribute 
the code to Apache Commons Math, in Java of course.

My implementation is based on Apache Commons Math3 and follows Python's 
sklearn.cluster.MiniBatchKMeans.

Through testing, I found that it works well on dense data: performance 
improves significantly and the results differ little from those of KMeans++. 
On sparse data, however, the results differ considerably.

 

I have created a pull request at 
[https://github.com/apache/commons-math/pull/117], for reference only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)