Chen Tao created MATH-1509:
------------------------------
Summary: Implement the MiniBatchKMeansClusterer
Key: MATH-1509
URL: https://issues.apache.org/jira/browse/MATH-1509
Project: Commons Math
Issue Type: New Feature
Reporter: Chen Tao
MiniBatchKMeans is a fast clustering algorithm
that uses a subset of the points to initialize the cluster centers and a
mini-batch of points in each training iteration.
It can cluster millions of points in a few seconds, and its results differ
little from those of KMeans.
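To make the idea concrete, here is a minimal sketch of the mini-batch update. It is not the code from the pull request: the class name MiniBatchKMeansSketch and its parameters are hypothetical, initialization is simplified to picking k distinct random points (rather than seeding from a subset the way sklearn does), and the update uses a per-center learning rate of 1/count. It assumes k is no larger than the number of points.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not the Commons Math or pull-request API.
class MiniBatchKMeansSketch {

    /** Fits k centers using mini-batch updates. Assumes k <= points.length. */
    public static double[][] fit(double[][] points, int k, int batchSize,
                                 int iterations, long seed) {
        Random rng = new Random(seed);
        int dim = points[0].length;

        // Simplified initialization: partial Fisher-Yates shuffle to pick
        // k distinct random points as the starting centers.
        int[] idx = new int[points.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        double[][] centers = new double[k][];
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(idx.length - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
            centers[i] = points[idx[i]].clone();
        }

        long[] counts = new long[k];      // points seen so far per center
        int[] batch = new int[batchSize];
        int[] assigned = new int[batchSize];
        for (int it = 0; it < iterations; it++) {
            // Step 1: sample a mini-batch and assign each point to its
            // nearest center, using the centers as they stand now.
            for (int b = 0; b < batchSize; b++) {
                batch[b] = rng.nextInt(points.length);
                assigned[b] = nearest(centers, points[batch[b]]);
            }
            // Step 2: move each center toward its assigned points with a
            // learning rate that shrinks as the center sees more points.
            for (int b = 0; b < batchSize; b++) {
                int c = assigned[b];
                double[] p = points[batch[b]];
                counts[c]++;
                double eta = 1.0 / counts[c];
                for (int d = 0; d < dim; d++) {
                    centers[c][d] += eta * (p[d] - centers[c][d]);
                }
            }
        }
        return centers;
    }

    /** Index of the center closest to p by squared Euclidean distance. */
    public static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two synthetic groups, near (0, 0) and near (10, 10).
        Random r = new Random(7);
        double[][] data = new double[1000][];
        for (int i = 0; i < data.length; i++) {
            double base = (i % 2 == 0) ? 0.0 : 10.0;
            data[i] = new double[] {base + r.nextDouble(), base + r.nextDouble()};
        }
        for (double[] c : fit(data, 2, 50, 100, 42L)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

Each iteration touches only batchSize points instead of the whole data set, which is where the speed-up over a full KMeans pass comes from. Every center stays a weighted average of the points assigned to it so far.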
I have implemented it in Kotlin in my own project, and I'd like to contribute
the code to Apache Commons Math, in Java of course.
My implementation is based on Apache Commons Math3 and follows Python's
sklearn.cluster.MiniBatchKMeans.
Through testing I found that it works well on dense data: performance improves
significantly and the results differ little from those of KMeans++. On sparse
data, however, the results differ considerably.
I have created a pull request on
[https://github.com/apache/commons-math/pull/117], for reference only.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)