[ https://issues.apache.org/jira/browse/MATH-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017961#comment-17017961 ]
Gilles Sadowski commented on MATH-1509:
---------------------------------------
Thanks for your interest in contributing.
A few comments about the PR:
* {{ClusterUtils}} defines utilities that are seemingly redundant with those
in ["Commons RNG"|http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.3/org/apache/commons/rng/sampling/ListSampler.html].
* Why are there _protected_ methods?
* All fields and methods (including _private_ ones) must have a Javadoc
comment.
* Comments should be in English. ;)
> Implement the MiniBatchKMeansClusterer
> --------------------------------------
>
> Key: MATH-1509
> URL: https://issues.apache.org/jira/browse/MATH-1509
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Chen Tao
> Priority: Major
> Attachments: compare.png
>
>
> MiniBatchKMeans is a fast clustering algorithm that uses a subset of the
> points to initialize the cluster centers and a mini-batch of points in each
> training iteration.
> It can cluster millions of data points in a few seconds, and its results
> differ little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute
> the code to Apache Commons Math (in Java, of course).
> My implementation is based on Apache Commons Math3 and follows Python's
> sklearn.cluster.MiniBatchKMeans.
> Through testing, I found that it works well on dense data: performance
> improves significantly and the results differ little from KMeans++. On
> sparse data, however, the results differ considerably.
>
> Below is a comparison of my implementation and KMeansPlusPlusClusterer:
> !compare.png!
>
> I have created a pull request at
> [https://github.com/apache/commons-math/pull/117], for reference only.
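
For readers unfamiliar with the algorithm described in the quoted issue, the mini-batch update can be sketched roughly as follows. This is only an illustration and is not the code from the pull request: the class name {{MiniBatchKMeansSketch}}, the method {{cluster}} and all of its parameters are hypothetical, the initialization is simplified to picking k distinct random points, and batches are drawn with replacement. Each center is moved toward the points assigned to it with a per-center learning rate of 1/count.

```java
import java.util.Random;

/** Illustrative mini-batch k-means sketch (hypothetical; not the pull request's code). */
final class MiniBatchKMeansSketch {
    /** Utility class: no instances. */
    private MiniBatchKMeansSketch() {}

    /**
     * Computes cluster centers with mini-batch updates.
     *
     * @param points data points, all of the same dimension
     * @param k number of clusters (at most the number of points)
     * @param batchSize number of points sampled per iteration
     * @param iterations number of mini-batch iterations
     * @param seed seed of the random generator
     * @return the k cluster centers
     */
    static double[][] cluster(double[][] points, int k, int batchSize,
                              int iterations, long seed) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be in [1, number of points]");
        }
        final Random rng = new Random(seed);
        final int dim = points[0].length;

        // Simplified initialization (assumption): copy k distinct random points,
        // chosen with a partial Fisher-Yates shuffle of the indices.
        final int[] idx = new int[points.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        final double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            final int j = c + rng.nextInt(idx.length - c);
            final int tmp = idx[c];
            idx[c] = idx[j];
            idx[j] = tmp;
            centers[c] = points[idx[c]].clone();
        }

        final long[] counts = new long[k]; // Points assigned to each center so far.
        final int[] batch = new int[batchSize];
        final int[] assigned = new int[batchSize];
        for (int it = 0; it < iterations; it++) {
            // Step 1: sample a batch (with replacement) and assign every point
            // to its nearest center before any center is moved.
            for (int b = 0; b < batchSize; b++) {
                batch[b] = rng.nextInt(points.length);
                assigned[b] = nearest(centers, points[batch[b]]);
            }
            // Step 2: move each center toward its assigned points, using a
            // learning rate that shrinks as the center accumulates points.
            for (int b = 0; b < batchSize; b++) {
                final int c = assigned[b];
                final double[] x = points[batch[b]];
                counts[c]++;
                final double eta = 1.0 / counts[c];
                for (int d = 0; d < dim; d++) {
                    centers[c][d] += eta * (x[d] - centers[c][d]);
                }
            }
        }
        return centers;
    }

    /**
     * Finds the center closest to a point (squared Euclidean distance).
     *
     * @param centers current centers
     * @param x point
     * @return index of the nearest center
     */
    private static int nearest(double[][] centers, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0;
            for (int d = 0; d < x.length; d++) {
                final double diff = x[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

Because each iteration only touches {{batchSize}} points, the cost per iteration does not depend on the size of the data set, which is where the speed-up over a full KMeans pass comes from. A call such as {{MiniBatchKMeansSketch.cluster(points, 3, 100, 200, 42L)}} returns three centers; a real implementation would also need a convergence test and a better initialization.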