[ https://issues.apache.org/jira/browse/MATH-1509?focusedWorklogId=409613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-409613 ]
ASF GitHub Bot logged work on MATH-1509:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Mar/20 16:11
            Start Date: 25/Mar/20 16:11
    Worklog Time Spent: 10m
      Work Description: asfgit commented on pull request #132: MATH-1509: Add missing documentation to class ImprovementEvaluator
URL: https://github.com/apache/commons-math/pull/132

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 409613)
    Time Spent: 1h 20m  (was: 1h 10m)

> Implement the MiniBatchKMeansClusterer
> --------------------------------------
>
>                 Key: MATH-1509
>                 URL: https://issues.apache.org/jira/browse/MATH-1509
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Chen Tao
>            Priority: Major
>         Attachments: compare.png, intensive-data-comparsion-badcase.png,
> intensive-data-comparsion.png, random-data-comparison.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> MiniBatchKMeans is a fast clustering algorithm that uses a subset of the
> points to initialize the cluster centers and mini-batches in the training
> iterations.
> It can cluster millions of data points in a few seconds, and its results
> differ little from those of KMeans.
> I have implemented it in Kotlin in my own project, and I'd like to contribute
> the code to Apache Commons Math, in Java of course.
> My implementation is based on Apache Commons Math3 and modeled on Python's
> sklearn.cluster.MiniBatchKMeans.
> Through testing I found that it works well on dense data: performance improves
> significantly and the results differ little from KMeans++. On sparse data,
> however, the results differ considerably.
>
> Below is a comparison of my implementation and KMeansPlusPlusClusterer:
> !compare.png!
>
> I have created a pull request on
> [https://github.com/apache/commons-math/pull/117], for reference only.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
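
As a rough illustration of the algorithm described in the issue (centers initialized from part of the data, then updated from mini-batches), here is a minimal Java sketch. It is not the code from the pull request. The class name MiniBatchKMeansSketch, the fit signature, and the choice to initialize by copying k randomly chosen points (rather than, say, running k-means++ on a subsample) are all assumptions made for illustration. Each center moves toward the batch points assigned to it, using a per-center learning rate of 1/count.

```java
import java.util.Random;

/** Hypothetical sketch of mini-batch k-means; not the implementation from the pull request. */
final class MiniBatchKMeansSketch {

    /** Fits k centers using random mini-batches and per-center learning rates. */
    static double[][] fit(double[][] points, int k, int batchSize, int iterations, long seed) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be in [1, points.length]");
        }
        Random rng = new Random(seed);
        int dim = points[0].length;

        // Initialization (assumed, simplified): copy k distinct randomly chosen points.
        int[] order = shuffledIndices(points.length, rng);
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[order[c]].clone();
        }

        long[] counts = new long[k]; // points absorbed by each center so far
        int[] batch = new int[batchSize];
        int[] assigned = new int[batchSize];
        for (int it = 0; it < iterations; it++) {
            // Step 1: sample a mini-batch and assign each point to its nearest center,
            // keeping the centers fixed while assignments are computed.
            for (int b = 0; b < batchSize; b++) {
                batch[b] = rng.nextInt(points.length);
                assigned[b] = nearest(centers, points[batch[b]]);
            }
            // Step 2: move each center toward its assigned points with rate 1/count.
            for (int b = 0; b < batchSize; b++) {
                int c = assigned[b];
                counts[c]++;
                double eta = 1.0 / counts[c];
                double[] p = points[batch[b]];
                for (int d = 0; d < dim; d++) {
                    centers[c][d] += eta * (p[d] - centers[c][d]);
                }
            }
        }
        return centers;
    }

    /** Index of the center with the smallest squared Euclidean distance to p. */
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Fisher-Yates shuffle of the indices 0..n-1. */
    private static int[] shuffledIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }
}
```

A call such as MiniBatchKMeansSketch.fit(points, 3, 100, 50, 42L) would return three centers, and a fixed seed makes the run reproducible. Because each iteration touches only batchSize points, the cost per iteration does not grow with the size of the data set, which is where the speed-up over full KMeans comes from.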