Thanks. Now that I've looked at it again, I think I can improve it further. Currently, at each iteration of the seeding, the next point is sampled with a worst-case complexity of O(n), where n is the number of points. I think it's possible to reduce this to O(log(n)) by using O(n) additional space.
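One way the O(log(n)) sampling could work, as a rough sketch only: keep an array of cumulative sums of the squared distances (the O(n) extra space) and binary-search it for the drawn value. The class and method names below are made up for illustration and are not taken from the patch or from Commons Math. The array would still need rebuilding whenever the distances change, which can be done in the same pass that updates them.

```java
import java.util.Random;

// Hypothetical helper, not part of Commons Math.
final class CumulativeSampler {
    /** Builds prefix sums: cum[i] = weights[0] + ... + weights[i]. O(n) time and space. */
    static double[] prefixSums(double[] weights) {
        double[] cum = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            cum[i] = running;
        }
        return cum;
    }

    /** Returns the smallest index i with cum[i] > r, by binary search. O(log n). */
    static int pick(double[] cum, double r) {
        int lo = 0;
        int hi = cum.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cum[mid] > r) {
                hi = mid;       // answer is at mid or to its left
            } else {
                lo = mid + 1;   // answer is strictly to the right
            }
        }
        return lo;
    }

    /** Draws an index with probability proportional to its weight. */
    static int sample(double[] cum, Random rng) {
        double total = cum[cum.length - 1];
        return pick(cum, rng.nextDouble() * total);
    }
}
```

The strict comparison means a point with zero weight (for example, an already chosen center) is skipped whenever the drawn value is below the total.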
Best regards,
Artem Barger.

On Thu, Jun 23, 2016 at 4:37 PM, Eric Barnhill <ericbarnh...@gmail.com> wrote:

> I use kmeans a bit and I will look at it.
>
> On Thu, Jun 23, 2016 at 2:10 PM, Artem Barger <ar...@bargr.net> wrote:
>
> > Hi all,
> >
> > While I understand that project decision threads are ongoing on the ML,
> > I'd like to suggest and provide some improvements to the seeding
> > procedure of the CM kmeans++ implementation. Currently the sum of squared
> > distances is recomputed at each iteration during the initial center
> > seeding, which is redundant since the sum can be computed once and
> > updated within the loop.
> >
> > The JIRA item in the subject explains the optimization, and I've also
> > provided a patch with the suggested fix. I would be glad to hear any
> > comments or reviews.
> >
> > Best regards,
> > Artem Barger.
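
For readers without the JIRA item at hand, the idea in the original message (compute the sum once, then update it inside the loop) might look roughly like the sketch below. This is an illustration under assumed names (SeedingDistances, update, minDistSq), not the submitted patch and not the Commons Math code.

```java
// Hypothetical helper, not part of Commons Math.
final class SeedingDistances {
    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Accounts for a newly chosen center. minDistSq[i] holds the squared
     * distance from points[i] to its nearest center so far and is updated
     * in place. The running sum is adjusted only where a distance shrinks,
     * instead of being recomputed from scratch. Returns the updated sum.
     */
    static double update(double[][] points, double[] minDistSq,
                         double sum, double[] newCenter) {
        for (int i = 0; i < points.length; i++) {
            double d2 = squaredDistance(points[i], newCenter);
            if (d2 < minDistSq[i]) {
                sum -= minDistSq[i] - d2;  // remove only the improvement
                minDistSq[i] = d2;
            }
        }
        return sum;
    }
}
```

Repeated subtraction can accumulate floating-point rounding error over many iterations, so a real implementation may want to guard against a slightly negative or drifting sum.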