Thanks. Now that I've looked at it again, I think I can improve it further.
Currently, at each iteration of the seeding, each point is sampled with a
worst-case complexity of O(n), where n is the number of points. I think it's
possible to reduce this to O(log(n)) while using O(n) additional space.
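
To make the idea concrete, here is a rough sketch of one way such a lookup
could work. It is not the patch itself and not the CM code: it assumes the
O(n) extra space is an array of prefix sums over the per-point squared
distances, and that a single draw is then a binary search over that array.
The names build_cumulative and sample_index are made up for illustration.

```python
import bisect
import itertools
import random


def build_cumulative(weights):
    """Prefix sums of the weights: O(n) time and O(n) additional space."""
    return list(itertools.accumulate(weights))


def sample_index(cumulative, rng=random):
    """Pick index i with probability weights[i] / total using binary search.

    Assumes non-negative weights with a positive total.
    """
    total = cumulative[-1]
    r = rng.random() * total
    # First position whose prefix sum is strictly greater than r, so
    # zero-weight entries (already chosen centers) are skipped.
    i = bisect.bisect_right(cumulative, r)
    # Guard against floating-point rounding pushing r up to total.
    return min(i, len(cumulative) - 1)
```

With weights [0.0, 4.0, 0.0, 1.0], index 1 would be drawn about 80% of the
time and index 3 about 20%. The prefix sums still have to be rebuilt after
the distances change, so this only speeds up the draw itself.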

Best regards,
                      Artem Barger.

On Thu, Jun 23, 2016 at 4:37 PM, Eric Barnhill <ericbarnh...@gmail.com>
wrote:

> I use kmeans a bit and I will look at it.
>
> On Thu, Jun 23, 2016 at 2:10 PM, Artem Barger <ar...@bargr.net> wrote:
>
> > Hi all,
> >
> > While I understand that project decision threads are ongoing on the ML,
> > I'd like to suggest and provide some improvements to the CM kmeans++
> > implementation in the seeding procedure. Currently, the sum of squared
> > distances is computed at each iteration during initial center seeding,
> > which is redundant, since the sum can be computed once and updated within
> > the loop.
> >
> >
> > The JIRA item named in the subject explains the optimization, and I've
> > also provided a patch with the suggested fix. I would be glad to hear any
> > comments or reviews.
> >
> >
> > Best regards,
> >                       Artem Barger.
> >
>
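
The optimization described in the quoted message, computing the sum of
squared distances once and then updating it inside the loop, could look
roughly like the sketch below. This is only an illustration under assumed
names (squared_distance, init_min_dist_sq, add_center); it is not taken from
the JIRA patch or from the CM sources.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_min_dist_sq(points, first_center):
    """Compute each point's squared distance to the first center, and their sum."""
    min_dist_sq = [squared_distance(p, first_center) for p in points]
    return min_dist_sq, sum(min_dist_sq)


def add_center(points, min_dist_sq, total, new_center):
    """Update the nearest-center distances in place and return the adjusted sum.

    The sum is corrected only by the amount each distance shrinks, instead
    of being recomputed from scratch after every new center.
    """
    for i, p in enumerate(points):
        d2 = squared_distance(p, new_center)
        if d2 < min_dist_sq[i]:
            total -= min_dist_sq[i] - d2
            min_dist_sq[i] = d2
    return total
```

With floating-point coordinates the running total can drift slightly from a
freshly computed sum, so an occasional full recomputation may be worth
considering.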
