Re: [MATH] MATH-1378: KMeansPlusPlusClusterer optimize seeding procedure.

Artem Barger Thu, 23 Jun 2016 13:32:30 -0700

Thanks. Now that I've looked at it again, I think I can improve it further.
Currently, at each iteration of the seeding, sampling a point has a
worst-case complexity of O(n) (where n is the number of points). I think
it's possible to reduce this to O(log(n)) at the cost of O(n) additional
space.
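
To illustrate the idea, here is a minimal sketch of one way to get
O(log(n)) sampling with O(n) extra space: store the cumulative sums of the
squared distances and binary-search them. This is not taken from the patch;
the class name WeightedSampler and its methods are hypothetical and not
part of Commons Math, and the array still has to be rebuilt in O(n)
whenever the distances change.

```java
import java.util.Random;

/** Hypothetical helper (not a Commons Math class): weighted index sampling. */
final class WeightedSampler {
    /** cumulative[i] = weights[0] + ... + weights[i]; this is the O(n) extra space. */
    private final double[] cumulative;

    /** Builds the cumulative sums in O(n); weights must be non-negative. */
    WeightedSampler(double[] weights) {
        cumulative = new double[weights.length];
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            cumulative[i] = sum;
        }
    }

    /** Sum of all weights (0 for an empty array). */
    double total() {
        return cumulative.length == 0 ? 0.0 : cumulative[cumulative.length - 1];
    }

    /** Smallest index i with cumulative[i] > r, for 0 <= r < total(); O(log n). */
    int indexFor(double r) {
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > r) {
                hi = mid;       // answer is at mid or to its left
            } else {
                lo = mid + 1;   // answer is strictly to the right
            }
        }
        return lo;
    }

    /** Draws an index with probability proportional to its weight. */
    int sample(Random rng) {
        return indexFor(rng.nextDouble() * total());
    }
}
```

With the strict comparison, a point whose weight is zero (for example, one
already chosen as a center) is never returned by indexFor.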


Best regards,
                      Artem Barger.

On Thu, Jun 23, 2016 at 4:37 PM, Eric Barnhill <ericbarnh...@gmail.com>
wrote:

> I use kmeans a bit and I will look at it.
>
> On Thu, Jun 23, 2016 at 2:10 PM, Artem Barger <ar...@bargr.net> wrote:
>
> > Hi all,
> >
> > While I understand that threads about the project's direction are
> > ongoing on the ML, I'd like to suggest and provide some improvements to
> > the seeding procedure of the CM kmeans++ implementation. Currently, the
> > sum of squared distances is recomputed at each iteration during the
> > initial center seeding, which is redundant, since the sum can be
> > computed once and updated within the loop.
> >
> >
> > The JIRA issue named in the subject explains the optimization, and I've
> > also provided a patch with the suggested fix. I would be glad to hear
> > any comments or reviews.
> >
> >
> > Best regards,
> >                       Artem Barger.
> >
>
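
For readers following the quoted proposal, the running-sum idea can be
sketched roughly as below. This is an illustration under assumptions, not
the patch attached to the JIRA issue: the class SeedingSum, the method
updateAndSum, and the plain double[][] point representation are all
hypothetical and do not mirror the KMeansPlusPlusClusterer internals.

```java
/** Hypothetical helper (not a Commons Math class): keeps the seeding sum up to date. */
final class SeedingSum {
    private SeedingSum() { }

    /**
     * Updates each point's squared distance to its nearest center after a
     * new center is added, and adjusts the running sum in the same pass
     * instead of re-summing the whole array afterwards.
     *
     * @param points     data points, one row per point
     * @param newCenter  the center that was just chosen
     * @param minDistSq  squared distance of each point to its nearest center (updated in place)
     * @param sum        current sum of minDistSq
     * @return the sum of minDistSq after the update
     */
    static double updateAndSum(double[][] points, double[] newCenter,
                               double[] minDistSq, double sum) {
        for (int i = 0; i < points.length; i++) {
            double d = 0.0;
            for (int k = 0; k < newCenter.length; k++) {
                double diff = points[i][k] - newCenter[k];
                d += diff * diff;
            }
            if (d < minDistSq[i]) {
                // Only the decrease is applied to the running sum.
                sum -= minDistSq[i] - d;
                minDistSq[i] = d;
            }
        }
        return sum;
    }
}
```

Note that repeatedly subtracting from a running sum can accumulate
floating-point rounding error, which a fresh summation would not.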
