Hello, My name is Prashanth Duvvada and I'm currently a 2nd year Computer Science student of Amrita University, India.
I'm interested in working on the GSoC 2k18 project "Algorithm Optimization" and would like to work on improving the K Means algorithm. I went through the code base and found out that a normal K Means method is using the Lloyd's Algorithm. Although widely used it has two major disadvantages 1) Worst case complexity is super polynomial of input size(mainly exponential) 2) Approximation can be bad in some cases, specially during formation of clustering. To overcome the above two problems, we have K Means ++, which is a second popular choice and implemented in several areas to improve the standard K Means method. 1) With K-Means++, it's sure to find the solution in O(log k). 2) To address the approximations, a procedure is followed to initialize right centers before proceeding for the standard iterations. For reference, I have used link [1] and [2] for understanding this algorithm. I would like to know what mentors think of this methods and if it can be used to improve the current methods. [1]-https://en.wikipedia.org/wiki/K-means%2B%2B [2]- https://people.eecs.berkeley.edu/~brecht/cs294docs/week2/07.arthur.kmeans.pdf -- Thanks, Prashanth Amrita University <http://amrita.edu>
_______________________________________________ mlpack mailing list [email protected] http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
