[GSOC 2014] Uniform API for Mahout Clustering

chalitha udara Perera Mon, 17 Mar 2014 10:38:11 -0700

Hi All,

Going through the mail thread "Mahout 1.0 goals", I found that the main focus
of Mahout is now on code refactoring and integration with Spark
rather than on implementing new algorithms. Recently I used Mahout to
implement a document clustering module for a Content Management System.


To be honest, we had some problems with the lack of uniformity among the
different clustering algorithms. For example, simple K-means takes its input
as a sequence file of document TF-IDF vectors, while spectral K-means takes a
CSV file that defines the similarity matrix.

I think that if we can provide a uniform clustering API, as mentioned in the
1.0 goals, it would be very useful for end-user developers.
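
To make the idea concrete, here is a rough sketch of what such an API could
look like. Everything in it is hypothetical: Clusterer, NearestSeedClusterer
and SimilarityMatrixClusterer are illustrative names, not existing Mahout
classes, and the two toy algorithms only stand in for a vector-based and a
similarity-based method. The point is that both accept the same input, and the
similarity matrix is derived internally instead of being supplied by the
caller in a separate file format.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are existing Mahout classes.
class UniformClusteringSketch {

    /** Assumed common contract: vectors in, one cluster id per vector out. */
    interface Clusterer {
        int[] cluster(List<double[]> vectors, int k);
    }

    /** Toy vector-based algorithm: the first k vectors act as fixed seeds. */
    static class NearestSeedClusterer implements Clusterer {
        @Override
        public int[] cluster(List<double[]> vectors, int k) {
            int seeds = Math.min(k, vectors.size());
            int[] ids = new int[vectors.size()];
            for (int i = 0; i < ids.length; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (int c = 0; c < seeds; c++) {
                    double d = squaredDistance(vectors.get(i), vectors.get(c));
                    if (d < best) { best = d; ids[i] = c; }
                }
            }
            return ids;
        }
    }

    /** Toy similarity-based algorithm: builds its own matrix from the vectors. */
    static class SimilarityMatrixClusterer implements Clusterer {
        @Override
        public int[] cluster(List<double[]> vectors, int k) {
            double[][] sim = cosineSimilarityMatrix(vectors);
            int seeds = Math.min(k, vectors.size());
            int[] ids = new int[vectors.size()];
            for (int i = 0; i < ids.length; i++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int c = 0; c < seeds; c++) {
                    if (sim[i][c] > best) { best = sim[i][c]; ids[i] = c; }
                }
            }
            return ids;
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return sum;
    }

    static double[][] cosineSimilarityMatrix(List<double[]> vectors) {
        int n = vectors.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sim[i][j] = cosine(vectors.get(i), vectors.get(j));
            }
        }
        return sim;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        // Treat a zero vector as dissimilar to everything to avoid dividing by zero.
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        List<double[]> docs = Arrays.asList(
            new double[] {1.0, 0.0}, new double[] {0.0, 1.0},
            new double[] {0.9, 0.1}, new double[] {0.1, 0.9});
        // The calling code is identical for both algorithms.
        List<Clusterer> algorithms = Arrays.<Clusterer>asList(
            new NearestSeedClusterer(), new SimilarityMatrixClusterer());
        for (Clusterer c : algorithms) {
            System.out.println(c.getClass().getSimpleName()
                + " -> " + Arrays.toString(c.cluster(docs, 2)));
        }
    }
}
```

With an interface like this, the calling code would not need to change when
one algorithm is swapped for another. A real design would of course have to
work with Mahout's own vector and file types rather than plain arrays.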

I would like to proceed with this idea as my GSOC 2014 project. Please let
me know if you are interested in this project.
-- 
J.M Chalitha Udara Perera

*Department of Computer Science and Engineering,*
*University of Moratuwa,*
*Sri Lanka*
