Hi All, Going through the mail thread "Mahout 1.0 goals", I found that Mahout's main focus is now on code refactoring and integration with Spark rather than on implementing new algorithms. I recently used Mahout to implement a document clustering module for a Content Management System.
To be honest, we had some problems with the lack of uniformity among the different clustering algorithms. For example, simple K-means takes as input a sequence file of document TF-IDF vectors, while Spectral K-means takes a CSV file that defines the similarity matrix. I think that if we can provide a uniform clustering API, as mentioned in the 1.0 goals, it would be very useful for end-user developers. I would like to pursue this idea as my GSoC 2014 project. Please let me know if you are interested in this project.

--
J.M Chalitha Udara Perera
*Department of Computer Science and Engineering,*
*University of Moratuwa,*
*Sri Lanka*
