Hello, some background on myself: I spent the last five years at Google doing machine learning work on the self-driving car, image search, and YouTube (http://www.linkedin.com/in/yeehector).
I have some proposed contributions, and I wonder whether they would be useful in Mahout (otherwise I will just release them as a new open-source project on GitHub):

- Sparse autoencoder. Think of it as something like LDA: it has an unsupervised hidden topic model and an output that reconstructs the input, blurred a bit by the hidden-layer bottleneck. The variant I am planning to implement is optimized for sparse (e.g. text) labels. I'm not sure whether it will fit into the filter framework.

- Boosting with L1 regularization and back pruning. Just the binary case - I haven't had much luck with the multi-class case versus AdaBoost.ECC.

- Online kernelized learner for ranking and classification, with optimization in the primal rather than the dual.

I'm new to Mahout, so let me know whether anyone is already working on these. I've implemented them several times in C++.

--
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
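To make the first proposal concrete, here is a minimal sketch of a single-hidden-layer sparse autoencoder: squared-error reconstruction with an L1 sparsity penalty on the hidden activations. Every class name and parameter value here is illustrative (not a Mahout API), and a real implementation would use sparse vectors rather than Python lists.

```python
# Minimal sparse autoencoder sketch: encode -> bottleneck -> decode,
# trained by plain gradient descent.  All names/values are assumptions.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class SparseAutoencoder:
    def __init__(self, n_in, n_hidden, lr=0.5, sparsity=0.001, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        self.sparsity = sparsity          # weight of the L1 penalty on h
        # small random weights; W1 encodes (n_hidden x n_in), W2 decodes
        self.W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_in)]

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.W1]

    def decode(self, h):
        return [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in self.W2]

    def train_step(self, x):
        h = self.encode(x)
        y = self.decode(h)
        # squared-error gradient through the output sigmoid
        dy = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
        # backprop into the hidden layer, plus the L1 subgradient on h
        dh = []
        for j, hj in enumerate(h):
            g = sum(dy[i] * self.W2[i][j] for i in range(len(dy)))
            g += self.sparsity * (1 if hj > 0 else -1)   # d|h|/dh
            dh.append(g * hj * (1 - hj))
        # gradient-descent updates
        for i in range(len(self.W2)):
            for j in range(len(h)):
                self.W2[i][j] -= self.lr * dy[i] * h[j]
        for j in range(len(self.W1)):
            for k in range(len(x)):
                self.W1[j][k] -= self.lr * dh[j] * x[k]
        # return the reconstruction error for monitoring
        return sum((yi - xi) ** 2 for yi, xi in zip(y, x))
```

The "blurring" mentioned above is visible here: with fewer hidden units than inputs, the decoder can only reproduce what survives the bottleneck, and the sparsity penalty pushes most hidden activations toward zero.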
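For the second proposal, a rough sketch of binary boosting with an L1 penalty and back pruning: each round fits a decision stump to the negative gradient of the logistic loss, then soft-thresholds all stump weights and discards any weak learner whose weight reaches zero. The 1-D data layout and every parameter here are illustrative assumptions, not the exact algorithm I would contribute.

```python
# L1-regularized stump boosting sketch with back pruning.
# All names and constants are illustrative assumptions.
import math

def stump(threshold, sign):
    return lambda x: sign if x > threshold else -sign

def fit_stump(xs, residuals):
    # pick the threshold/sign pair most correlated with the residuals
    best, best_score = None, float("-inf")
    for t in xs:
        for sign in (1, -1):
            h = stump(t, sign)
            score = sum(h(x) * r for x, r in zip(xs, residuals))
            if score > best_score:
                best, best_score = (t, sign), score
    return stump(*best)

class L1Boost:
    def __init__(self, lr=0.5, l1=0.01):
        self.lr, self.l1 = lr, l1
        self.hs, self.ws = [], []          # weak learners and their weights

    def predict(self, x):
        return sum(w * h(x) for w, h in zip(self.ws, self.hs))

    def boost_round(self, xs, ys):
        # negative gradient of the logistic loss at the current model
        residuals = [y / (1 + math.exp(y * self.predict(x)))
                     for x, y in zip(xs, ys)]
        self.hs.append(fit_stump(xs, residuals))
        self.ws.append(self.lr)
        # L1 soft-threshold every weight; back-prune zeroed learners
        pruned = [(max(abs(w) - self.l1, 0.0) * (1 if w > 0 else -1), h)
                  for w, h in zip(self.ws, self.hs)]
        self.ws = [w for w, _ in pruned if w != 0.0]
        self.hs = [h for w, h in pruned if w != 0.0]
```

The back-pruning step is what keeps the ensemble compact: a weak learner added early whose weight decays to zero under the L1 shrinkage is dropped outright rather than carried along.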
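And for the third proposal, an illustrative sketch of an online kernelized classifier trained by stochastic gradient steps on the regularized hinge loss - in the spirit of "optimization in the primal", stepping directly on the risk functional rather than solving the dual QP, while keeping a kernel expansion over past examples. The RBF kernel and all parameter values are assumptions for the example; a ranking variant would take the same steps on pairwise differences.

```python
# Online kernel classifier sketch: SGD on regularized hinge loss.
# Kernel choice and constants are illustrative assumptions.
import math

def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

class OnlineKernelClassifier:
    def __init__(self, kernel=rbf, lr=0.2, reg=0.01):
        self.kernel, self.lr, self.reg = kernel, lr, reg
        self.sv = []       # stored examples ("support vectors")
        self.alpha = []    # their coefficients

    def predict(self, x):
        return sum(a * self.kernel(s, x) for a, s in zip(self.alpha, self.sv))

    def fit_one(self, x, y):
        # shrink existing coefficients (gradient of the L2 regularizer),
        # then add the new point only if it violates the margin (hinge loss)
        margin = y * self.predict(x)
        self.alpha = [a * (1 - self.lr * self.reg) for a in self.alpha]
        if margin < 1:
            self.sv.append(x)
            self.alpha.append(self.lr * y)
```

Because each margin violation appends a stored example, the model grows with the stream; a real implementation would cap or merge the support set.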
