On Tue, May 17, 2011 at 5:26 PM, Hector Yee <[email protected]> wrote:
> I have some proposed contributions and I wonder if they will be useful in
> Mahout (otherwise I will just commit them in a new open source project on
> GitHub).

These generally sound pretty good.

> - Sparse autoencoder (think of it as something like LDA - it has an
> unsupervised hidden topic model and an output that reconstructs the input
> but blurs it a bit due to the hidden-layer bottleneck). The variant I am
> planning to implement is optimized for sparse (e.g. text) labels. Not sure
> if it will fit into the filter framework?

This would definitely fit into the variable encoder framework where the
hashed encoders live. Filters would be another reasonable home, as would
clustering, since clustering has come to mean "unsupervised stuff" for the
most part. (A rough sketch of the general idea is at the end of this
message.)

> - Boosting with L1 regularization and back pruning (just the binary case -
> I haven't had much luck with the multi-class case vs. AdaBoost.ECC).

How scalable is this?

> - Online kernelized learner for ranking and classification (optimization
> in the primal rather than the dual)

This would be very interesting. It would fit in very well next to the SGD
models as an interesting alternative/elaboration. (A sketch of one common
primal scheme also follows below.)

> I'm new to Mahout, so let me know if anyone is working on these already
> or not. I've implemented them several times in C++.

These all sound plenty new enough. Are you good with doing them in Java?
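For anyone who hasn't seen the idea before, here is a minimal sketch of a
tied-weight autoencoder with an L1 penalty on the hidden activations to
encourage sparse codes. The class name, hyperparameters, and update rule
are my own assumptions for illustration, not Hector's actual design:

import java.util.Random;

/**
 * Minimal tied-weight autoencoder sketch with an L1 activation penalty
 * to encourage sparse hidden codes. Illustrative only.
 */
public class SparseAutoencoderSketch {
  private final int visible, hidden;
  private final double[][] w;          // hidden x visible; decoder reuses w transposed
  private final double[] bHid, bVis;
  private final double rate = 0.01;    // learning rate (assumed)
  private final double sparsity = 0.001; // L1 penalty weight (assumed)

  public SparseAutoencoderSketch(int visible, int hidden, long seed) {
    this.visible = visible;
    this.hidden = hidden;
    Random r = new Random(seed);
    w = new double[hidden][visible];
    for (double[] row : w)
      for (int j = 0; j < visible; j++)
        row[j] = 0.01 * r.nextGaussian();
    bHid = new double[hidden];
    bVis = new double[visible];
  }

  private static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

  /** Encode the input into hidden activations (the "topics"). */
  public double[] encode(double[] x) {
    double[] h = new double[hidden];
    for (int i = 0; i < hidden; i++) {
      double z = bHid[i];
      for (int j = 0; j < visible; j++) z += w[i][j] * x[j];
      h[i] = sigmoid(z);
    }
    return h;
  }

  /** Decode hidden activations back to a blurred reconstruction. */
  public double[] decode(double[] h) {
    double[] y = new double[visible];
    for (int j = 0; j < visible; j++) {
      double z = bVis[j];
      for (int i = 0; i < hidden; i++) z += w[i][j] * h[i];
      y[j] = sigmoid(z);
    }
    return y;
  }

  /** One SGD step on squared reconstruction error plus the L1 term. */
  public void train(double[] x) {
    double[] h = encode(x);
    double[] y = decode(h);
    // Output deltas: squared error through the output sigmoid.
    double[] dY = new double[visible];
    for (int j = 0; j < visible; j++)
      dY[j] = (y[j] - x[j]) * y[j] * (1 - y[j]);
    // Hidden deltas: backpropagated error plus the sparsity gradient
    // (d|h|/dh = 1 because sigmoid activations are positive).
    double[] dH = new double[hidden];
    for (int i = 0; i < hidden; i++) {
      double back = sparsity;
      for (int j = 0; j < visible; j++) back += w[i][j] * dY[j];
      dH[i] = back * h[i] * (1 - h[i]);
    }
    // Tied weights collect both the encoder and decoder gradients.
    for (int i = 0; i < hidden; i++)
      for (int j = 0; j < visible; j++)
        w[i][j] -= rate * (dY[j] * h[i] + dH[i] * x[j]);
    for (int j = 0; j < visible; j++) bVis[j] -= rate * dY[j];
    for (int i = 0; i < hidden; i++) bHid[i] -= rate * dH[i];
  }
}

With sparse text input, the inner loops would only visit the nonzero
features, which is presumably where the sparse-optimized variant gets
its speed.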

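And for the online kernelized learner, one common primal approach is SGD
on the regularized hinge loss in the style of NORMA (Kivinen, Smola, and
Williamson): each margin violation adds a support vector, and the
regularizer shrinks every coefficient each round. Again, this is a sketch
of the general family with assumed parameter values, not the proposed
algorithm:

import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of an online kernel machine trained by SGD on the primal
 * regularized hinge-loss objective (NORMA-style). Illustrative only.
 */
public class OnlineKernelSketch {
  private final List<double[]> supportVectors = new ArrayList<>();
  private final List<Double> coefficients = new ArrayList<>();
  private final double eta = 0.1;     // learning rate (assumed)
  private final double lambda = 0.01; // regularization strength (assumed)
  private final double gamma = 1.0;   // RBF kernel width (assumed)

  private double kernel(double[] a, double[] b) {
    double d2 = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      d2 += d * d;
    }
    return Math.exp(-gamma * d2);
  }

  /** f(x) = sum_i alpha_i K(x_i, x), the current function value. */
  public double predict(double[] x) {
    double f = 0;
    for (int i = 0; i < supportVectors.size(); i++)
      f += coefficients.get(i) * kernel(supportVectors.get(i), x);
    return f;
  }

  /** One online step: shrink for regularization, expand on a margin error. */
  public void train(double[] x, int y) {  // y in {-1, +1}
    double f = predict(x);
    // The regularizer shrinks every coefficient toward zero.
    for (int i = 0; i < coefficients.size(); i++)
      coefficients.set(i, (1 - eta * lambda) * coefficients.get(i));
    // Hinge loss: only margin violations add a new support vector.
    if (y * f < 1) {
      supportVectors.add(x.clone());
      coefficients.add(eta * y);
    }
  }
}

The ever-growing support set is the usual scalability worry for this
family, which is where budgeting or pruning schemes would come in.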