On Tue, May 17, 2011 at 5:26 PM, Hector Yee <[email protected]> wrote:

>  I have some proposed contributions and I wonder if they will be useful in
> Mahout (otherwise I will just commit them to a new open source project on
> GitHub).
>

These generally sound pretty good.


> - Sparse autoencoder (think of it as something like LDA - it has an
> unsupervised hidden topic model and an output that reconstructs the input
> but blurs it a bit due to the hidden layer bottleneck). The variant I am
> planning to implement is optimized for sparse (e.g. text) labels. Not sure
> if it will fit into the filter framework?
>

This would definitely fit into the variable encoder framework where the
hashed encoders live.

Filters would be another reasonable home.

Clustering is a reasonable home since it has come to mean "unsupervised
stuff" for the most part.


> - Boosting with L1 regularization and back pruning (just the binary case -
> I haven't had much luck with the multi-class case vs. AdaBoost.ECC).
>

How scalable is this?
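
If "back pruning" means an L1 soft-threshold over the whole ensemble after
each round (that reading is an assumption on my part), the update could
look like this sketch, with hypothetical names throughout:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of boosting with an L1 soft-threshold over the ensemble
    // weights after each round; learners whose weight reaches zero are
    // pruned. The interpretation of "back pruning" is assumed here.
    public class L1BoostSketch {
      public interface WeakLearner {
        double predict(double[] x);  // score, e.g. in [-1, +1]
      }

      private final List<WeakLearner> learners = new ArrayList<WeakLearner>();
      private final List<Double> weights = new ArrayList<Double>();

      public void addRound(WeakLearner h, double alpha, double lambda) {
        learners.add(h);
        weights.add(alpha);
        // Soft-threshold every weight by lambda (the L1 step), then
        // drop learners whose weight has been driven to zero.
        for (int i = weights.size() - 1; i >= 0; i--) {
          double w = weights.get(i);
          double shrunk = Math.signum(w) * Math.max(0.0, Math.abs(w) - lambda);
          if (shrunk == 0.0) {
            weights.remove(i);
            learners.remove(i);
          } else {
            weights.set(i, shrunk);
          }
        }
      }

      public double predict(double[] x) {
        double score = 0.0;
        for (int i = 0; i < learners.size(); i++) {
          score += weights.get(i) * learners.get(i).predict(x);
        }
        return score;  // sign gives the binary label
      }
    }

Scalability would then hinge on how many learners survive the threshold,
since prediction cost is linear in the size of the pruned ensemble.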


> - Online kernelized learner for ranking and classification (optimization
> in the primal rather than the dual)
>

This would be very interesting.  It would fit in well next to the SGD
models as an alternative or elaboration.
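
One common way to do the primal update online is NORMA-style (Kivinen,
Smola, and Williamson's "Online Learning with Kernels"): shrink all
existing coefficients by the regularization step, then add the current
point as a support vector on a margin violation. A minimal sketch for the
classification case, with illustrative names rather than Mahout's actual
SGD classes:

    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch of an online kernel classifier trained in the
    // primal. Regularization shrinks old coefficients each step, and a
    // margin violation adds a new support vector.
    public class OnlineKernelSketch {
      private final List<double[]> supportVectors = new ArrayList<double[]>();
      private final List<Double> coefficients = new ArrayList<Double>();
      private final double learningRate;
      private final double lambda;  // regularization strength
      private final double gamma;   // RBF kernel width

      public OnlineKernelSketch(double learningRate, double lambda,
                                double gamma) {
        this.learningRate = learningRate;
        this.lambda = lambda;
        this.gamma = gamma;
      }

      public double predict(double[] x) {
        double f = 0.0;
        for (int i = 0; i < supportVectors.size(); i++) {
          f += coefficients.get(i) * rbf(supportVectors.get(i), x);
        }
        return f;
      }

      // One SGD step on the regularized hinge loss, directly in the
      // primal.
      public void train(double[] x, double label) {  // label in {-1, +1}
        double f = predict(x);
        // The gradient of the L2 regularizer shrinks the whole function.
        double shrink = 1.0 - learningRate * lambda;
        for (int i = 0; i < coefficients.size(); i++) {
          coefficients.set(i, coefficients.get(i) * shrink);
        }
        if (label * f < 1.0) {  // hinge-loss margin violation
          supportVectors.add(x.clone());
          coefficients.add(learningRate * label);
        }
      }

      private double rbf(double[] a, double[] b) {
        double d2 = 0.0;
        for (int i = 0; i < a.length; i++) {
          double d = a[i] - b[i];
          d2 += d * d;
        }
        return Math.exp(-gamma * d2);
      }
    }

The support set grows with the number of margin violations, so a real
implementation would also want a budget (e.g. dropping the smallest
coefficients) to keep prediction cost bounded.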


>
> I'm new to Mahout, so let me know if anyone is working on these already or
> not. I've implemented them several times in C++.
>

These all sound plenty new enough.

You good with doing them in Java?
