Hi Michael,

Your approach sounds useful and (in my opinion) fills an important gap in
existing OSS machine learning libraries. I, for one, would be interested in
an efficient, parallel implementation of regularized regression. I'm not a
contributor to Mahout, but the usual questions when someone wants to
contribute an already-implemented algorithm seem to be:

1. Will you be willing and able (or know of someone who is willing and
able) to maintain the code once it is integrated with Mahout? (Mahout
developers currently seem to be stretched a bit thin.)
2. What is the state of the code? Is it already integrated with Mahout?
What libraries does it depend on? Does it conform to Mahout's interfaces
(or can it be made to fit them)? Approximately how much work would that be?
3. How has your implementation been tested? Do you know of a dataset that
could be used for unit tests? Is there a particular use case that is
driving your development of this algorithm?
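
Also, purely out of curiosity about the flavor of the approach: the
single-pass schemes I've seen for this kind of problem accumulate
per-fold sufficient statistics (X'X, X'y, y'y) in the map/combine
phase and then sweep the penalty grid locally on those statistics. I
have no idea whether that resembles what you have in mind, and none of
the following is Mahout API (the class and method names are made up),
but as a rough sketch for the ridge case, exact K-fold cross-validation
from one pass over the data could look something like this:

import java.util.Arrays;

// Illustrative sketch only -- not Mahout API, and not necessarily the
// poster's algorithm. Exact K-fold cross-validated ridge regression from a
// single pass over the data, via per-fold sufficient statistics. In a
// MapReduce setting add() would run in mappers/combiners and the per-fold
// matrices would be summed in a reducer.
public class SinglePassRidgeCV {

  private final int k;             // number of CV folds
  private final int d;             // number of features
  private final double[][][] xtx;  // per-fold X'X
  private final double[][] xty;    // per-fold X'y
  private final double[] yty;      // per-fold y'y

  public SinglePassRidgeCV(int folds, int numFeatures) {
    k = folds;
    d = numFeatures;
    xtx = new double[folds][numFeatures][numFeatures];
    xty = new double[folds][numFeatures];
    yty = new double[folds];
  }

  // The single pass: each observation updates only its own fold's statistics.
  public void add(double[] x, double y, int fold) {
    for (int i = 0; i < d; i++) {
      for (int j = 0; j < d; j++) {
        xtx[fold][i][j] += x[i] * x[j];
      }
      xty[fold][i] += x[i] * y;
    }
    yty[fold] += y * y;
  }

  // After the pass, any penalty can be scored without re-reading the data:
  // train on k-1 folds via the normal equations, then compute the held-out
  // sum of squared errors from the remaining fold's statistics alone.
  public double cvError(double lambda) {
    double sse = 0.0;
    for (int held = 0; held < k; held++) {
      double[][] a = new double[d][d];
      double[] b = new double[d];
      for (int f = 0; f < k; f++) {
        if (f == held) continue;
        for (int i = 0; i < d; i++) {
          b[i] += xty[f][i];
          for (int j = 0; j < d; j++) a[i][j] += xtx[f][i][j];
        }
      }
      for (int i = 0; i < d; i++) a[i][i] += lambda;   // ridge penalty
      double[] beta = solve(a, b);
      // held-out SSE = y'y - 2 beta'X'y + beta'X'X beta
      double err = yty[held];
      for (int i = 0; i < d; i++) {
        err -= 2.0 * beta[i] * xty[held][i];
        for (int j = 0; j < d; j++) err += beta[i] * xtx[held][i][j] * beta[j];
      }
      sse += err;
    }
    return sse;
  }

  // Gaussian elimination with partial pivoting; fine for a modest number of
  // features (the matrices here are d x d, independent of the data size).
  private static double[] solve(double[][] aIn, double[] bIn) {
    int n = bIn.length;
    double[][] a = new double[n][];
    for (int i = 0; i < n; i++) a[i] = Arrays.copyOf(aIn[i], n);
    double[] b = Arrays.copyOf(bIn, n);
    for (int col = 0; col < n; col++) {
      int p = col;
      for (int r = col + 1; r < n; r++) {
        if (Math.abs(a[r][col]) > Math.abs(a[p][col])) p = r;
      }
      double[] tr = a[col]; a[col] = a[p]; a[p] = tr;
      double tv = b[col]; b[col] = b[p]; b[p] = tv;
      for (int r = col + 1; r < n; r++) {
        double m = a[r][col] / a[col][col];
        for (int c = col; c < n; c++) a[r][c] -= m * a[col][c];
        b[r] -= m * b[col];
      }
    }
    double[] x = new double[n];
    for (int i = n - 1; i >= 0; i--) {
      double s = b[i];
      for (int j = i + 1; j < n; j++) s -= a[i][j] * x[j];
      x[i] = s / a[i][i];
    }
    return x;
  }
}

For Lasso/Elastic-net the closed-form solve would be replaced by, e.g.,
coordinate descent run against the same accumulated statistics, so the
single-pass structure stays the same. Again, this is only my guess at
the general shape of such an algorithm; I look forward to the write-up.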


-King Tim


On Jun 30, 2013 1:15 AM, "Michael Kun Yang" <[email protected]> wrote:

> Hello,
>
> I recently implemented a single-pass algorithm for penalized linear
> regression with cross-validation at a big data start-up. I'd like to
> contribute this to Mahout.
>
> Penalized linear regression methods such as the Lasso and Elastic-net are
> widely used in machine learning, but there are no very efficient, scalable
> implementations of them on MapReduce.
>
> The published distributed algorithms for solving this problem are either
> iterative (which is not a good fit for MapReduce; see Stephen Boyd's paper)
> or approximate (a problem if we need exact solutions; see parallelized
> stochastic gradient descent). Another disadvantage of these algorithms is
> that they cannot do cross-validation in the training phase: the user must
> provide a penalty parameter in advance.
>
> My ideas can train the model with cross-validation in a single pass. They
> are based on some simple observations. I will post them on arXiv and then
> share the link in a follow-up email.
>
> Any feedback would be helpful.
>
> Thanks
> -Michael
>
