Hi Timothy,

Thank you for getting back to me!
1. I am willing to maintain the code; I see that as part of my responsibility as a contributor.
2. I have implemented a preliminary version at the start-up I work for. The algorithm mostly consists of matrix computations; I have not coded it against Mahout's interfaces yet.
3. To accelerate the computation, I use an in-mapper combiner. I have tested it with MRUnit and confirmed the results against R packages.
4. I have been working on penalized linear regression for some time, and I think it will be easy to find applications.

I hope this answers your questions.

Best,
-Michael

On Sat, Jun 29, 2013 at 10:27 PM, Timothy Mann <[email protected]> wrote:

> Hi Michael,
>
> Your approach sounds useful and (in my opinion) fills an important gap in
> existing OSS machine learning libraries. I, for one, would be interested in
> an efficient, parallel implementation of regularized regression. I'm not a
> contributor to Mahout, but the usual questions when someone wants to
> contribute an implemented algorithm seem to be:
>
> 1. Will you be willing and able (or know of someone who is willing and
> able) to maintain the code once it is integrated with Mahout? (Mahout
> developers currently seem to be stretched a bit thin.)
> 2. What is the state of the code? Is it already integrated with Mahout?
> What libraries does it depend on? Does it conform (or can it be fit) nicely
> to Mahout interfaces? How much work will it be (approximately)?
> 3. How has your implementation been tested? Do you know of a dataset that
> can be used for unit testing the framework? Is there a particular use case
> that is driving your implementation and development of this algorithm?
>
> -King Tim
>
> On Jun 30, 2013 1:15 AM, "Michael Kun Yang" <[email protected]> wrote:
>
>> Hello,
>>
>> I recently implemented a single-pass algorithm for penalized linear
>> regression with cross validation at a big data start-up. I'd like to
>> contribute this to Mahout.
>>
>> Penalized linear regression methods such as the Lasso and Elastic Net are
>> widely used in machine learning, but there are no very efficient, scalable
>> implementations on MapReduce.
>>
>> The published distributed algorithms for solving this problem are either
>> iterative (which is not a good fit for MapReduce; see Stephen Boyd's paper)
>> or approximate (what if we need exact solutions? see parallelized
>> stochastic gradient descent). Another disadvantage of these algorithms is
>> that they cannot do cross validation in the training phase; the user must
>> provide a penalty parameter in advance.
>>
>> My approach can train the model, with cross validation, in a single pass.
>> It is based on some simple observations. I will post the write-up on arXiv
>> and then share the link in a follow-up email.
>>
>> Any feedback would be helpful.
>>
>> Thanks,
>> -Michael
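[Editor's note: for readers following the thread, here is a minimal sketch of the kind of single-pass cross-validation Michael describes, illustrated for the ridge (L2) penalty, which has a closed-form solution; Michael's actual algorithm targets the Lasso and Elastic Net and is not shown here. The idea: one pass over the data accumulates per-fold sufficient statistics X'X and X'y (the role an in-mapper combiner would play on MapReduce), after which solutions for any number of penalty values, on any leave-one-fold-out training set, can be computed without touching the data again. All function names below are hypothetical.]

```python
import numpy as np

def accumulate_fold_stats(X, y, fold_ids, k):
    """Single pass over the data: per-fold Gram matrices X'X and
    moment vectors X'y. This is what an in-mapper combiner would
    emit, keyed by fold, in a MapReduce setting."""
    d = X.shape[1]
    G = np.zeros((k, d, d))
    b = np.zeros((k, d))
    for f in range(k):
        mask = fold_ids == f
        G[f] = X[mask].T @ X[mask]
        b[f] = X[mask].T @ y[mask]
    return G, b

def ridge_cv(G, b, X, y, fold_ids, lambdas):
    """Pick the best ridge penalty by k-fold CV, reusing the
    accumulated statistics: the training Gram for fold f is just
    the total minus fold f's contribution, so no extra data passes
    are needed per lambda."""
    k, d, _ = G.shape
    G_tot, b_tot = G.sum(axis=0), b.sum(axis=0)
    errors = []
    for lam in lambdas:
        err = 0.0
        for f in range(k):
            A = (G_tot - G[f]) + lam * np.eye(d)      # Gram without fold f
            beta = np.linalg.solve(A, b_tot - b[f])   # ridge fit without fold f
            mask = fold_ids == f
            err += np.sum((X[mask] @ beta - y[mask]) ** 2)
        errors.append(err)
    return lambdas[int(np.argmin(errors))]
```

Because the per-fold statistics are small (d-by-d, independent of the number of rows), the entire CV sweep over penalty values runs on a single machine after one MapReduce pass; this is what makes supplying the penalty parameter in advance unnecessary.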
