Thank you for getting back to me. I will post the idea on arXiv.
On Sat, Jun 29, 2013 at 10:58 PM, Ted Dunning <[email protected]> wrote:

> If practical, this could be very handy.
>
> For reference, penalized linear regression can be used to solve
> compressive sensing problems. It can also be used to accurately reverse
> engineer hashed vector representations.
>
>
> On Sat, Jun 29, 2013 at 10:27 PM, Timothy Mann <[email protected]> wrote:
>
>> Hi Michael,
>>
>> Your approach sounds useful and (in my opinion) fills an important gap
>> in existing OSS machine learning libraries. I, for one, would be
>> interested in an efficient, parallel implementation of regularized
>> regression. I'm not a contributor to Mahout, but the usual questions
>> when someone wants to contribute an implemented algorithm seem to be:
>>
>> 1. Will you be willing and able (or know of someone who is willing and
>> able) to maintain the code once it is integrated with Mahout? (Mahout
>> developers currently seem to be stretched a bit thin.)
>> 2. What is the state of the code? Is it already integrated with Mahout?
>> What libraries does it depend on? Does it conform (or can it be made to
>> fit) nicely to Mahout interfaces? How much work will it be
>> (approximately)?
>> 3. How has your implementation been tested? Do you know of a dataset
>> that can be used for unit testing the framework? Is there a particular
>> use case that is driving your implementation and development of this
>> algorithm?
>>
>> -King Tim
>>
>>
>> On Jun 30, 2013 1:15 AM, "Michael Kun Yang" <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I recently implemented a single-pass algorithm for penalized linear
>>> regression with cross validation at a big-data start-up. I'd like to
>>> contribute it to Mahout.
>>>
>>> Penalized linear regression methods such as the Lasso and Elastic Net
>>> are widely used in machine learning, but there are no efficient,
>>> scalable implementations of them on MapReduce.
>>>
>>> The published distributed algorithms for this problem are either
>>> iterative (a poor fit for MapReduce; see Stephen Boyd's paper) or
>>> approximate (a problem if exact solutions are needed; see parallelized
>>> stochastic gradient descent). Another disadvantage of these algorithms
>>> is that they cannot do cross validation during the training phase: the
>>> user must provide a penalty parameter in advance.
>>>
>>> My approach can train the model with cross validation in a single
>>> pass. It is based on some simple observations. I will post a write-up
>>> on arXiv and then share the link in a follow-up email.
>>>
>>> Any feedback would be helpful.
>>>
>>> Thanks,
>>> -Michael
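
[Editor's note: the thread does not reveal the "simple observations," and the actual algorithm is in the promised arXiv write-up. One standard route to a single-pass design is worth sketching, though: the squared-error loss depends on the data only through the sufficient statistics X'X, X'y, y'y, and n, all of which are plain sums over records and therefore combine across map tasks and across cross-validation folds. Below is a minimal, hypothetical Java sketch along those lines. The class OnePassLassoCV and every name in it are illustrative, not Mahout APIs and not Michael's code: one pass accumulates per-fold statistics; afterwards a lasso fit via coordinate descent on the Gram matrix can be cross-validated over any penalty grid without re-reading the data.]

/**
 * Hypothetical sketch of single-pass penalized regression with built-in
 * k-fold cross validation via per-fold sufficient statistics. Illustrative
 * only; not Michael's algorithm and not a Mahout API.
 */
public class OnePassLassoCV {
  final int p, k;              // number of features, number of folds
  final double[][][] gram;     // per-fold X'X  (k x p x p)
  final double[][] xty;        // per-fold X'y  (k x p)
  final double[] yty;          // per-fold y'y
  final long[] counts;         // per-fold row counts

  OnePassLassoCV(int p, int k) {
    this.p = p; this.k = k;
    gram = new double[k][p][p];
    xty = new double[k][p];
    yty = new double[k];
    counts = new long[k];
  }

  /** The single pass: fold assignment by row id; all statistics are
      additive, so map tasks can accumulate locally and a reducer sums. */
  void accumulate(long rowId, double[] x, double y) {
    int f = (int) Math.floorMod(rowId, (long) k);
    counts[f]++;
    yty[f] += y * y;
    for (int i = 0; i < p; i++) {
      xty[f][i] += x[i] * y;
      for (int j = 0; j < p; j++) gram[f][i][j] += x[i] * x[j];
    }
  }

  /** Lasso via coordinate descent on the Gram matrix ("covariance
      updates"): minimizes (1/2n)||y - Xb||^2 + lambda*||b||_1.
      Adding n*lambda2 to the diagonal denominator would give Elastic Net. */
  static double[] fitLasso(double[][] g, double[] c, long n,
                           double lambda, int sweeps) {
    int p = c.length;
    double[] b = new double[p];
    for (int s = 0; s < sweeps; s++) {      // fixed sweep count for brevity
      for (int j = 0; j < p; j++) {
        if (g[j][j] == 0) { b[j] = 0; continue; }   // all-zero feature
        double rho = c[j];
        for (int m = 0; m < p; m++) if (m != j) rho -= g[j][m] * b[m];
        double z = Math.abs(rho) - n * lambda;      // soft threshold
        b[j] = z > 0 ? Math.signum(rho) * z / g[j][j] : 0.0;
      }
    }
    return b;
  }

  /** After the pass: held-out error for one lambda, touching only the
      k cached statistics -- no second pass over the data. */
  double cvError(double lambda) {
    double totalSse = 0;
    long totalN = 0;
    for (int f = 0; f < k; f++) {
      double[][] gTrain = new double[p][p];
      double[] cTrain = new double[p];
      long nTrain = 0;
      for (int t = 0; t < k; t++) {
        if (t == f) continue;               // pool all folds except f
        nTrain += counts[t];
        for (int i = 0; i < p; i++) {
          cTrain[i] += xty[t][i];
          for (int j = 0; j < p; j++) gTrain[i][j] += gram[t][i][j];
        }
      }
      double[] b = fitLasso(gTrain, cTrain, nTrain, lambda, 100);
      // held-out SSE = y'y - 2 b'X'y + b'X'Xb, all from fold-f statistics
      double sse = yty[f];
      for (int i = 0; i < p; i++) {
        sse -= 2 * b[i] * xty[f][i];
        for (int j = 0; j < p; j++) sse += b[i] * gram[f][i][j] * b[j];
      }
      totalSse += sse;
      totalN += counts[f];
    }
    return totalSse / totalN;               // mean squared held-out error
  }
}

[The point of the sketch is the cost structure: the data pass is O(n p^2) and embarrassingly parallel, while the entire cross-validated search over a penalty grid afterwards works on k cached p-by-p matrices and is independent of n. Selecting lambda is then just a loop calling cvError over a grid and keeping the minimizer. Whether Michael's observations are these or something sharper is exactly what the arXiv posting should settle.]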
