Hi Timothy,

Thank you for getting back to me!

1. I am willing to maintain the code; that is part of my responsibility as a
contributor.
2. I have implemented a preliminary version at the start-up I work for. The
algorithm relies heavily on matrix computation. I have not yet ported the
algorithm to Mahout's interfaces.
3. To accelerate the computation, I use an in-mapper combiner. I tested it
with MRUnit and confirmed the results against R packages.
4. I have been working on penalized linear regression for some time, and I
think it is easy to find applications for it.
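Regarding points 2 and 3, the in-mapper combiner idea can be sketched as
follows. This is a hypothetical illustration in plain Java, not the actual
code: the class and method names are mine, and the Hadoop Context plumbing is
omitted. Instead of emitting one key/value pair per input row, the mapper
folds each (x, y) record into local X'X and X'y accumulators and emits them
once per split:

```java
import java.util.Arrays;

// Hypothetical sketch of the in-mapper combiner pattern: accumulate the
// sufficient statistics X'X and X'y locally, emit once per split.
public class InMapperCombinerSketch {
    final int d;          // number of features
    final double[][] xtx; // running X'X (d x d)
    final double[] xty;   // running X'y (length d)

    InMapperCombinerSketch(int d) {
        this.d = d;
        this.xtx = new double[d][d];
        this.xty = new double[d];
    }

    // map(): fold one record (x, y) into the local accumulators
    // instead of emitting an outer product per record.
    void map(double[] x, double y) {
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                xtx[i][j] += x[i] * x[j];
            }
            xty[i] += x[i] * y;
        }
    }

    // cleanup(): emit the single combined pair for this split.
    // (In Hadoop this would write to the Context; here we just return.)
    double[][] emitXtX() { return xtx; }
    double[] emitXtY()   { return xty; }

    public static void main(String[] args) {
        InMapperCombinerSketch m = new InMapperCombinerSketch(2);
        m.map(new double[]{1, 2}, 3);
        m.map(new double[]{4, 5}, 6);
        System.out.println(Arrays.deepToString(m.emitXtX())); // [[17.0, 22.0], [22.0, 29.0]]
        System.out.println(Arrays.toString(m.emitXtY()));     // [27.0, 36.0]
    }
}
```

Once X'X and X'y are available, a ridge-style solution can be computed for
any penalty value without rescanning the data, which is what makes
cross-validation in a single pass plausible.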

I hope this answers your questions.

Best,
-Michael


On Sat, Jun 29, 2013 at 10:27 PM, Timothy Mann <[email protected]> wrote:

> Hi Michael,
>
> Your approach sounds useful and (in my opinion) fills an important gap in
> existing OSS machine learning libraries. I, for one, would be interested in
> an efficient, parallel implementation of regularized regression. I'm not a
> contributor to Mahout, but the usual questions when someone wants to
> contribute an implemented algorithm seem to be:
>
> 1. Will you be willing and able (or know of someone who is willing and
> able) to maintain the code once it is integrated with Mahout? (mahout
> developers currently seem to be stretched a bit thin)
> 2. What is the state of the code? Is it already integrated with Mahout?
> What libraries does it depend on? Does it conform (or can it be fit) nicely
> to Mahout interfaces? How much work will it be (approximately)?
> 3. How has your implementation been tested? Do you know of a dataset that
> can be used for unit testing the framework? Is there a particular use case
> that is driving your implementation and development of this algorithm?
>
>
> -King Tim
>
>
> On Jun 30, 2013 1:15 AM, "Michael Kun Yang" <[email protected]> wrote:
>
>> Hello,
>>
>> I recently implemented a single pass algorithm for penalized linear
>> regression with cross validation in a big data start-up. I'd like to
>> contribute this to Mahout.
>>
>> Penalized linear regression methods such as the Lasso and Elastic Net are
>> widely used in machine learning, but there are no truly efficient,
>> scalable implementations on MapReduce.
>>
>> The published distributed algorithms for solving this problem are either
>> iterative (which is not good for MapReduce; see Stephen Boyd's paper) or
>> approximate (what if we need exact solutions? see parallelized stochastic
>> gradient descent). Another disadvantage of these algorithms is that they
>> cannot do cross-validation in the training phase; the user must provide a
>> penalty parameter in advance.
>>
>> My ideas allow training the model with cross-validation in a single pass.
>> They are based on some simple observations. I will post them on arXiv and
>> then share the link in a follow-up email.
>>
>> Any feedback would be helpful.
>>
>> Thanks
>> -Michael
>>
>