Kun Yang created MAHOUT-1273:
--------------------------------

             Summary: Single Pass Algorithm for Penalized Linear Regression on 
MapReduce
                 Key: MAHOUT-1273
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1273
             Project: Mahout
          Issue Type: New Feature
            Reporter: Kun Yang


Penalized linear regression methods such as Lasso and Elastic-net are widely 
used in machine learning, but there are no efficient, scalable implementations 
of them on MapReduce.

The published distributed algorithms for solving this problem are either 
iterative (a poor fit for MapReduce; see Stephen Boyd's paper) or approximate 
(no help when exact solutions are needed; see Parallelized Stochastic Gradient 
Descent). Another disadvantage of these algorithms is that they cannot do 
cross validation in the training phase, so the user must specify the penalty 
parameter in advance.

My approach trains the model with cross validation in a single pass. It is 
based on a few simple observations.
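
The issue text does not spell out the observations. One plausible reading, 
offered here only as an assumption, is that the squared-error loss depends on 
the data only through the sums X^T X, X^T y and y^T y. If so, these sums can be 
accumulated per cross-validation fold in one scan, and every penalty value can 
then be fitted and scored from the small per-fold sums without rereading the 
data. The Python sketch below illustrates that reading for the Lasso with 
plain coordinate descent. All names (fold_stats, merge, lasso_cd, sse, 
cv_lasso) are hypothetical, the round-robin fold assignment and the absence of 
an intercept are simplifications, and this is not the Mahout or Alpine code.

```python
def fold_stats(rows, k):
    """One scan over (x, y) rows: per-fold sums of x x^T, x*y, y*y and counts."""
    p = len(rows[0][0])
    stats = [{"xx": [[0.0] * p for _ in range(p)],
              "xy": [0.0] * p, "yy": 0.0, "n": 0} for _ in range(k)]
    for i, (x, y) in enumerate(rows):
        s = stats[i % k]  # assumption: round-robin fold assignment
        for a in range(p):
            s["xy"][a] += x[a] * y
            for b in range(p):
                s["xx"][a][b] += x[a] * x[b]
        s["yy"] += y * y
        s["n"] += 1
    return stats


def merge(stats_list):
    """Add the sufficient statistics of several folds."""
    p = len(stats_list[0]["xy"])
    out = {"xx": [[0.0] * p for _ in range(p)],
           "xy": [0.0] * p, "yy": 0.0, "n": 0}
    for s in stats_list:
        for a in range(p):
            out["xy"][a] += s["xy"][a]
            for b in range(p):
                out["xx"][a][b] += s["xx"][a][b]
        out["yy"] += s["yy"]
        out["n"] += s["n"]
    return out


def soft(z, t):
    """Soft-thresholding operator used by the Lasso update."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0


def lasso_cd(s, lam, sweeps=200):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 using only the sums in s."""
    p, n = len(s["xy"]), s["n"]
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if s["xx"][j][j] == 0.0:
                continue  # all-zero column: coefficient stays 0
            # rho = x_j^T (y - X b + x_j b_j) / n, written with the sums only
            rho = (s["xy"][j]
                   - sum(s["xx"][j][m] * b[m] for m in range(p) if m != j)) / n
            b[j] = soft(rho, lam) / (s["xx"][j][j] / n)
    return b


def sse(s, b):
    """Residual sum of squares from the sums: yy - 2 b.xy + b^T xx b."""
    p = len(b)
    quad = sum(b[a] * s["xx"][a][c] * b[c] for a in range(p) for c in range(p))
    return s["yy"] - 2.0 * sum(b[a] * s["xy"][a] for a in range(p)) + quad


def cv_lasso(rows, lambdas, k=3):
    """Pick the penalty by k-fold CV, touching the raw rows exactly once."""
    stats = fold_stats(rows, k)  # the only pass over the data
    errors = {}
    for lam in lambdas:
        err = 0.0
        for f in range(k):
            train = merge([stats[g] for g in range(k) if g != f])
            err += sse(stats[f], lasso_cd(train, lam))
        errors[lam] = err / len(rows)
    best = min(errors, key=errors.get)
    return best, lasso_cd(merge(stats), best)
```

In a MapReduce setting the role of fold_stats would be played by the mappers 
and a summing reducer, while the loop over penalty values would run on a 
single node, since the sums are only p-by-p in size. This only pays off when 
the number of features p is small enough for those sums to fit in memory.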

I have implemented a primitive version of this algorithm at Alpine Data Labs. 
Advanced features such as in-mapper combining are employed to reduce network 
traffic in the shuffle phase.
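
As a rough illustration of in-mapper combining (the "inner-mapper combiner" 
mentioned above), the Python sketch below simulates a mapper that keeps 
partial sums in memory and emits a single record per input split instead of 
one record per row. The class and function names are hypothetical, and the 
choice of what is summed (x x^T, x*y and a row count) is an assumption made 
for the example rather than something stated in the issue.

```python
class InMapperCombiningMapper:
    """Hypothetical mapper that buffers partial sums and emits once per split."""

    def __init__(self, p):
        self.xx = [[0.0] * p for _ in range(p)]
        self.xy = [0.0] * p
        self.n = 0

    def map(self, x, y):
        # Accumulate locally; nothing is emitted per input row.
        for a, xa in enumerate(x):
            self.xy[a] += xa * y
            for b, xb in enumerate(x):
                self.xx[a][b] += xa * xb
        self.n += 1

    def cleanup(self):
        # Emit one (key, value) pair for the whole input split.
        return [("stats", (self.xx, self.xy, self.n))]


def reduce_stats(values):
    """Sum the partial statistics emitted by all mappers."""
    values = list(values)
    p = len(values[0][1])
    xx = [[0.0] * p for _ in range(p)]
    xy = [0.0] * p
    n = 0
    for vxx, vxy, vn in values:
        for a in range(p):
            xy[a] += vxy[a]
            for b in range(p):
                xx[a][b] += vxx[a][b]
        n += vn
    return xx, xy, n


def run_job(splits, p):
    """Simulate the job; also return how many records crossed the shuffle."""
    shuffled = []
    for split in splits:
        mapper = InMapperCombiningMapper(p)
        for x, y in split:
            mapper.map(x, y)
        shuffled.extend(mapper.cleanup())
    return reduce_stats(v for _, v in shuffled), len(shuffled)
```

With this pattern the number of shuffled records equals the number of input 
splits, not the number of rows, at the cost of holding p-by-p state in each 
mapper.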

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
