[
https://issues.apache.org/jira/browse/MAHOUT-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kun Yang updated MAHOUT-1273:
-----------------------------
Description:
Penalized linear regression methods such as Lasso and Elastic-net are widely
used in machine learning, but there are no efficient, scalable implementations
on MapReduce.
The published distributed algorithms for solving this problem are either
iterative (which is a poor fit for MapReduce; see Stephen Boyd's paper) or
approximate (what if we need exact solutions? See Parallelized Stochastic
Gradient Descent). Another disadvantage of these algorithms is that they
cannot do cross validation in the training phase, so the user has to specify
the penalty parameter in advance.
My ideas can train the model with cross validation in a single pass. They are
based on some simple observations.
I have implemented a primitive version of this algorithm at Alpine Data Labs.
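For illustration only, here is a minimal Python sketch of one way a
single-pass scheme with cross validation could be organised. It is an
assumption and may differ from the method described in the attached
PenalizedLinear.pdf. The sketch assumes centered features and response (no
intercept), round-robin fold assignment, and the Lasso objective
0.5*||y - Xb||^2 + lam*||b||_1. One pass over the data accumulates X'X, X'y
and y'y per fold; everything after that works on these small matrices only.
All function names are hypothetical.

```python
def fold_stats(rows, k, p):
    """Single pass over the data: accumulate X'X, X'y, y'y per fold."""
    stats = [{"xtx": [[0.0] * p for _ in range(p)], "xty": [0.0] * p, "yty": 0.0}
             for _ in range(k)]
    for i, (x, y) in enumerate(rows):
        s = stats[i % k]  # assumed fold assignment: round-robin by row index
        for a in range(p):
            s["xty"][a] += x[a] * y
            for c in range(p):
                s["xtx"][a][c] += x[a] * x[c]
        s["yty"] += y * y
    return stats


def combine(parts, p):
    """Sum per-fold statistics; the sums are associative, so they can be merged in any order."""
    out = {"xtx": [[0.0] * p for _ in range(p)], "xty": [0.0] * p, "yty": 0.0}
    for s in parts:
        for a in range(p):
            out["xty"][a] += s["xty"][a]
            for c in range(p):
                out["xtx"][a][c] += s["xtx"][a][c]
        out["yty"] += s["yty"]
    return out


def lasso_from_stats(s, lam, p, sweeps=200):
    """Coordinate descent for the Lasso using only X'X and X'y."""
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if s["xtx"][j][j] == 0.0:
                continue  # feature is identically zero in this training set
            # correlation of feature j with the residual that excludes b[j]
            rho = s["xty"][j] - sum(s["xtx"][j][m] * b[m] for m in range(p) if m != j)
            # soft-thresholding update
            sign = 1.0 if rho > 0 else -1.0
            b[j] = sign * max(abs(rho) - lam, 0.0) / s["xtx"][j][j]
    return b


def sse(s, b, p):
    """Squared error on a fold from its statistics: y'y - 2 b'X'y + b'X'X b."""
    quad = sum(b[a] * s["xtx"][a][c] * b[c] for a in range(p) for c in range(p))
    return s["yty"] - 2.0 * sum(b[a] * s["xty"][a] for a in range(p)) + quad


def cross_validate(rows, lambdas, k, p):
    """Pick the penalty with the lowest k-fold error, then refit on all folds."""
    stats = fold_stats(rows, k, p)  # the only pass over the raw data
    cv_err = {}
    for lam in lambdas:
        err = 0.0
        for f in range(k):
            train = combine([stats[g] for g in range(k) if g != f], p)
            err += sse(stats[f], lasso_from_stats(train, lam, p), p)
        cv_err[lam] = err
    best = min(cv_err, key=cv_err.get)
    return best, lasso_from_stats(combine(stats, p), best, p), cv_err
```

In a MapReduce setting, fold_stats would correspond to the map side and
combine to the reduce side; the per-fold statistics are p-by-p, so the
shuffle volume does not depend on the number of rows. This only pays off
when p is small enough for X'X to fit in memory.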
was:
Penalized linear regression methods such as Lasso and Elastic-net are widely
used in machine learning, but there are no efficient, scalable implementations
on MapReduce.
The published distributed algorithms for solving this problem are either
iterative (which is a poor fit for MapReduce; see Stephen Boyd's paper) or
approximate (what if we need exact solutions? See Parallelized Stochastic
Gradient Descent). Another disadvantage of these algorithms is that they
cannot do cross validation in the training phase, so the user has to specify
the penalty parameter in advance.
My ideas can train the model with cross validation in a single pass. They are
based on some simple observations.
I have implemented a primitive version of this algorithm at Alpine Data Labs.
Advanced features such as in-mapper combining are employed to reduce the
network traffic in the shuffle phase.
> Single Pass Algorithm for Penalized Linear Regression with Cross Validation
> on MapReduce
> ----------------------------------------------------------------------------------------
>
> Key: MAHOUT-1273
> URL: https://issues.apache.org/jira/browse/MAHOUT-1273
> Project: Mahout
> Issue Type: New Feature
> Reporter: Kun Yang
> Attachments: PenalizedLinear.pdf
>
> Original Estimate: 720h
> Remaining Estimate: 720h