[ https://issues.apache.org/jira/browse/SPARK-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shuo Xiang updated SPARK-1543: ------------------------------ Description: This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving Lasso (elastic net, in fact) in mllib. ADMM is capable of solving a class of composite minimization problems in a distributed way. Specifically for Lasso (if only L1-regularization) or elastic-net (both L1- and L2- regularization), in each iteration, it requires solving independent systems of linear equations on each partition and a subsequent soft-threholding operation on the driver machine. Unlike SGD, it is a deterministic algorithm (except for the random partition). Details can be found in the [S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html). The linear algebra operations mainly rely on the Breeze library, particularly, it applies `breeze.linalg.cholesky` to perform cholesky decomposition on each partition to solve the linear system. I tried to follow the organization of existing Lasso implementation. However, as ADMM is also a good fit for similar optimization problems, e.g., (sparse) logistic regression, it may be worth reorganizing and putting ADMM into a separate section. was: This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving Lasso (elastic net, in fact) in mllib. ADMM is capable of solving a class of composite minimization problems in a distributed way. Specifically for Lasso (if only L1-regularization) or elastic-net (both L1- and L2- regularization), it requires solving independent systems of linear equations on each partition and a soft-threholding operation on the driver. Unlike SGD, it is a deterministic algorithm (except for the random partition). Details can be found in the [S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html). The linear algebra operations mainly rely on the Breeze library, particularly, it applies `breeze.linalg.cholesky` to perform cholesky decomposition on each partition to solve the linear system. I tried to follow the organization of existing Lasso implementation. However, as ADMM is also a good fit for similar optimization problems, e.g., (sparse) logistic regression, it may worth to re-organize and put ADMM into a separate section. PR: https://github.com/apache/spark/pull/458 > Add ADMM for solving Lasso (and elastic net) problem > ---------------------------------------------------- > > Key: SPARK-1543 > URL: https://issues.apache.org/jira/browse/SPARK-1543 > Project: Spark > Issue Type: New Feature > Reporter: Shuo Xiang > Priority: Minor > Labels: features > Original Estimate: 168h > Remaining Estimate: 168h > > This PR introduces the Alternating Direction Method of Multipliers (ADMM) for > solving Lasso (elastic net, in fact) in mllib. > ADMM is capable of solving a class of composite minimization problems in a > distributed way. Specifically for Lasso (if only L1-regularization) or > elastic-net (both L1- and L2- regularization), in each iteration, it requires > solving independent systems of linear equations on each partition and a > subsequent soft-threholding operation on the driver machine. Unlike SGD, it > is a deterministic algorithm (except for the random partition). Details can > be found in the [S. Boyd's > paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html). > The linear algebra operations mainly rely on the Breeze library, > particularly, it applies `breeze.linalg.cholesky` to perform cholesky > decomposition on each partition to solve the linear system. > I tried to follow the organization of existing Lasso implementation. However, > as ADMM is also a good fit for similar optimization problems, e.g., (sparse) > logistic regression, it may be worth reorganizing and putting ADMM into a > separate section. -- This message was sent by Atlassian JIRA (v6.2#6252)