Dev List,
A couple of colleagues and I have several versions of the glmnet algorithm 
coded and running on Spark RDDs. glmnet (http://www.jstatsoft.org/v33/i01/paper) 
is a very fast algorithm for generating coefficient paths that solve penalized 
regression with elastic net penalties. The algorithm gets its speed from an 
approach that generates solutions for a wide range of penalty parameter values. 
We're able to integrate it into the MLlib class structure in a couple of 
different ways. The algorithm may fit better into the new pipeline structure, 
since it naturally returns a multitude of models (corresponding to different 
values of the penalty parameter). That appears to fit better into the pipeline 
than into MLlib linear regression (for example).
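
To make "one fit, many models" concrete, here is a minimal single-machine 
sketch in plain Python of pathwise coordinate descent for the elastic net, 
following the objective in the paper linked above. It is not our Spark code. 
The names elastic_net_path and soft_threshold, the default arguments, and the 
assumptions (centered data, no intercept, no all-zero columns, alpha > 0, at 
least two penalty values) are illustrative only.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_path(X, y, alpha=0.5, n_lambdas=20, eps=1e-3,
                     tol=1e-8, max_sweeps=1000):
    """Return [(lambda, coefficients), ...] along a decreasing lambda grid.

    Minimizes (1/2n) * ||y - X b||^2
              + lambda * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    Assumes (for this sketch) centered X and y, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]

    # Smallest lambda at which every coefficient is exactly zero.
    lam_max = max(abs(sum(c[i] * y[i] for i in range(n))) / n
                  for c in cols) / alpha
    # Geometric grid from lam_max down to eps * lam_max.
    lambdas = [lam_max * eps ** (k / (n_lambdas - 1))
               for k in range(n_lambdas)]

    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date incrementally
    path = []
    for lam in lambdas:
        # Warm start: beta and resid carry over from the previous lambda.
        for _ in range(max_sweeps):
            max_change = 0.0
            for j in range(p):
                c, old = cols[j], beta[j]
                # Correlation of feature j with the partial residual.
                rho = sum(c[i] * resid[i] for i in range(n)) / n \
                    + col_sq[j] * old
                new = soft_threshold(rho, lam * alpha) \
                    / (col_sq[j] + lam * (1.0 - alpha))
                if new != old:
                    delta = new - old
                    for i in range(n):
                        resid[i] -= delta * c[i]
                    beta[j] = new
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        path.append((lam, list(beta)))
    return path
```

Calling elastic_net_path(X, y) on a list-of-rows X returns one coefficient 
vector per penalty value, which is the "multitude of models" a caller would 
then need to select from or carry forward.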

We've got regression running with the speed optimizations that Friedman 
recommends. We'll start working on the logistic regression version next.

We're eager to make the code available as open source and would like to get 
some feedback about how best to do that. Any thoughts?
Mike Bowles.

