Dev List,

A couple of colleagues and I have gotten several versions of the glmnet algorithm coded and running on Spark RDDs. glmnet (http://www.jstatsoft.org/v33/i01/paper) is a very fast algorithm for generating coefficient paths that solve penalized regression with elastic net penalties. The algorithm gets its speed from an approach that generates solutions for a wide range of penalty parameter values. We're able to integrate it into the MLlib class structure in a couple of different ways. The algorithm may fit better into the new pipeline structure, since it naturally returns a multitude of models (corresponding to different values of the penalty parameter). That appears to fit better into the pipeline than MLlib linear regression (for example).
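For anyone not familiar with the path idea, here is a minimal single-machine sketch in plain Python. It is not our Spark code, just an illustration of cyclic coordinate descent with warm starts over a decreasing sequence of penalties, as described in the paper. It assumes centered data with no intercept, and the function name, the defaults, and the floor of 1e-3 on alpha when computing the largest penalty are choices made for this example only.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_path(X, y, alpha=0.5, n_lambdas=20, eps=1e-3,
                     tol=1e-8, max_sweeps=1000):
    """Return [(lambda, coefficients), ...] for a decreasing lambda sequence.

    Minimizes (1/2n)*||y - X b||^2
              + lambda * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2)
    by cyclic coordinate descent. X is a list of rows and is assumed to be
    centered; no intercept is fitted (a simplification for this sketch).
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]

    # Largest penalty: the smallest lambda at which every coefficient is zero.
    # The floor on alpha is an assumption to avoid dividing by zero for ridge.
    lam_max = max(abs(sum(c[i] * y[i] for i in range(n))) / n
                  for c in cols) / max(alpha, 1e-3)
    steps = max(n_lambdas - 1, 1)
    lambdas = [lam_max * eps ** (k / steps) for k in range(n_lambdas)]

    beta = [0.0] * p
    resid = list(y)          # residual y - X b, kept up to date incrementally
    path = []
    for lam in lambdas:
        # Warm start: beta and resid carry over from the previous lambda.
        for _ in range(max_sweeps):
            max_change = 0.0
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue
                old = beta[j]
                # Correlation of column j with the partial residual.
                rho = sum(cols[j][i] * resid[i] for i in range(n)) / n \
                    + col_sq[j] * old
                new = soft_threshold(rho, lam * alpha) \
                    / (col_sq[j] + lam * (1.0 - alpha))
                if new != old:
                    delta = new - old
                    for i in range(n):
                        resid[i] -= delta * cols[j][i]
                    beta[j] = new
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        path.append((lam, list(beta)))
    return path
```

Each entry in the returned list is one fitted model, which is the "multitude of models" mentioned above; a caller would typically pick one by cross-validation.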
We've got regression running with the speed optimizations that Friedman recommends, and we'll start working on the logistic regression version next. We're eager to make the code available as open source and would like some feedback on how best to do that. Any thoughts?

Mike Bowles