Re: Have Friedman's glmnet algo running in Spark

mike Tue, 24 Feb 2015 14:01:25 -0800

Joseph,
Thanks for your reply. We'll take the steps you suggest - generate some timing 
comparisons and post them in the GLMNET JIRA with a link from the OWLQN JIRA.


We've got the regression version of GLMNET programmed. The regression version 
only requires a pass through the data each time the active set of coefficients 
changes. That's usually less than or equal to the number of decrements in the 
penalty coefficient (typical default = 100). The intermediate iterations can be 
done using results of previous passes through the full data set. We're 
expecting the number of data passes will be independent of either number of 
rows or columns in the data set. We're eager to demonstrate this scaling. Do 
you have any suggestions regarding data sets for large scale regression 
problems? It would be nice to demonstrate scaling for both number of rows and 
number of columns.
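
To make the data-pass bookkeeping concrete, here is a minimal single-machine
sketch in plain Python. It is not our Spark code. It assumes centered data with
no intercept and no all-zero columns, and every name in it (elastic_net_path,
soft_threshold, ratio, passes) is made up for illustration. It uses the
covariance-update form of coordinate descent: one pass computes the inner
products of each column with y, and a further pass is counted only when a
coefficient becomes nonzero for the first time and its inner products with the
other columns are needed. All other iterations reuse those cached values.

```python
def soft_threshold(z, g):
    """Soft-thresholding operator S(z, g) used by the coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net_path(X, y, n_lambdas=20, alpha=1.0, ratio=1e-2,
                     tol=1e-9, max_sweeps=1000):
    """Coefficient path via coordinate descent with covariance updates.

    X is a list of rows, y a list of responses (both assumed centered).
    Requires alpha > 0 and n_lambdas >= 2.
    Returns (path, passes): path is a list of (lambda, coefficients) pairs,
    passes counts how often the full data set was scanned.
    """
    n, p = len(X), len(X[0])
    # Data pass 1: <x_j, y>/n and <x_j, x_j>/n for every column j.
    xy = [sum(row[j] * yi for row, yi in zip(X, y)) / n for j in range(p)]
    xx = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    passes = 1
    gram = {}          # j -> [<x_j, x_k>/n for all k], filled lazily
    beta = [0.0] * p
    grad = xy[:]       # <x_j, residual>/n; the residual equals y at beta = 0
    lam_max = max(abs(v) for v in xy) / alpha
    path = []
    for m in range(n_lambdas):
        # Log-spaced penalties from lam_max down to ratio * lam_max.
        lam = lam_max * ratio ** (m / (n_lambdas - 1))
        for _ in range(max_sweeps):
            max_change = 0.0
            for j in range(p):
                rho = grad[j] + xx[j] * beta[j]   # excludes x_j's own fit
                new = soft_threshold(rho, lam * alpha) / (xx[j] + lam * (1 - alpha))
                delta = new - beta[j]
                if delta == 0.0:
                    continue
                if j not in gram:
                    # Column j just entered the active set: one more data pass.
                    gram[j] = [sum(row[j] * row[k] for row in X) / n
                               for k in range(p)]
                    passes += 1
                # Update all gradients from cached inner products, no data scan.
                for k in range(p):
                    grad[k] -= gram[j][k] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        path.append((lam, beta[:]))   # warm start carries over to next lambda
    return path, passes


if __name__ == "__main__":
    cols = [[1, -1, 1, -1, 2, -2], [1, 1, -1, -1, 0, 0], [1, -1, -1, 1, 0, 0]]
    X = [list(r) for r in zip(*cols)]
    y = [3.0 * r[0] for r in X]
    path, passes = elastic_net_path(X, y)
    print("data passes:", passes)
    print("final coefficients:", path[-1][1])
```

In this sketch the pass count is bounded by one plus the number of columns that
ever become active, so it does not grow with the number of rows or with the
number of sweeps. In a distributed version each counted pass would presumably
become one aggregation over the data set, with several newly active columns
batched into the same pass; that mapping is an assumption, not something the
sketch shows.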

Thanks for your help.
Mike

-----Original Message-----
From: Joseph Bradley [mailto:jos...@databricks.com]
Sent: Sunday, February 22, 2015 06:48 PM
To: m...@mbowles.com
Cc: dev@spark.apache.org
Subject: Re: Have Friedman's glmnet algo running in Spark

Hi Mike,

glmnet has definitely been very successful, and it would be great to see
how we can improve optimization in MLlib! There is some related work
ongoing; here are the JIRAs:

  GLMNET implementation in Spark
  LinearRegression with L1/L2 (elastic net) using OWLQN in new ML package

The GLMNET JIRA has actually been closed in favor of the latter JIRA.
However, if you're getting good results in your experiments, could you
please post them on the GLMNET JIRA and link them from the other JIRA? If
it's faster and more scalable, that would be great to find out.

As far as where the code should go and the APIs, that can be discussed on
the JIRA.

I hope this helps, and I'll keep an eye out for updates on the JIRAs!

Joseph

On Thu, Feb 19, 2015 at 10:59 AM,  wrote:

> Dev List,
> A couple of colleagues and I have gotten several versions of glmnet algo
> coded and running on Spark RDD. glmnet algo
> (http://www.jstatsoft.org/v33/i01/paper) is a very fast algorithm for
> generating coefficient paths solving penalized regression with elastic net
> penalties. The algorithm runs fast by taking an approach that generates
> solutions for a wide variety of penalty parameters. We're able to integrate
> into MLlib class structure a couple of different ways. The algorithm may
> fit better into the new pipeline structure since it naturally returns a
> multitude of models (corresponding to different values of penalty
> parameters). That appears to fit better into pipeline than MLlib linear
> regression (for example).
>
> We've got regression running with the speed optimizations that Friedman
> recommends. We'll start working on the logistic regression version next.
>
> We're eager to make the code available as open source and would like to
> get some feedback about how best to do that. Any thoughts?
> Mike Bowles.
