My friends and I are continuing work on the algorithm. You are right that there are two elements to Friedman's glmnet algorithm: one is the use of coordinate descent to minimize penalized regression with an absolute-value penalty, and the other is managing the regularization parameters. Friedman's algorithm does return the entire regularization path. We have had to get fairly deep into the mechanics of linear algebra. The tricky part has been arranging the matrix and vector multiplications to minimize compute time (e.g., there are big time differences between multiplying by a submatrix versus multiplying by the individual columns of the submatrix).
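For readers following along, the coordinate-descent piece mentioned above can be sketched in a few lines. This is a minimal NumPy illustration of the standard soft-thresholding update for the lasso, not the Spark code under discussion; the function names and the assumption of standardized columns are mine:

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: the closed-form solution of the
    one-dimensional lasso problem."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Cycle through coordinates, updating one coefficient at a time.
    Assumes the columns of X are standardized (mean 0, unit variance),
    so each one-dimensional update has a closed form."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual with feature j's contribution added back.
            # Note this touches only column j, never a full submatrix --
            # the kind of arrangement that drives the compute-time
            # differences described above.
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j / n
            beta[j] = soft_threshold(z, lam)
    return beta
```

With a large enough penalty every coefficient is driven exactly to zero, which is what makes the heavily penalized end of the path cheap to compute.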
All of the versions we've produced generate a multitude of solutions (default = 100) for a range of different values of the regularization parameter. The solutions always cover the most heavily penalized end of the curve. The number of solutions generated depends on how fine the steps are and how close the solutions get to the fully saturated (unpenalized) solution. Default values for these work about 80% of the time.

Personally, I've always found it useful to have the entire regularization path. One way or another, that's always required to get a final solution. It's just a question of whether the points on the path are generated by hunting and pecking, or all in one shot, systematically.

mike

-----Original Message-----
From: Patrick [mailto:petz2...@gmail.com]
Sent: Tuesday, August 4, 2015 12:50 AM
To: d...@sparapache.org
Subject: Re: Have Friedman's glmnet algo running in Spark

I have a follow-up on this: I see on JIRA that the idea of having a GLMNET implementation was more or less abandoned, since an OWLQN implementation was chosen to construct a model using L1/L2 regularization. However, GLMNET has the property of "returning a multitude of models (corresponding to different values of the penalty parameters [for the regularization])". I think this is not the case in the OWLQN implementation. This would be really helpful to compare the accuracy of models with different regParam values. As far as I understood, it would avoid having a costly cross-validation step over a possibly large set of regParam values.
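The penalty-sequence strategy described above (start at the fully penalized end and step down toward the unpenalized solution) is usually implemented as a geometric grid. The sketch below is a hedged illustration of that glmnet-style grid, not code from this thread; the function name and default values are my own choices:

```python
import numpy as np

def lambda_grid(X, y, n_lambdas=100, eps=1e-3):
    """Geometric grid of penalty values, from lambda_max (the smallest
    penalty at which every coefficient is zero) down toward the
    unpenalized solution. n_lambdas controls how fine the steps are;
    eps controls how close the grid gets to the saturated end."""
    n = X.shape[0]
    # At lambda >= lambda_max the lasso solution is identically zero.
    lam_max = np.max(np.abs(X.T @ y)) / n
    return np.logspace(np.log10(lam_max), np.log10(eps * lam_max), n_lambdas)
```

Solving along this grid with warm starts (each solution initializing the next) is what makes producing the whole path roughly as cheap as hunting and pecking for a single good penalty value.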
Joseph Bradley wrote:
> Some of this discussion seems valuable enough to preserve on the JIRA; can
> we move it there (and copy any relevant discussions from previous emails
> as needed)?
>
> On Wed, Feb 25, 2015 at 10:35 AM, <mike@> wrote:

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Have-Friedman-s-glmnet-algo-running-in-Spark-tp10692p13587.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.