[ https://issues.apache.org/jira/browse/SPARK-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339320#comment-14339320 ]
mike bowles commented on SPARK-1673:
------------------------------------

Good discussion. I can see how it might be faster to propagate an approximate path as a way to provide good starting conditions for an accurate iteration. To some extent, the accuracy of the glmnet path can be modulated by loosening the convergence criteria for the inner iteration (the iteration done to find the new minimum after the penalty parameter is decremented).

The big time sink is making passes through the data. With glmnet regression, the inner iterations don't require passes through the data, so they are much less expensive than the steps in the penalty parameter, which may provoke a pass through the data to deal with a new element being added to the active list.

It would be interesting to see what happens if the active set of coefficients were constrained to change less frequently than the penalty parameter. I have a hunch that it might take more (inexpensive) inner iterations to converge when the coefficients were allowed to change, but it would save passes through the data.

It would be relatively easy for us to implement this in our code. We can try letting the active set change only every other or every third step in the penalty parameter and see how much it changes the coefficient curves. Thanks for the idea.

> GLMNET implementation in Spark
> ------------------------------
>
>                 Key: SPARK-1673
>                 URL: https://issues.apache.org/jira/browse/SPARK-1673
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Sung Chung
>
> This is a Spark implementation of GLMNET by Jerome Friedman, Trevor Hastie,
> Rob Tibshirani.
> http://www.jstatsoft.org/v33/i01/paper
> It's a straightforward implementation of the Coordinate-Descent based L1/L2
> regularized linear models, including Linear/Logistic/Multinomial regressions.
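
For readers following the comment above, here is a minimal single-machine sketch of the experiment it proposes: lasso coordinate descent along a decreasing penalty path, where the active set is only allowed to grow on every `refresh_every`-th penalty step. It is not the code referred to in the comment and not a Spark or MLlib API. The names `lasso_path`, `refresh_every`, and `data_passes` are hypothetical, and the sketch assumes a pure L1 penalty, a squared-error loss, and glmnet-style covariance updates, so that inner iterations touch only cached inner products while admitting a new coefficient costs one pass over the data.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_path(X, y, lambdas, refresh_every=1, tol=1e-10, max_inner=10000):
    """Lasso path where the active set may grow only every `refresh_every` steps.

    Returns (path, data_passes). `data_passes` counts the computations that
    need the raw data: one for X^T y, plus one per feature admitted.
    """
    n, p = X.shape
    xty = X.T @ y / n            # one pass over the data
    data_passes = 1
    gram = {}                    # j -> column j of X^T X / n, cached on admission
    active = []                  # features admitted so far
    beta = np.zeros(p)
    path = []
    for step, lam in enumerate(lambdas):
        allow_growth = (step % refresh_every == 0)
        while True:
            # Inner iterations: use cached inner products only, no data pass.
            for _ in range(max_inner):
                max_change = 0.0
                for j in active:
                    rho = xty[j] - sum(gram[k][j] * beta[k]
                                       for k in active if k != j)
                    new = soft_threshold(rho, lam) / gram[j][j]
                    max_change = max(max_change, abs(new - beta[j]))
                    beta[j] = new
                if max_change < tol:
                    break
            if not allow_growth:
                break            # active set frozen on this penalty step
            # Optimality check on inactive features, again from cached columns.
            grad = xty.copy()
            for k in active:
                grad -= gram[k] * beta[k]
            violators = [j for j in range(p)
                         if j not in gram and abs(grad[j]) > lam]
            if not violators:
                break
            for j in violators:
                gram[j] = X.T @ X[:, j] / n   # the expensive step: a data pass
                data_passes += 1
                active.append(j)
        path.append(beta.copy())
    return np.array(path), data_passes
```

Running it with `refresh_every=1` and again with `refresh_every=2` or `3` on the same `lambdas`, then comparing the returned coefficient paths and `data_passes`, is one way to see how much the curves move when the active set is frozen on intermediate steps. On frozen steps the result can violate the optimality conditions for inactive features, which is exactly the approximation under discussion.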