[jira] [Commented] (SPARK-1673) GLMNET implementation in Spark

mike bowles (JIRA) Thu, 26 Feb 2015 14:35:10 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339320#comment-14339320
 ]


mike bowles commented on SPARK-1673:
------------------------------------

Good discussion.  I can see how it might be faster to propagate an approximate 
path as a way to provide good starting conditions for an accurate iteration.  
To some extent, the accuracy of the glmnet path can be tuned by loosening 
the convergence criteria for the inner iteration (the iteration done to find 
the new minimum after the penalty parameter is decremented).  
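
As a rough illustration of that knob (a toy sketch, not the code under 
discussion in this issue), here is a warm-started lasso path in plain Python.  
It assumes the data has already been reduced to G = X'X / n and c = X'y / n; 
the names soft_threshold, inner_iterations, lasso_path and the tol parameter 
are made up for this example.  

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def inner_iterations(G, c, lam, beta, tol, max_sweeps=1000):
    """Coordinate descent on (1/2) b'Gb - c'b + lam * |b|_1, started at beta.

    Stops once the largest coefficient change in a sweep drops below tol.
    Returns the new coefficients and the number of sweeps used.
    """
    beta = list(beta)
    p = len(c)
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for j in range(p):
            # Correlation of x_j with the residual that excludes x_j itself.
            rho = c[j] - sum(G[j][k] * beta[k] for k in range(p) if k != j)
            new = soft_threshold(rho, lam) / G[j][j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            return beta, sweep
    return beta, max_sweeps


def lasso_path(G, c, lambdas, tol):
    """Solve for each penalty in turn, warm-starting from the previous one."""
    beta = [0.0] * len(c)
    path, total_sweeps = [], 0
    for lam in lambdas:
        beta, sweeps = inner_iterations(G, c, lam, beta, tol)
        path.append(list(beta))
        total_sweeps += sweeps
    return path, total_sweeps
```

Calling lasso_path twice on the same G, c and decreasing lambdas, once with a 
loose tol such as 1e-2 and once with a tight one such as 1e-8, gives a cheap 
approximate path and an accurate one; comparing total_sweeps gives a feel for 
the trade-off.  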

The big time sink is making passes through the data.  With glmnet regression, 
the inner iterations don't require passes through the data, so they are 
much less expensive than the steps in the penalty parameter, which may trigger 
a pass through the data to handle a new element being added to the active 
list.  
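
To make the cost model concrete, here is a small sketch of the idea, assuming 
covariance-style updates in which the inner products of a variable with all 
other columns are computed once, when that variable first enters the active 
list, and are then served from a cache.  LazyGram, gradient and the 
data_passes counter are hypothetical names for this example, and the 
in-memory list of rows stands in for a distributed dataset.  

```python
class LazyGram:
    """Caches column k of G = X'X / n the first time variable k is needed."""

    def __init__(self, rows):
        self.rows = rows
        self.n = len(rows)
        self.p = len(rows[0])
        self.columns = {}
        self.data_passes = 0

    def column(self, k):
        if k not in self.columns:
            col = [0.0] * self.p
            for row in self.rows:  # the expensive part: one pass over the data
                for j in range(self.p):
                    col[j] += row[j] * row[k]
            self.columns[k] = [v / self.n for v in col]
            self.data_passes += 1
        return self.columns[k]


def gradient(gram, c, beta, active):
    """c[j] - sum over active k of G[j][k] * beta[k], for every j.

    Once the columns for the active variables are cached, this touches only
    the cache, so repeated inner iterations cost no further data passes.
    """
    g = list(c)
    for k in active:
        col = gram.column(k)
        for j in range(gram.p):
            g[j] -= col[j] * beta[k]
    return g
```

In this sketch data_passes only grows when column() is asked for a variable 
it has not seen before, which mirrors the point above: the count depends on 
how often the active list grows, not on how many inner iterations are run.  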

It would be interesting to see what happens if the active set of coefficients 
were constrained to change less frequently than the penalty parameter.  I have a 
hunch that it might take more (inexpensive) inner iterations to converge when 
the coefficients were allowed to change, but it would save passes through the 
data.  

It would be relatively easy for us to implement this in our code.  We can try 
letting the active set change only every second or third step in the 
penalty parameter and see how much that changes the coefficient curves.  
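
A toy version of that experiment might look like the sketch below.  It is 
only an illustration under assumptions: it works on a precomputed G = X'X / n 
and c = X'y / n, the names fit_active, path_with_refresh and refresh_every 
are invented here, and each batch of variables added to the active set is 
counted as one data pass, which is a stand-in for the real cost.  

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_active(G, c, lam, beta, active, tol=1e-10, max_sweeps=10000):
    """Coordinate descent over the active coordinates only (beta is updated
    in place); coefficients outside the active set stay at zero."""
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in active:
            rho = c[j] - sum(G[j][k] * beta[k] for k in active if k != j)
            new = soft_threshold(rho, lam) / G[j][j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break


def path_with_refresh(G, c, lambdas, refresh_every):
    """Walk down the penalties, letting the active set grow only on every
    refresh_every-th step. Returns the path and the assumed pass count."""
    p = len(c)
    beta, active = [0.0] * p, []
    path, passes = [], 0
    for step, lam in enumerate(lambdas):
        fit_active(G, c, lam, beta, active)
        if step % refresh_every == 0:
            while True:
                # Inactive variables whose gradient exceeds the penalty.
                violators = [
                    j for j in range(p)
                    if j not in active
                    and abs(c[j] - sum(G[j][k] * beta[k] for k in active)) > lam
                ]
                if not violators:
                    break
                active.extend(violators)
                passes += 1  # assumption: one data pass per batch added
                fit_active(G, c, lam, beta, active)
        path.append(list(beta))
    return path, passes
```

Running it with refresh_every set to 1, 2 and 3 on the same penalties and 
comparing the returned paths step by step shows where the frozen steps 
deviate from the exact curves and how many batches were needed in each case.  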

Thanks for the idea.  

> GLMNET implementation in Spark
> ------------------------------
>
>                 Key: SPARK-1673
>                 URL: https://issues.apache.org/jira/browse/SPARK-1673
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Sung Chung
>
> This is a Spark implementation of GLMNET by Jerome Friedman, Trevor Hastie, 
> Rob Tibshirani.
> http://www.jstatsoft.org/v33/i01/paper
> It's a straightforward implementation of the Coordinate-Descent based L1/L2 
> regularized linear models, including Linear/Logistic/Multinomial regressions.



