[ https://issues.apache.org/jira/browse/SPARK-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433754#comment-15433754 ]

Seth Hendrickson commented on SPARK-17201:
------------------------------------------

Restating some of what was said on github:

_The concern is that for softmax regression without regularization, the Hessian 
becomes singular and Newton methods can run into problems. Excerpt from this 
[link|http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression]: "Thus, the 
minimizer of J(θ) is not unique. (Interestingly, J(θ) is still convex, and thus 
gradient descent will not run into local optima problems. But the Hessian is 
singular/non-invertible, which causes a straightforward implementation of 
Newton's method to run into numerical problems.)"_

I looked into this. It is true that for softmax regression the Hessian is 
symmetric positive _semidefinite_, not symmetric positive definite. There is a 
reasonably thorough proof of this [here|http://qwone.com/~jason/writing/convexLR.pdf].
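To make the non-uniqueness concrete, here is a minimal NumPy sketch (illustrative 
only, not Spark code): adding the same arbitrary vector to every class's 
coefficient vector leaves the softmax probabilities, and hence the loss, 
unchanged, which is exactly why the unregularized Hessian has a null space.

{code:python}
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_features = 3, 4
W = rng.normal(size=(num_classes, num_features))  # class coefficient matrix
x = rng.normal(size=num_features)                 # one feature vector

def softmax(z):
    z = z - z.max()          # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Shift every class's coefficient vector by the same arbitrary vector c.
c = rng.normal(size=num_features)
p_original = softmax(W @ x)
p_shifted = softmax((W + c) @ x)   # broadcasting adds c to each row of W

print(np.allclose(p_original, p_shifted))  # True: predictions are identical,
                                           # so the minimizer is not unique and
                                           # the Hessian is singular
{code}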
Still, consider the quote from the resource above: "... which causes a 
*straightforward* implementation of Newton's method to run into numerical 
problems." The lack of positive definiteness can indeed be a problem for *naive* 
Newton methods, but LBFGS is not a straightforward implementation - it does not 
use the Hessian directly, but rather an approximation to it. In fact, there is 
an abundance of resources showing that as long as the initial Hessian 
approximation is symmetric positive definite, the subsequent recursive updates 
are also symmetric positive definite. From one resource: 

"H(-1)_(n + 1) is positive definite (psd) when H^(-1)_n is. Assuming our 
initial guess of H0 is psd, it follows by induction each inverse Hessian 
estimate is as well. Since we can choose any H^(-1)_0 we want, including the 
identity matrix, this is easy to ensure."
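As a quick numerical sanity check on that claim, here is a self-contained NumPy 
sketch of the textbook BFGS inverse-Hessian update (not the LBFGS implementation 
Spark actually uses): starting from the identity and applying the update only to 
pairs that satisfy the curvature condition, the estimate stays symmetric 
positive definite.

{code:python}
import numpy as np

rng = np.random.default_rng(1)
n = 5
H_inv = np.eye(n)  # initial inverse-Hessian estimate: symmetric positive definite

def bfgs_inverse_update(H_inv, s, y):
    # Standard BFGS update of the inverse Hessian approximation.
    # Assumes the curvature condition y.T s > 0 (a Wolfe line search enforces it).
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H_inv @ V.T + rho * np.outer(s, s)

for _ in range(20):
    s, y = rng.normal(size=n), rng.normal(size=n)
    if y @ s <= 1e-8:
        continue  # skip pairs that violate the curvature condition
    H_inv = bfgs_inverse_update(H_inv, s, y)

# Smallest eigenvalue remains strictly positive: positive definiteness is kept.
print(np.linalg.eigvalsh(H_inv).min())
{code}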

I would appreciate other opinions on this to make sure I am understanding things 
correctly. It seems like LBFGS will work fine even without regularization. cc 
[~dbtsai] [~WeichenXu123]

> Investigate numerical instability for MLOR without regularization
> -----------------------------------------------------------------
>
>                 Key: SPARK-17201
>                 URL: https://issues.apache.org/jira/browse/SPARK-17201
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Seth Hendrickson
>
> As mentioned 
> [here|http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression], when no 
> regularization is applied in Softmax regression, second order Newton solvers 
> may run into numerical instability problems. We should investigate this in 
> practice and find a solution, possibly by implementing pivoting when no 
> regularization is applied.


