When calling for a BFGS or L-BFGS optimization in Optim.jl, the initial inverse Hessian is set to the identity matrix. I'm running into trouble with this design decision: my objective function is numerically unstable at the first trial point x0 - (I \ g0) = x0 - g0, where x0 is what I know to be a good starting value and g0 is the gradient at that point. If I instead compute a finite-difference Hessian H, the first trial point becomes x0 - (H \ g0), which works fine (as far as I can tell).
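For concreteness, here's a minimal sketch of the two first steps I'm comparing. The objective f and the starting point x0 are stand-ins for my actual problem, chosen so the gradient entries dwarf x0 the way they do in my case:

```julia
using Calculus        # finite-difference derivatives (gradient / hessian)
using LinearAlgebra

# Stand-in for my objective: badly scaled, so g0 >> x0 in magnitude
f(x) = 0.5 * (1e6 * x[1]^2 + x[2]^2)

x0 = [0.1, 0.1]                    # a known-good starting value
g0 = Calculus.gradient(f, x0)      # ≈ [1e5, 0.1], much larger than x0
H  = Calculus.hessian(f, x0)       # finite-difference Hessian

x_bfgs   = x0 - (I \ g0)   # first trial with identity initial invH: x0 - g0
x_newton = x0 - (H \ g0)   # first trial with a real curvature estimate

@show x_bfgs x_newton      # x_bfgs lands wildly far from x0; x_newton stays sensible
```

With the identity initialization, the very first line-search trial lands near x0 - g0, which in my problem is far enough out that the objective blows up before the line search ever gets a chance to shrink the step.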
The numerical stability issues come from two sources: (1) the entries of g0 are generally much larger in absolute value than those of x0, and (2) my objective function performs a Cholesky factorization of a matrix that is partially defined by x, and that factorization seems to fail for very large absolute values of x (a sketch of this failure mode is in the P.S. below).

When I peeked at the source for Optim.jl, I noticed that all the underlying Newton-style solver routines accept an explicitly supplied initial inverse Hessian, but this interface does not seem to be exposed in optimize(). Is it possible to change this? I can see how the code would change, but I'm not GitHub-proficient (yet), so I don't know how to make these changes and offer them as a pull request...

By the way, MATLAB's fminunc and fmincon don't seem to suffer from this problem, since MATLAB's line search is able to recover from a Cholesky error and just look for a teensy-tiny step size that works.

Thanks in advance for any help or suggestions.
-Thom
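P.S. Here's a minimal sketch of the Cholesky failure mode, plus the kind of guard I've considered so a failed factorization returns Inf instead of throwing (A_of_x and the placeholder computation are illustrative, not my real model):

```julia
using LinearAlgebra

# Toy stand-in for the x-dependent matrix my objective factorizes;
# it loses positive definiteness once |x[1]| > 1.
A_of_x(x) = [1.0 x[1]; x[1] 1.0]

function guarded_objective(x)
    # check = false: don't throw PosDefException, just record failure
    F = cholesky(Symmetric(A_of_x(x)); check = false)
    issuccess(F) || return Inf   # signal "bad point" to the line search
    return sum(abs2, F.U * x)    # placeholder for the rest of the objective
end

guarded_objective([0.5, 1.0])    # finite value: factorization succeeds
guarded_objective([5.0, 1.0])    # Inf: factorization fails, no exception
```

Returning Inf at least keeps the exception from killing the run, but it still relies on the line search treating Inf as "too far" and backing off, which is roughly what MATLAB gives me for free; a better initial inverse Hessian would avoid visiting these points in the first place.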
