Re: [julia-users] Optim.jl line search problems

John Myles White Wed, 20 Aug 2014 07:33:42 -0700

It seems odd that your objective function takes less time than the optimization 
routine itself, unless you include the calls to your objective function in the 
cost of the optimization routine. The optimization routine does very little 
work: most of the line search’s cost is induced by repeatedly calling calling 
the objective function at trial points to decide a stepsize.


The search path for L-BFGS is likely to be somewhat different from BFGS because 
it uses a different approximation to the Hessian. Whether that difference is 
substantive is problem-dependent.

— John

On Aug 20, 2014, at 7:23 AM, Thomas Covert <[email protected]> wrote:

> My (current) objective function has about 30 parameters, so N^2 complexity 
> isn't a problem (storage-wise or matrix multiplication time wise).  Also, for 
> my current work, the objective function is much slower than the optimization 
> routine itself, so the overhead of a full inverse Hessian is relatively small.
> 
> In Optim.jl, L-BFGS seems to use the same line search routine as BFGS.  Is 
> there a reason to think it should take substantively different search path?
> 
> 
> -thom
> 
> On Wednesday, August 20, 2014 9:18:29 AM UTC-5, John Myles White wrote:
> I don’t think I’m going to have time to look into this soon, but why do you 
> use BFGS? In my experience L-BFGS is almost always better.
> 
> Of course, we want our BFGS code to be better. But I use BFGS only quite 
> rarely because of its O(N^2) complexity.
> 
>  — John
> 
> On Aug 20, 2014, at 7:16 AM, Thomas Covert <[email protected]> wrote:
> 
>> Ok after reading the paper which the hz_linesearch! routine is based on, I 
>> can see that I'm wrong about this.  Still puzzled, but definitely wrong!
>> 
>> On Tuesday, August 19, 2014 1:51:37 PM UTC-5, Thomas Covert wrote:
>> I'm seeing this same error (ERROR: assertion failed: lsr.slope[ib] < 0) 
>> again, and this time my gradients (evaluated at "reasonable" input values) 
>> match the finite difference output generated by Calculus.jl's "gradient" 
>> function.  The function I am trying to minize is globally convex (its a 
>> multinomial logit log-likelihood).
>> 
>> I encounter this assertion error after a few successful iterations of BFGS 
>> and it is caused by NAN's in the gradient of the test point.  BFGS gets to 
>> this
>> test point because the step size it passes to hz_linesearch eventually gets 
>> to be large, and a big enough step can cause floating point errors in the 
>> calculation of the the derivatives.  For example, on a recent minimization 
>> attempt, the assertion error happens when "c" (the step size passed by bfgs 
>> to hz_linesearch) appears to be about 380.
>> 
>> I think this is happening because hz_linesearch (a) expands the step size by 
>> a factor of 5 (see line 280 in hz_linesearch) until it encounters upward 
>> movement and (b) passes this new value (or a moving average of it) back to 
>> the caller (i.e., bfgs).  So, the next time bfgs calls hz_linesearch, it 
>> starts out with a potentially large value for the first step.
>> 
>> I don't really know much about line search routines, but is this way things 
>> ought to be?  I would have thought that for each new call to a line search 
>> routine, the step size should reset to a default value.
>> 
>> By the way, is it possible to enable display of the internal values of "c" 
>> in the line search routines?  It looks like there is some debugging code in 
>> there but I'm not sure how to turn it on.
>> 
>> -thom
>> 
>> 
>> On Wednesday, July 30, 2014 6:24:26 PM UTC-5, John Myles White wrote:
>> I’ve never seen our line search methods produce an error that wasn’t caused 
>> by errors in the gradient. The line search methods generally only work with 
>> function values and gradients, so they’re either buggy (which they haven’t 
>> proven to be) or they’re brittle to errors in function definitions/gradient 
>> definitions.
>> 
>> Producing better error message would be great. I once started to do that, 
>> but realized that I needed to come back to fully understanding the line 
>> search code before I could insert useful errors. Would love to see 
>> improvements there.
>> 
>>  — John
>> 
>> On Jul 30, 2014, at 3:17 PM, Thomas Covert <[email protected]> wrote:
>> 
>>> I've done some more sleuthing and have concluded that the problem was on my 
>>> end (a bug in the gradient calculation, as you predicted). 
>>> 
>>> Is an inaccurate gradient the only way someone should encounter this 
>>> assertion error?  I don't know enough about line search methods to have an 
>>> intuition about that, but if it is the case, maybe the line search routine 
>>> should throw a more informative error?
>>> 
>>> -Thom
>>> 
>>> On Wednesday, July 30, 2014 3:44:51 PM UTC-5, John Myles White wrote:
>>> Would be useful to understand exactly what goes wrong if we want to fix 
>>> this problem. I’m mostly used to errors caused by inaccurate gradients, so 
>>> I don’t have an intuition for the cause of this problem.
>>>  
>>> — John
>>> 
>>> On Jul 30, 2014, at 10:45 AM, Thomas Covert <[email protected]> wrote:
>>> 
>>>> No, I haven't tried that yet - might someday, but I like the idea of 
>>>> running julia native code all the way...  
>>>> 
>>>> However, I did find that manually switching the line search routine to 
>>>> "backtracking_linesearch!" did the trick, so at least we know the problem 
>>>> isn't in Optim.jl's implementation of BFGS itself!
>>>> 
>>>> -thom
>>>> 
>>>> On Wednesday, July 30, 2014 12:43:16 PM UTC-5, jbeginner wrote:
>>>> This is not really a solution for this problem but have you tried the 
>>>> NLopt library? From my experience it produces much more stable results and 
>>>> because of problems like the one you describe I have switched to it. I 
>>>> think there is an L-BFGS option also. Although I did not get AD to work 
>>>> with it. The description for all algorithms can be seen here:
>>>> 
>>>> http://ab-initio.mit.edu/wiki/index.php/NLopt_Algorithms
>>>> 
>>>> 
>>>> 
>>>> On Wednesday, July 30, 2014 12:27:36 PM UTC-4, Thomas Covert wrote:
>>>> Recently I've encountered line search errors when using Optim.jl with 
>>>> BFGS.  Here is an example error message
>>>> 
>>>> ERROR: assertion failed: lsr.slope[ib] < 0
>>>>  in bisect! at 
>>>> /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:577
>>>>  in hz_linesearch! at 
>>>> /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:273
>>>>  in hz_linesearch! at 
>>>> /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:201
>>>>  in bfgs at /pathtojulia/.julia/v0.3/Optim/src/bfgs.jl:121
>>>>  in optimize at /pathtojulia/.julia/v0.3/Optim/src/optimize.jl:113
>>>> while loading /pathtocode/code.jl, in expression starting on line 229
>>>> 
>>>> I've seen this error message before, and its usually because I have a bug 
>>>> in my code that erroneously generates function values or gradients which 
>>>> are very large (i.e., 1e100).  However, in this case I can confirm that 
>>>> the "x" I've passed to the optimizer is totally reasonable (abs value of 
>>>> all points less than 100), the function value at that x is reasonable (on 
>>>> the order of 1e6), the gradients are  reasonable (between -100 and +100), 
>>>> and the entries in the approximate inverse Hessian are also reasonable 
>>>> (smallest abs value is about 1e-9, largest is about 7).  
>>>> 
>>>> This isn't a failure on the first or second iteration of BFGS - it happens 
>>>> on the 34th iteration.
>>>> 
>>>> Unfortunately its pretty hard for me to share my code or data at the 
>>>> moment, so I understand that it might be challenging to solve this problem 
>>>> but any advice you guys can offer is appreciated!
>>>> 
>>>> -Thom
>>> 
>> 
>

Re: [julia-users] Optim.jl line search problems

Reply via email to