Sorry if I was unclear - what I meant to say is that a single call to the objective function and its gradients represents considerably more computational work than what goes on inside the optimizer.

I will see if L-BFGS does a better job later today. Thanks for your help.

-thom
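To make that cost comparison concrete, here is a rough sketch in current Julia syntax (not the 0.3-era code discussed below). The logit-style objective and the 100,000-by-30 data set are invented for illustration and are not the model from this thread; the point is only that one objective/gradient pass over a realistically sized data set dwarfs the 30-by-30 linear algebra a BFGS iteration performs on top of it.

    using LinearAlgebra

    N, nobs = 30, 100_000
    X, y = randn(nobs, N), rand(0:1, nobs)
    beta = randn(N)

    # One objective-plus-gradient evaluation (a stand-in binary-logit
    # negative log-likelihood); its cost is dominated by the nobs-by-N
    # matrix products.
    function nll_and_grad(beta)
        p = 1.0 ./ (1.0 .+ exp.(-(X * beta)))
        nll = -sum(y .* log.(p) .+ (1 .- y) .* log.(1 .- p))
        return nll, X' * (p .- y)
    end

    # One BFGS-style rank-two update of the inverse-Hessian approximation:
    # a handful of 30-by-30 operations.
    function bfgs_update!(H, s, q)
        rho = 1 / dot(q, s)
        V = I - rho * (s * q')
        H .= V * H * V' .+ rho * (s * s')
        return H
    end

    H, s, q = Matrix{Float64}(I, N, N), randn(N), randn(N)
    nll_and_grad(beta); bfgs_update!(H, s, q)   # warm up (JIT compilation)
    @time nll_and_grad(beta)     # dominated by the 100_000-by-30 matrix work
    @time bfgs_update!(H, s, q)  # orders of magnitude cheaper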
On Wednesday, August 20, 2014 9:33:21 AM UTC-5, John Myles White wrote:

> It seems odd that your objective function takes less time than the optimization routine itself, unless you include the calls to your objective function in the cost of the optimization routine. The optimization routine does very little work: most of the line search's cost is induced by repeatedly calling the objective function at trial points to decide a step size.
>
> The search path for L-BFGS is likely to be somewhat different from BFGS because it uses a different approximation to the Hessian. Whether that difference is substantive is problem-dependent.
>
> — John
>
> On Aug 20, 2014, at 7:23 AM, Thomas Covert <[email protected]> wrote:
>
> My (current) objective function has about 30 parameters, so N^2 complexity isn't a problem (storage-wise or matrix-multiplication-time-wise). Also, for my current work, the objective function is much slower than the optimization routine itself, so the overhead of a full inverse Hessian is relatively small.
>
> In Optim.jl, L-BFGS seems to use the same line search routine as BFGS. Is there a reason to think it should take a substantively different search path?
>
> -thom
>
> On Wednesday, August 20, 2014 9:18:29 AM UTC-5, John Myles White wrote:
>>
>> I don't think I'm going to have time to look into this soon, but why do you use BFGS? In my experience L-BFGS is almost always better.
>>
>> Of course, we want our BFGS code to be better. But I use BFGS only quite rarely because of its O(N^2) complexity.
>>
>> — John
>>
>> On Aug 20, 2014, at 7:16 AM, Thomas Covert <[email protected]> wrote:
>>
>> OK, after reading the paper that the hz_linesearch! routine is based on, I can see that I'm wrong about this. Still puzzled, but definitely wrong!
>>
>> On Tuesday, August 19, 2014 1:51:37 PM UTC-5, Thomas Covert wrote:
>>>
>>> I'm seeing this same error (ERROR: assertion failed: lsr.slope[ib] < 0) again, and this time my gradients (evaluated at "reasonable" input values) match the finite-difference output generated by Calculus.jl's "gradient" function. The function I am trying to minimize is globally convex (it's a multinomial logit log-likelihood).
>>>
>>> I encounter this assertion error after a few successful iterations of BFGS, and it is caused by NaNs in the gradient of the test point. BFGS gets to this test point because the step size it passes to hz_linesearch eventually gets to be large, and a big enough step can cause floating point errors in the calculation of the derivatives. For example, on a recent minimization attempt, the assertion error happens when "c" (the step size passed by bfgs to hz_linesearch) appears to be about 380.
>>>
>>> I think this is happening because hz_linesearch (a) expands the step size by a factor of 5 (see line 280 in hz_linesearch) until it encounters upward movement, and (b) passes this new value (or a moving average of it) back to the caller (i.e., bfgs). So, the next time bfgs calls hz_linesearch, it starts out with a potentially large value for the first step.
>>>
>>> I don't really know much about line search routines, but is this the way things ought to be? I would have thought that for each new call to a line search routine, the step size should reset to a default value.
>>>
>>> By the way, is it possible to enable display of the internal values of "c" in the line search routines? It looks like there is some debugging code in there, but I'm not sure how to turn it on.
>>>
>>> -thom
>>>
>>> On Wednesday, July 30, 2014 6:24:26 PM UTC-5, John Myles White wrote:
>>>>
>>>> I've never seen our line search methods produce an error that wasn't caused by errors in the gradient. The line search methods generally only work with function values and gradients, so they're either buggy (which they haven't proven to be) or they're brittle to errors in function definitions/gradient definitions.
>>>>
>>>> Producing better error messages would be great. I once started to do that, but realized that I needed to come back to fully understanding the line search code before I could insert useful errors. Would love to see improvements there.
>>>>
>>>> — John
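As an aside, here is a minimal sketch of the kind of finite-difference check mentioned earlier in this thread, using Calculus.jl's gradient function. The toy objective and hand-coded gradient below are invented for illustration (they are not the likelihood discussed here), and the syntax is current Julia rather than 0.3-era code; a large discrepancy between the two vectors usually points to a bug in the analytical gradient.

    using Calculus

    # Toy smooth objective and its hand-coded analytical gradient
    # (illustrative only).
    f(x) = sum(exp.(x)) + 0.5 * sum(x .^ 2)
    g(x) = exp.(x) .+ x

    x0 = randn(5)
    g_fd = Calculus.gradient(f, x0)        # finite-difference approximation
    println(maximum(abs.(g(x0) .- g_fd)))  # should be tiny; a large value
                                           # suggests a gradient bug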
>>>>
>>>> On Jul 30, 2014, at 3:17 PM, Thomas Covert <[email protected]> wrote:
>>>>
>>>> I've done some more sleuthing and have concluded that the problem was on my end (a bug in the gradient calculation, as you predicted).
>>>>
>>>> Is an inaccurate gradient the only way someone should encounter this assertion error? I don't know enough about line search methods to have an intuition about that, but if it is the case, maybe the line search routine should throw a more informative error?
>>>>
>>>> -Thom
>>>>
>>>> On Wednesday, July 30, 2014 3:44:51 PM UTC-5, John Myles White wrote:
>>>>>
>>>>> Would be useful to understand exactly what goes wrong if we want to fix this problem. I'm mostly used to errors caused by inaccurate gradients, so I don't have an intuition for the cause of this problem.
>>>>>
>>>>> — John
>>>>>
>>>>> On Jul 30, 2014, at 10:45 AM, Thomas Covert <[email protected]> wrote:
>>>>>
>>>>> No, I haven't tried that yet - might someday, but I like the idea of running Julia-native code all the way...
>>>>>
>>>>> However, I did find that manually switching the line search routine to "backtracking_linesearch!" did the trick, so at least we know the problem isn't in Optim.jl's implementation of BFGS itself!
>>>>>
>>>>> -thom
>>>>>
>>>>> On Wednesday, July 30, 2014 12:43:16 PM UTC-5, jbeginner wrote:
>>>>>>
>>>>>> This is not really a solution for this problem, but have you tried the NLopt library? From my experience it produces much more stable results, and because of problems like the one you describe I have switched to it. I think there is an L-BFGS option also, although I did not get AD to work with it. The description for all algorithms can be seen here:
>>>>>>
>>>>>> http://ab-initio.mit.edu/wiki/index.php/NLopt_Algorithms
>>>>>>
>>>>>> On Wednesday, July 30, 2014 12:27:36 PM UTC-4, Thomas Covert wrote:
>>>>>>>
>>>>>>> Recently I've encountered line search errors when using Optim.jl with BFGS. Here is an example error message:
>>>>>>>
>>>>>>>     ERROR: assertion failed: lsr.slope[ib] < 0
>>>>>>>      in bisect! at /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:577
>>>>>>>      in hz_linesearch! at /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:273
>>>>>>>      in hz_linesearch! at /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:201
>>>>>>>      in bfgs at /pathtojulia/.julia/v0.3/Optim/src/bfgs.jl:121
>>>>>>>      in optimize at /pathtojulia/.julia/v0.3/Optim/src/optimize.jl:113
>>>>>>>     while loading /pathtocode/code.jl, in expression starting on line 229
>>>>>>>
>>>>>>> I've seen this error message before, and it's usually because I have a bug in my code that erroneously generates function values or gradients which are very large (e.g., 1e100). However, in this case I can confirm that the "x" I've passed to the optimizer is totally reasonable (abs value of all points less than 100), the function value at that x is reasonable (on the order of 1e6), the gradients are reasonable (between -100 and +100), and the entries in the approximate inverse Hessian are also reasonable (smallest abs value is about 1e-9, largest is about 7).
>>>>>>>
>>>>>>> This isn't a failure on the first or second iteration of BFGS - it happens on the 34th iteration.
>>>>>>>
>>>>>>> Unfortunately it's pretty hard for me to share my code or data at the moment, so I understand that it might be challenging to solve this problem, but any advice you guys can offer is appreciated!
>>>>>>>
>>>>>>> -Thom
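A small illustration of the failure mode described above, where a large trial step pushes naively computed multinomial-logit probabilities into overflow and the resulting gradient contains NaNs. The numbers are invented (the factor of 380 echoes the step size reported in the thread), and the max-subtraction at the end is the standard log-sum-exp safeguard rather than anything taken from Optim.jl itself.

    # Naively computed choice probabilities are fine for modest linear indices...
    v = [10.0, 5.0]
    p = exp.(v) ./ sum(exp.(v))              # approximately [0.993, 0.007]

    # ...but after a trial step around 380, exp overflows to Inf and the
    # probabilities, and with them the gradient of the log-likelihood,
    # come out as NaN.
    v_big = 380 .* v
    p_big = exp.(v_big) ./ sum(exp.(v_big))  # Inf / Inf = NaN

    # Subtracting the maximum before exponentiating yields the same
    # probabilities without overflow (the usual log-sum-exp safeguard).
    shifted = v_big .- maximum(v_big)
    p_safe = exp.(shifted) ./ sum(exp.(shifted))  # [1.0, 0.0] to machine precision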
