Sorry if I was unclear - what I meant to say is that a single call to the objective function and its gradients represents considerably more computational work than what goes on inside the optimizer.

I will see if L-BFGS does a better job later today. Thanks for your help.

-thom
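To make that cost comparison concrete, here is a rough sketch in current Julia syntax (not the 0.3-era code discussed below). The logit-style objective and the 100,000-by-30 data set are invented for illustration and are not the model from this thread; the point is only that one objective/gradient pass over a realistically sized data set dwarfs the 30-by-30 linear algebra a BFGS iteration performs on top of it.

    using LinearAlgebra

    N, nobs = 30, 100_000
    X, y = randn(nobs, N), rand(0:1, nobs)
    beta = randn(N)

    # One objective-plus-gradient evaluation (a stand-in binary-logit
    # negative log-likelihood); its cost is dominated by the nobs-by-N
    # matrix products.
    function nll_and_grad(beta)
        p = 1.0 ./ (1.0 .+ exp.(-(X * beta)))
        nll = -sum(y .* log.(p) .+ (1 .- y) .* log.(1 .- p))
        return nll, X' * (p .- y)
    end

    # One BFGS-style rank-two update of the inverse-Hessian approximation:
    # a handful of 30-by-30 operations.
    function bfgs_update!(H, s, q)
        rho = 1 / dot(q, s)
        V = I - rho * (s * q')
        H .= V * H * V' .+ rho * (s * s')
        return H
    end

    H, s, q = Matrix{Float64}(I, N, N), randn(N), randn(N)
    nll_and_grad(beta); bfgs_update!(H, s, q)   # warm up (JIT compilation)
    @time nll_and_grad(beta)     # dominated by the 100_000-by-30 matrix work
    @time bfgs_update!(H, s, q)  # orders of magnitude cheaper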
On Wednesday, August 20, 2014 9:33:21 AM UTC-5, John Myles White wrote:

> It seems odd that your objective function takes less time than the optimization routine itself, unless you include the calls to your objective function in the cost of the optimization routine. The optimization routine does very little work: most of the line search's cost is induced by repeatedly calling the objective function at trial points to decide a step size.
>
> The search path for L-BFGS is likely to be somewhat different from BFGS because it uses a different approximation to the Hessian. Whether that difference is substantive is problem-dependent.
>
> — John
>
> On Aug 20, 2014, at 7:23 AM, Thomas Covert <[email protected]> wrote:
>
> My (current) objective function has about 30 parameters, so N^2 complexity isn't a problem (storage-wise or matrix-multiplication-time-wise). Also, for my current work, the objective function is much slower than the optimization routine itself, so the overhead of a full inverse Hessian is relatively small.
>
> In Optim.jl, L-BFGS seems to use the same line search routine as BFGS. Is there a reason to think it should take a substantively different search path?
>
> -thom
>
> On Wednesday, August 20, 2014 9:18:29 AM UTC-5, John Myles White wrote:
>>
>> I don't think I'm going to have time to look into this soon, but why do you use BFGS? In my experience L-BFGS is almost always better.
>>
>> Of course, we want our BFGS code to be better. But I use BFGS only quite rarely because of its O(N^2) complexity.
>>
>> — John
>>
>> On Aug 20, 2014, at 7:16 AM, Thomas Covert <[email protected]> wrote:
>>
>> OK, after reading the paper that the hz_linesearch! routine is based on, I can see that I'm wrong about this. Still puzzled, but definitely wrong!
>>
>> On Tuesday, August 19, 2014 1:51:37 PM UTC-5, Thomas Covert wrote:
>>>
>>> I'm seeing this same error (ERROR: assertion failed: lsr.slope[ib] < 0) again, and this time my gradients (evaluated at "reasonable" input values) match the finite-difference output generated by Calculus.jl's "gradient" function. The function I am trying to minimize is globally convex (it's a multinomial logit log-likelihood).
>>>
>>> I encounter this assertion error after a few successful iterations of BFGS, and it is caused by NaNs in the gradient of the test point. BFGS gets to this test point because the step size it passes to hz_linesearch eventually gets to be large, and a big enough step can cause floating point errors in the calculation of the derivatives. For example, on a recent minimization attempt, the assertion error happens when "c" (the step size passed by bfgs to hz_linesearch) appears to be about 380.
>>>
>>> I think this is happening because hz_linesearch (a) expands the step size by a factor of 5 (see line 280 in hz_linesearch) until it encounters upward movement, and (b) passes this new value (or a moving average of it) back to the caller (i.e., bfgs). So, the next time bfgs calls hz_linesearch, it starts out with a potentially large value for the first step.
>>>
>>> I don't really know much about line search routines, but is this the way things ought to be? I would have thought that for each new call to a line search routine, the step size should reset to a default value.
>>>
>>> By the way, is it possible to enable display of the internal values of "c" in the line search routines? It looks like there is some debugging code in there, but I'm not sure how to turn it on.
>>>
>>> -thom
>>>
>>> On Wednesday, July 30, 2014 6:24:26 PM UTC-5, John Myles White wrote:
>>>>
>>>> I've never seen our line search methods produce an error that wasn't caused by errors in the gradient. The line search methods generally only work with function values and gradients, so they're either buggy (which they haven't proven to be) or they're brittle to errors in function definitions/gradient definitions.
>>>>
>>>> Producing better error messages would be great. I once started to do that, but realized that I needed to come back to fully understanding the line search code before I could insert useful errors. Would love to see improvements there.
>>>>
>>>> — John
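As an aside, here is a minimal sketch of the kind of finite-difference check mentioned earlier in this thread, using Calculus.jl's gradient function. The toy objective and hand-coded gradient below are invented for illustration (they are not the likelihood discussed here), and the syntax is current Julia rather than 0.3-era code; a large discrepancy between the two vectors usually points to a bug in the analytical gradient.

    using Calculus

    # Toy smooth objective and its hand-coded analytical gradient
    # (illustrative only).
    f(x) = sum(exp.(x)) + 0.5 * sum(x .^ 2)
    g(x) = exp.(x) .+ x

    x0 = randn(5)
    g_fd = Calculus.gradient(f, x0)        # finite-difference approximation
    println(maximum(abs.(g(x0) .- g_fd)))  # should be tiny; a large value
                                           # suggests a gradient bug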
>>>>
>>>> On Jul 30, 2014, at 3:17 PM, Thomas Covert <[email protected]> wrote:
>>>>
>>>> I've done some more sleuthing and have concluded that the problem was on my end (a bug in the gradient calculation, as you predicted).
>>>>
>>>> Is an inaccurate gradient the only way someone should encounter this assertion error? I don't know enough about line search methods to have an intuition about that, but if it is the case, maybe the line search routine should throw a more informative error?
>>>>
>>>> -Thom
>>>>
>>>> On Wednesday, July 30, 2014 3:44:51 PM UTC-5, John Myles White wrote:
>>>>>
>>>>> Would be useful to understand exactly what goes wrong if we want to fix this problem. I'm mostly used to errors caused by inaccurate gradients, so I don't have an intuition for the cause of this problem.
>>>>>
>>>>> — John
>>>>>
>>>>> On Jul 30, 2014, at 10:45 AM, Thomas Covert <[email protected]> wrote:
>>>>>
>>>>> No, I haven't tried that yet - might someday, but I like the idea of running Julia-native code all the way...
>>>>>
>>>>> However, I did find that manually switching the line search routine to "backtracking_linesearch!" did the trick, so at least we know the problem isn't in Optim.jl's implementation of BFGS itself!
>>>>>
>>>>> -thom
>>>>>
>>>>> On Wednesday, July 30, 2014 12:43:16 PM UTC-5, jbeginner wrote:
>>>>>>
>>>>>> This is not really a solution for this problem, but have you tried the NLopt library? From my experience it produces much more stable results, and because of problems like the one you describe I have switched to it. I think there is an L-BFGS option also, although I did not get AD to work with it. The description for all algorithms can be seen here:
>>>>>>
>>>>>> http://ab-initio.mit.edu/wiki/index.php/NLopt_Algorithms
>>>>>>
>>>>>> On Wednesday, July 30, 2014 12:27:36 PM UTC-4, Thomas Covert wrote:
>>>>>>>
>>>>>>> Recently I've encountered line search errors when using Optim.jl with BFGS. Here is an example error message:
>>>>>>>
>>>>>>>     ERROR: assertion failed: lsr.slope[ib] < 0
>>>>>>>      in bisect! at /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:577
>>>>>>>      in hz_linesearch! at /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:273
>>>>>>>      in hz_linesearch! at /pathtojulia/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:201
>>>>>>>      in bfgs at /pathtojulia/.julia/v0.3/Optim/src/bfgs.jl:121
>>>>>>>      in optimize at /pathtojulia/.julia/v0.3/Optim/src/optimize.jl:113
>>>>>>>     while loading /pathtocode/code.jl, in expression starting on line 229
>>>>>>>
>>>>>>> I've seen this error message before, and it's usually because I have a bug in my code that erroneously generates function values or gradients which are very large (e.g., 1e100). However, in this case I can confirm that the "x" I've passed to the optimizer is totally reasonable (abs value of all points less than 100), the function value at that x is reasonable (on the order of 1e6), the gradients are reasonable (between -100 and +100), and the entries in the approximate inverse Hessian are also reasonable (smallest abs value is about 1e-9, largest is about 7).
>>>>>>>
>>>>>>> This isn't a failure on the first or second iteration of BFGS - it happens on the 34th iteration.
>>>>>>>
>>>>>>> Unfortunately it's pretty hard for me to share my code or data at the moment, so I understand that it might be challenging to solve this problem, but any advice you guys can offer is appreciated!
>>>>>>>
>>>>>>> -Thom
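A small illustration of the failure mode described above, where a large trial step pushes naively computed multinomial-logit probabilities into overflow and the resulting gradient contains NaNs. The numbers are invented (the factor of 380 echoes the step size reported in the thread), and the max-subtraction at the end is the standard log-sum-exp safeguard rather than anything taken from Optim.jl itself.

    # Naively computed choice probabilities are fine for modest linear indices...
    v = [10.0, 5.0]
    p = exp.(v) ./ sum(exp.(v))              # approximately [0.993, 0.007]

    # ...but after a trial step around 380, exp overflows to Inf and the
    # probabilities, and with them the gradient of the log-likelihood,
    # come out as NaN.
    v_big = 380 .* v
    p_big = exp.(v_big) ./ sum(exp.(v_big))  # Inf / Inf = NaN

    # Subtracting the maximum before exponentiating yields the same
    # probabilities without overflow (the usual log-sum-exp safeguard).
    shifted = v_big .- maximum(v_big)
    p_safe = exp.(shifted) ./ sum(exp.(shifted))  # [1.0, 0.0] to machine precision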
