One more erratum, sorry... In the repelling balls example, the line that 
read xp = x should have read xp = copy(x).
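
For convenience, here's that gradient loop again with the fix applied:

  julia> x = rand(45) + Dual(0.0, 0.0)
  julia> grad = zeros(Float64, 45)
  julia> for i = 1:45
             xp = copy(x)
             xp[i] += Dual(0.0, 1.0)
             r = repelling_balls(xp)
             grad[i] = r.du
         end
  julia> grad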

On Wednesday, January 22, 2014 12:26:24 AM UTC-8, Jason Merrill wrote:
>
> Should have mentioned: to make these examples work, you need to start off 
> your session by adding DualNumbers and PowerSeries if you don't have 
> them already, and then "using" them:
>
>   julia> Pkg.add("DualNumbers")
>   julia> Pkg.add("PowerSeries")
>   julia> using DualNumbers
>   julia> using PowerSeries
>
> On Wednesday, January 22, 2014 12:22:37 AM UTC-8, Jason Merrill wrote:
>>
>> The examples you mention are definitely interesting ones. Thanks for 
>> mentioning them.
>>
>> On Tuesday, January 21, 2014 4:05:15 AM UTC-8, Hans W Borchers wrote:
>>>
>>> Think of a special function not (yet) implemented in Julia, like Lambert 
>>> W, or Struve, or ... So you define it yourself, e.g. by applying 
>>> Newton-Raphson. How would you symbolically differentiate that?
>>>
>>
>> Here's Newton's method for the Lambert W function:
>>
>>   julia> function lambert_newton(x)
>>              # Newton iteration for y*exp(y) == x, i.e. y = W(x)
>>              y = zero(x)
>>              for i = 1:25
>>                  y = y - (y*exp(y) - x)/(exp(y)*(1 + y))
>>              end
>>              return y
>>          end
>>
>>   julia> lambert_newton(1.0*e)
>>   1.0
>>
>> You can differentiate this function to full precision by passing a Dual 
>> as an argument:
>>
>>   julia> lambert_newton(Dual(e, 1.0))
>>   1.0 + 0.18393972058572117du
>>
>> Compare to the exact value [1]
>>   
>>   julia> 1/2e
>>   0.18393972058572117
>>
>> You can take a second derivative using (the master branch of [2]) 
>> PowerSeries by passing a 2nd-order power series as an argument and using 
>> polyder twice:
>>
>>   julia> polyder(polyder(lambert_newton(series(e, 1.0, 0.0))))
>>   -0.05075073121372976
>>
>> Compare to the exact value [3]
>>
>>   julia> -3/8e^2
>>   -0.050750731213729756
>>
>> In this case, we're off by one bit.
>>
>>> Or your application forces you to define a procedure with if-else 
>>> constructs and loops; how do you differentiate that? Even an advanced 
>>> "automatic differentiation" approach will most often not be capable of 
>>> solving this for you.
>>>
>>> Or imagine having to minimize a function such as
>>>
>>> function repelling_balls(x::Array{Float64, 1})
>>>   # x holds 15 points in [0,1]^3: x-coords in x[1:15], y in x[16:30], z in x[31:45]
>>>   if length(x) != 45
>>>     error("Arg. 'x' must be a vector of length 45.")
>>>   end
>>>   d = 2.0
>>>   for i = 1:14, j = (i+1):15
>>>       s = sqrt((x[i]-x[j])^2 + (x[i+15]-x[j+15])^2 + (x[i+30]-x[j+30])^2)
>>>       if s < d; d = s; end
>>>   end
>>>   return -d   # minus the smallest pairwise distance, to be minimized
>>> end
>>>
>>> that models 15 repelling balls/points in [0, 1]^3.
>>>
>>
>> As for your second example, you can extract the gradient vector element 
>> by element by lifting one coordinate at a time to a dual number, i.e. 
>> replacing x[i] by Dual(x[i], 1.0) and then examining the du part of the 
>> result. You just need to drop the type annotation in your definition.
>>
>>   julia> x = rand(45) + Dual(0.0, 0.0)
>>   julia> grad = zeros(Float64, 45)
>>   julia> for i = 1:45
>>              xp = x
>>              xp[i] += Dual(0.0, 1.0)
>>              r = repelling_balls(xp)
>>              grad[i] = r.du
>>            end
>>   julia> grad
>>
>> The components of grad are now probably accurate to within 1 bit. This 
>> example works out a little bit funny because the box size is small compared 
>> to the ball size, but you can adjust the parameters to fix it. In 
>> principle, you could extract the Hessian by taking directional second 
>> derivatives with PowerSeries, but I haven't fully worked that out yet. That 
>> could be useful for finding the stiffness around the equilibrium point 
>> once you've found it.
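>>
>> In case it's useful, here's roughly what I have in mind (untested; it 
>> assumes series arithmetic, sqrt, and comparisons are all defined in 
>> PowerSeries, plus the untyped version of repelling_balls):
>>
>>   julia> x0 = rand(45)                       # a plain Float64 point
>>   julia> v = randn(45); v = v / norm(v)      # direction to probe
>>   julia> xs = [series(x0[i], v[i], 0.0) for i = 1:45]
>>   julia> d2 = polyder(polyder(repelling_balls(xs)))
>>
>> If that works, d2 should be the second directional derivative v'*H*v at 
>> x0, and repeating it over enough directions would recover the full Hessian.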
>>  
>>
>>> As a "minimax" problem, this function is smooth except on a subset of 
>>> lower dimension. Some optimization procedures using gradients will be able 
>>> to circumvent this subset and find a (local) minimum. How will you 
>>> differentiate that (i.e., build a gradient) if not numerically; and if you 
>>> can, is it worth the effort?
>>>
>>
>> In both of these examples, you pay nothing in effort or performance 
>> compared to finite difference differentiation, and you avoid losing half 
>> your precision. I think that's very exciting!
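>>
>> As a quick check of that claim, here's the same first derivative of 
>> lambert_newton at e done both ways (a simple central difference vs. the 
>> dual number from above; the exact value is 1/2e):
>>
>>   julia> h = 1e-6
>>   julia> (lambert_newton(e + h) - lambert_newton(e - h)) / 2h   # good to roughly 10 digits
>>   julia> lambert_newton(Dual(e, 1.0)).du                        # full precision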
>>  
>>
>>> Consider that an optimization or root-finding procedure is a (hopefully) 
>>> converging process that will not return an absolutely precise result. A 
>>> function may itself be defined through an optimization or root-finding 
>>> task, and its derivative will still have a technical meaning.
>>>
>>> In real-world applications, and the tasks Matlab is made for, you do not 
>>> care about 15 digits; gradients accurate to five, six, or seven digits are 
>>> more than enough to solve your problem. Normally, your functions are not 
>>> defined to eps() accuracy anyway.
>>>
>>
>> I should not have downplayed the significance or utility of finite 
>> difference differentiation. It is obviously tremendously important, and is 
>> suitable for many applications. Julia should strive to implement the best 
>> versions possible. Currently, different packages are at different points of 
>> their development.
>>
>> That said, it seems like you're trying to have it both ways here, in the 
>> sense that you're upset that Calculus throws away 2-3 digits compared to a 
>> better finite difference method, but at the same time, you're downplaying 
>> the significance of the other 7 digits that would get you to full precision.
>>
>> In a specific application, it's often obvious how many digits are 
>> relevant, and that number is definitely often less than 15. But as package 
>> authors, we should strive to compute to full precision when that's possible 
>> without undue complexity or extreme performance tradeoffs. The problem is 
>> that if packages A, B, and C all decide that 8 digits is good enough for 
>> anyone and so throw away half the precision of floating point arguments, 
>> and I decide to take the output of package A as input to package B, and the 
>> output of B as input to C, suddenly I don't have 8 digits, I have 2. I'm 
>> sure you probably understand this point already, but it's something I 
>> didn't completely appreciate until recently.
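>>
>> Here's a toy illustration of how that compounding plays out (nothing to do 
>> with any particular package; it just nests one central difference inside 
>> another, so the noise of the first estimate feeds the second):
>>
>>   julia> fd(f, x, h) = (f(x + h) - f(x - h)) / 2h
>>   julia> d1(x) = fd(sin, x, 1e-6)    # roughly 10 good digits
>>   julia> d2(x) = fd(d1, x, 1e-6)     # differencing a noisy function costs several more
>>   julia> abs(d2(1.0) + sin(1.0))     # error vs the exact -sin(1.0); far above eps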
>>  
>>
>>> When I came here, I thought Julia was meant for "technical computing" 
>>> and not so much for pure Math exercises. In scientific computing, 
>>> calculating numerical derivatives and gradients is essential for many 
>>> applications.
>>>
>>>
>> I hope it will be for both. There are a lot of amazing people working on it.
>>
>> [1] First derivative of Lambert W: 
>> http://www.wolframalpha.com/input/?i=LambertW'[e]
>> [2] The release of PowerSeries that makes this work is this PR: 
>> https://github.com/JuliaLang/METADATA.jl/pull/535
>> [3] Second derivative of Lambert W: 
>> http://www.wolframalpha.com/input/?i=LambertW''[e]
>>  
>>
>>> On Tuesday, January 21, 2014 3:28:28 AM UTC+1, Jason Merrill wrote:
>>>>
>>>> Would you be willing to share a problem that you're interested in that 
>>>> isn't amenable to symbolic differentiation, maybe on a gist or something? 
>>>> I've been looking for non-trivial problems to try to apply PowerSeries to.
>>>>
>>>> When I said numerical differentiation is "unstable", I meant something 
>>>> like "hard to do without losing several digits". For example, you point 
>>>> out that by choosing a different h, it's possible to improve over 
>>>> Calculus.jl's current behavior by 2-3 decimal digits, from 7e-7 to 2e-9 
>>>> in absolute error. But the best precision you could hope for in a 
>>>> floating point answer to this problem is
>>>>
>>>> julia> eps(sin(1.0))
>>>> 1.1102230246251565e-16
>>>>
>>>> so even with your improvements, you've lost a lot of precision. That 
>>>> might or might not be a problem depending on what you'd like to do next 
>>>> with the value you get out.
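>>>>
>>>> For concreteness, with a plain central difference (a sketch, not 
>>>> Calculus.jl's exact formula) you can see how the error depends on h, and 
>>>> how far even a good h stays from eps:
>>>>
>>>> julia> cdiff(h) = (sin(1.0 + h) - sin(1.0 - h)) / 2h
>>>> julia> abs(cdiff(1e-4) - cos(1.0))          # error for one choice of h
>>>> julia> abs(cdiff(cbrt(eps())) - cos(1.0))   # near-optimal h; still well above eps(sin(1.0))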
>>>>
>>>>
>>>> On Monday, January 20, 2014 12:30:38 PM UTC-8, Hans W Borchers wrote:
>>>>>
>>>>> Numerical differentiation is by far not as unstable as you seem to 
>>>>> think. And I have long experience using numerical derivatives for 
>>>>> optimization problems, where you don't stop to look up symbolic 
>>>>> derivatives with a CAS.
>>>>> The function obviously was only an example.
>>>>> For most of the functions I have used, Julia's symbolic capabilities 
>>>>> will not be sufficient.
>>>>>
>>>>>
>>>>> On Monday, January 20, 2014 9:07:02 PM UTC+1, Jason Merrill wrote:
>>>>>>
>>>>>> This implementation could certainly use some love, but finite 
>>>>>> difference differentiation is always unstable, and the situation gets 
>>>>>> worse 
>>>>>> as you take higher order derivatives.
>>>>>>
>>>>>> You might want to consider using the differentiate method to take 
>>>>>> your derivatives symbolically (if this works for the functions you're 
>>>>>> using), or have a look at the differentiation example in PowerSeries.jl. 
>>>>>> To 
>>>>>> do a symbolic second derivative, you can do, e.g.
>>>>>>
>>>>>> julia> using Calculus
>>>>>> julia> @eval d2(x) = $(differentiate(differentiate(:(sin(x)))))
>>>>>> julia> d2(1.0)
>>>>>> -0.8414709848078965
>>>>>>
>>>>>
