Mostly it's a matter of whether it makes sense to have it as a component of
a complex number. Actual mathematical realness isn't that significant.
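
For illustration, if Dual were declared <: Real (a hypothetical sketch with a
made-up DualR type here, not the actual DualNumbers.jl definition), the
Complex{Dual} arrangement would fall out of the existing Complex machinery, and
real(x) === x would hold for free via the Base fallback real(x::Real) = x:

    immutable DualR <: Real
        re::Float64
        du::Float64
    end

    z = Complex(DualR(1.0, 1.0), DualR(2.0, 0.0))   # a Complex{DualR}
    x = DualR(1.0, 2.0)
    real(x) === x                                    # true

Arithmetic on such a type would of course still need the usual promotion rules
and operator methods.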


On Wed, Jan 22, 2014 at 7:48 AM, Toivo Henningsson <[email protected]> wrote:

> I wonder what the assumptions on a type that is <: Real should be. If it
> should correspond to a subset of the real line, then I guess that dual
> numbers are out.
> But maybe it should be something looser, such as that it fulfills real(x)
> === x? Or maybe there should be a common supertype of Real and Dual,
> whatever it might be called and stand for, that should be used instead of
> Real in many places?
>
>
> On Wednesday, 22 January 2014 13:25:38 UTC+1, Stefan Karpinski wrote:
>
>> I wonder if Dual shouldn't be a subtype of Real. Of course, you can
>> probably have Dual complex numbers as well, but maybe those can be
>> Complex{Dual} in that case.
>>
>>
>>  On Wed, Jan 22, 2014 at 7:07 AM, Mauro <[email protected]> wrote:
>>
>>> Dual is not a subtype of Real, so there is no method when calling with a
>>> Dual. You need to define the function like so:
>>>
>>> function lambert_W(x::Number)
>>> ...
>>> end
>>>
>>> then:
>>>
>>> julia> lambert_W(Dual(e, 1.0))
>>> 1.0 + 0.18393972058572117du
>>>
>>> Mauro
>>>
>>> On Wed, 2014-01-22 at 11:30, Hans W Borchers wrote:
>>> > Jason, very nice, and I will certainly look at DualNumbers.jl. But I have
>>> > doubts about your approach to the Lambert W function.
>>> >
>>> > See, lambert_newton(-1/e) returns -0.9999999434202022 and should return
>>> > -1. Below that, the real branch of Lambert W is not defined, yet
>>> > lambert_newton(-0.5) returns 0.9706698296525587 even though W(-0.5) is not
>>> > real.
>>> >
>>> > This is not so problematic, as you could return NaN for those values. But
>>> > what about arguments x > 20? For example,
>>> >
>>> >     julia> lambert_newton(25)
>>> >     3.1737928264162476
>>> >
>>> > while the true value is 2.36015... And lambert_newton(1000) returns NaN,
>>> > true value 5.249603... That is why CASs such as Maple define the Lambert W
>>> > function through a Halley-type iteration such as
>>> >
>>> >     function lambert_W(x::Real)
>>> >         if x < -1/exp(1); return NaN; end
>>> >         w0 = 1.0
>>> >         w1 = w0 - (w0 * exp(w0) - x)/((w0 + 1) * exp(w0) -
>>> >             (w0 + 2) * (w0 * exp(w0) - x)/(2 * w0 + 2))
>>> >         n = 1
>>> >         while abs(w1 - w0) > 1e-15 && n <= 20
>>> >             w0 = w1
>>> >             w1 = w0 - (w0 * exp(w0) - x)/((w0 + 1) * exp(w0) -
>>> >                 (w0 + 2) * (w0 * exp(w0) - x)/(2 * w0 + 2))
>>> >             n += 1
>>> >         end
>>> >         return w1
>>> >     end
>>> >
>>> > Now the value of lambert_W(-1/e) = -0.9999999911436428 is slightly better
>>> > than before, and for x >= 20 and much higher we get correct results.
>>> >
>>> > Assuming lambert_W is defined through this iterative procedure, will
>>> > DualNumbers.jl work for it? I did not succeed:
>>> >
>>> >     julia> lambert_W(Dual(e, 1.0))
>>> >     ERROR: no method lambert_W(Dual{Float64},)
>>> >
>>> > By the way, Lambert's W function at 1.0 is the omega constant: both
>>> > lambert_W and lambert_newton return 0.5671432904097838, which is correct
>>> > to the last digit.
>>> >
>>> > Hans Werner
>>> >
>>> >
>>> > On Wednesday, January 22, 2014 9:22:37 AM UTC+1, Jason Merrill wrote:
>>> >>
>>> >> The examples you mention are definitely interesting ones. Thanks for
>>> >> mentioning them.
>>> >>
>>> >> On Tuesday, January 21, 2014 4:05:15 AM UTC-8, Hans W Borchers wrote:
>>> >>>
>>> >>> Think of a special function not (yet) implemented in Julia, like
>>> Lambert
>>> >>> W, or Struve, or ... So you define it yourself, e.g. by applying
>>> >>> Newton-Raphson. How would you symbolically differentiate that?
>>> >>>
>>> >>
>>> >> Here's Newton's method for the Lambert W function:
>>> >>
>>> >>   julia> function lambert_newton(x)
>>> >>         y = zero(x)
>>> >>         for i = 1:25
>>> >>           y = y - (y*exp(y)-x)/(exp(y)*(1+y))
>>> >>         end
>>> >>         return y
>>> >>        end
>>> >>
>>> >>   julia> lambert_newton(1.0*e)
>>> >>   1.0
>>> >>
>>> >> You can differentiate this function to full precision by passing a
>>> Dual as
>>> >> an argument:
>>> >>
>>> >>   julia> lambert_newton(Dual(e, 1.0))
>>> >>   1.0 + 0.18393972058572117du
>>> >>
>>> >> Compare to the exact value [1]
>>> >>
>>> >>   julia> 1/2e
>>> >>   0.18393972058572117
>>> >>
>>> >> You can take a second derivative using (the master branch of [2])
>>> >> PowerSeries by passing a 2nd-order power series as an argument and
>>> >> applying polyder twice:
>>> >>
>>> >>   julia> polyder(polyder(lambert_newton(series(e, 1.0, 0.0))))
>>> >>   -0.05075073121372976
>>> >>
>>> >> Compare to the exact value [3]
>>> >>
>>> >>   julia> -3/8e^2
>>> >>   -0.050750731213729756
>>> >>
>>> >> In this case, we're off by one bit.
>>> >>
>>> >>> Or suppose your application forces you to define a procedure with
>>> >>> if-else constructs and loops: how do you differentiate that? Even an
>>> >>> advanced "automated differentiation" approach will most often not be
>>> >>> capable of solving this for you.
>>> >>>
>>> >>> Or imagine having to minimize a function such as
>>> >>>
>>> >>> function repelling_balls(x::Array{Float64, 1})
>>> >>>   if length(x) != 45; error("Arg. 'x' must be a vector of length 45."); end
>>> >>>   d = 2.0
>>> >>>   for i = 1:14, j = (i+1):15
>>> >>>       s = sqrt((x[i]-x[j])^2 + (x[i+15]-x[j+15])^2 + (x[i+30]-x[j+30])^2)
>>> >>>       if s < d; d = s; end
>>> >>>   end
>>> >>>   return -d
>>> >>> end
>>> >>>
>>> >>> that models 15 repelling balls/points in [0, 1]^3.
>>> >>>
>>> >>
>>> >> As for your second example, you can extract the gradient vector element
>>> >> by element by lifting one coordinate at a time to a dual number, i.e.
>>> >> replacing x[i] by Dual(x[i], 1.0), and then examining the du part of the
>>> >> result. You just need to drop the type annotation in your definition.
>>> >>
>>> >>   julia> x = rand(45) + Dual(0.0, 0.0)
>>> >>   julia> grad = zeros(Float64, 45)
>>> >>   julia> for i = 1:45
>>> >>              xp = copy(x)   # copy, so perturbations don't accumulate in x
>>> >>              xp[i] += Dual(0.0, 1.0)
>>> >>              r = repelling_balls(xp)
>>> >>              grad[i] = r.du
>>> >>            end
>>> >>   julia> grad
>>> >>
>>> >> The components of grad are now probably accurate to within 1 bit. This
>>> >> example works out a little funny because the box size is small compared
>>> >> to the ball size, but you can adjust the parameters to fix it. In
>>> >> principle, you could extract the Hessian by taking directional second
>>> >> derivatives with PowerSeries, but I haven't fully worked that out yet.
>>> >> That could be interesting for computing the stiffness about the
>>> >> equilibrium point once you've found it.
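>>> >>
>>> >> Roughly, the directional piece might look like this (only a sketch, not
>>> >> worked out yet; directional_d2 is a made-up name, and it assumes the
>>> >> function accepts Series arguments the same way lambert_newton did above):
>>> >>
>>> >>   # d^2/dt^2 f(x + t*v) at t = 0, i.e. v' * H(x) * v,
>>> >>   # reusing the series/polyder pattern from the Lambert W example
>>> >>   function directional_d2(f, x, v)
>>> >>       xs = [series(x[i], v[i], 0.0) for i = 1:length(x)]
>>> >>       polyder(polyder(f(xs)))
>>> >>   end
>>> >>
>>> >> Individual Hessian entries could then be recovered by polarization, e.g.
>>> >> H[i,j] = (d2(e_i + e_j) - d2(e_i) - d2(e_j))/2, where d2(v) denotes the
>>> >> directional value above.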
>>> >>
>>> >>
>>> >>> As "minimax" problem this function is smooth except for a subset of
>>> lower
>>> >>> dimension. Some optimization procedures using gradients will be able
>>> to
>>> >>> circumvent this subset and find a (local) minimum. How will you
>>> >>> differentiate that (i.e., build a gradient) if not numerically; and
>>> if yes,
>>> >>> is it worth the effort?
>>> >>>
>>> >>
>>> >> In both of these examples, you pay nothing in effort or performance
>>> >> compared to finite difference differentiation, and you avoid losing
>>> half
>>> >> your precision. I think that's very exciting!
>>> >>
>>> >>
>>> >>> Consider that an optimization or root-finding procedure is a (hopefully)
>>> >>> converging process that will not return an absolutely precise result. A
>>> >>> function itself may be defined through an optimization or root-finding
>>> >>> task, and its derivative will still have a technical meaning.
>>> >>>
>>> >>> In real-world applications, and the tasks Matlab is made for, you do not
>>> >>> care about 15 digits; gradients accurate to five, six, or seven digits
>>> >>> are more than enough to solve your problem. Normally, your functions are
>>> >>> not defined to eps() accuracy anyway.
>>> >>>
>>> >>
>>> >> I should not have downplayed the significance or utility of finite
>>> >> difference differentiation. It is obviously tremendously important,
>>> and is
>>> >> suitable for many applications. Julia should strive to implement the
>>> best
>>> >> versions possible. Currently, different packages are at different
>>> points of
>>> >> their development.
>>> >>
>>> >> That said, it seems like you're trying to have it both ways here, in
>>> the
>>> >> sense that you're upset that Calculus throws away 2-3 digits compared
>>> to a
>>> >> better finite difference method, but at the same time, you're
>>> downplaying
>>> >> the significance of getting the other 7 digits to get to full
>>> precision.
>>> >>
>>> >> In a specific application, it's often obvious how many digits are
>>> >> relevant, and that number is definitely often less than 15. But as
>>> package
>>> >> authors, we should strive to compute to full precision when that's
>>> possible
>>> >> without undue complexity or extreme performance tradeoffs. The
>>> problem is
>>> >> that if packages A, B, and C all decide that 8 digits is good enough
>>> for
>>> >> anyone and so throw away half the precision of floating point
>>> arguments,
>>> >> and I decide to take the output of package A as input to package B,
>>> and the
>>> >> output of B as input to C, suddenly I don't have 8 digits, I have 2.
>>> I'm
>>> >> sure you probably understand this point already, but it's something I
>>> >> didn't completely appreciate until recently.
>>> >>
>>> >>
>>> >>> When I came here, I thought Julia was meant for "technical
>>> computing" and
>>> >>> not so much for pure Math exercises. In scientific computing,
>>> calculating
>>> >>> numerical derivatives and gradients is essential for many
>>> applications.
>>> >>>
>>> >>>
>>> >> I hope it will be for both. There's a lot of amazing people working
>>> on it.
>>> >>
>>> >> [1] First derivative of Lambert W:
>>> >> http://www.wolframalpha.com/input/?i=LambertW'[e]
>>> >> [2] The release of PowerSeries that makes this work is this PR:
>>> >> https://github.com/JuliaLang/METADATA.jl/pull/535
>>> >> [3] Second derivative of Lambert W:
>>> >> http://www.wolframalpha.com/input/?i=LambertW''[e]
>>> >>
>>> >>
>>> >>> On Tuesday, January 21, 2014 3:28:28 AM UTC+1, Jason Merrill wrote:
>>> >>>>
>>> >>>> Would you be willing to share a problem that you're interested in
>>> that
>>> >>>> isn't amenable to symbolic differentiation, maybe on a gist or
>>> something?
>>> >>>> I've been looking for non-trivial problems to try to apply
>>> PowerSeries to.
>>> >>>>
>>> >>>> When I said numerical differentiation is "unstable", I meant
>>> something
>>> >>>> like "hard to do without losing several digits". For example, you
>>> point out
>>> >>>> that by choosing a different h, it's possible to improve over
>>> Calculus.jl's
>>> >>>> current behavior by 2-3 decimal digits, from 7e-7 to 2e-9 in
>>> absolute
>>> >>>> error. But the best precision you could hope for in a floating
>>> point answer
>>> >>>> to this problem is
>>> >>>>
>>> >>>> julia> eps(sin(1.0))
>>> >>>> 1.1102230246251565e-16
>>> >>>>
>>> >>>> so even with your improvements, you've lost a lot of precision. That
>>> >>>> might or might not be a problem depending on what you'd like to do
>>> next
>>> >>>> with the value you get out.
>>> >>>>
>>> >>>>
>>> >>>> On Monday, January 20, 2014 12:30:38 PM UTC-8, Hans W Borchers wrote:
>>> >>>>>
>>> >>>>> Numerical differentiation is not nearly as unstable as you seem to
>>> >>>>> think, and I have long experience using numerical derivatives for
>>> >>>>> optimization problems where you don't stop to look up symbolic
>>> >>>>> derivatives with a CAS.
>>> >>>>> The function obviously was only an example.
>>> >>>>> For most of the functions I have used, Julia's symbolic capabilities
>>> >>>>> will not be sufficient.
>>> >>>>>
>>> >>>>>
>>> >>>>> On Monday, January 20, 2014 9:07:02 PM UTC+1, Jason Merrill wrote:
>>> >>>>>>
>>> >>>>>> This implementation could certainly use some love, but finite
>>> >>>>>> difference differentiation is always unstable, and the situation
>>> gets worse
>>> >>>>>> as you take higher order derivatives.
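>>> >>>>>>
>>> >>>>>> For instance, a hand-rolled central second difference (just an
>>> >>>>>> illustration) loses most of its digits as h shrinks, because the
>>> >>>>>> numerator cancels catastrophically:
>>> >>>>>>
>>> >>>>>> d2_fd(f, x, h) = (f(x + h) - 2*f(x) + f(x - h)) / h^2
>>> >>>>>>
>>> >>>>>> d2_fd(sin, 1.0, 1e-3)   # roughly 7 correct digits vs. -sin(1.0)
>>> >>>>>> d2_fd(sin, 1.0, 1e-7)   # far worse: cancellation dominates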
>>> >>>>>>
>>> >>>>>> You might want to consider using the differentiate method to take
>>> your
>>> >>>>>> derivatives symbolically (if this works for the functions you're
>>> using), or
>>> >>>>>> have a look at the differentiation example in PowerSeries.jl. To
>>> do a
>>> >>>>>> symbolic second derivative, you can do, e.g.
>>> >>>>>>
>>> >>>>>> julia> using Calculus
>>> >>>>>> julia> @eval d2(x) = $(differentiate(differentiate(:(sin(x)))))
>>> >>>>>> julia> d2(1.0)
>>> >>>>>> -0.8414709848078965
>>> >>>>>>
>>> >>>>>
>>>
>>> --
>>> --
>>> Mauro A Werder
>>> Bristol Glaciology Centre
>>> School of Geographical Sciences
>>> University of Bristol, UK
>>>
>>> Telephone: +44 (0) 117 3314154
>>> E-Mail:  [email protected]
>>> http://maurow.bitbucket.org
>>>
>>>
>>
