Hi,

Now that I have some time, let me try to make my case clearly. First, I want to say that this is not some attack on the automatic differentiation package. I love automatic differentiation and symbolic packages; I personally cannot compute a derivative without a computer for the life of me. And I am also glad we are finally starting to agree on something :)
This discussion is about how an incorrect framework and storage choice affects the performance of an optimization. That is why I am so worried. So let's start with the basics. A Newton method must compute a descent step by solving a linear equation

    H*p = -g,    (1)

where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about to say also holds for non-linear least-squares methods, where the Hessian is approximated using the Jacobian as H \approx J^T*J + \lambda*I.

Now, if you are not solving the kind of simple optimization problem you keep giving examples for, the Hessian can easily be a very large matrix, like 1000x1000 or larger. You had better hope that you are storing H using your linear algebra framework, otherwise solving eq. (1) is going to take a while. This is actually a very active area of research, which is why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.

What you are proposing as good OO style is that if someone has a function they want to optimize, they need to take what is probably already a RealMatrix or double[][] and create 1000x1000 DerivativeStructure objects that, within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix. Not only have you just fragmented your heap space, but your garbage collector is going crazy and you are wasting an incredible amount of memory. Now imagine that your Jacobian is actually very simple to compute but large; then this whole conversion back and forth takes much longer than the function evaluation itself. So why exactly are you insisting on taking this performance penalty? You claim that the optimizer can work better if it gets a DerivativeStructure; why? Where does the fact that you are holding a DerivativeStructure come into effect for a Newton method? Can you provide any literature in that regard?

The classical optimization bottleneck, if it is not the actual function evaluation, is eq. (1). The optimization methodology is built on top of years of research in computational linear algebra and does not care how the matrices were actually computed. Efficient memory usage and speed are critical here, because in Newton optimizations eq. (1) is solved thousands of times. The goal of the Jacobian is not to store derivatives; it is to store a matrix of numbers that allows you to quadratically approximate your target function.

You guys are focusing on people using finite differences and trying to optimize a Newton method to use finite differences. That is not the main purpose of a Newton method. If you want a method that dynamically adjusts the finite-difference step size, you should switch to BOBYQA, which was designed for this case. Derivatives can be computed by a computer using a symbolic toolbox a priori (which is what I was referring to when I accidentally said auto-differentiation). They can also be efficiently simplified by that toolbox before being pasted back into your program. Auto-diff can also be an important tool for people whose problem is simple or who are not concerned with optimal efficiency. This can easily be handled by a wrapper and has nothing to do with Newton optimization.

If you want to talk about proper OO design and internal simplification, then you should focus on being able to pass a linear solver into your optimization method. This way you allow people to use iterative and sparse solvers when needed.
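To make this concrete, here is a rough sketch of what I mean by letting the caller hand a linear solver to the optimizer. This is not a proposal for the actual API; the NewtonStepSolver name and the dense LU choice are placeholders I made up. The point is that the optimizer only needs "solve H*p = -g" and should not care how H was produced or how the system is solved:

import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.ArrayRealVector;
import org.apache.commons.math3.linear.LUDecomposition;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

public class NewtonStepExample {

    /** Placeholder strategy interface: solving eq. (1) is all the optimizer needs. */
    public interface NewtonStepSolver {
        RealVector solve(RealMatrix hessian, RealVector gradient);
    }

    /** Dense direct solver; a sparse or iterative implementation could be swapped in. */
    public static final NewtonStepSolver DENSE_LU = new NewtonStepSolver() {
        public RealVector solve(RealMatrix hessian, RealVector gradient) {
            // p solves H*p = -g, via an LU factorization rather than an explicit inverse.
            return new LUDecomposition(hessian).getSolver().solve(gradient.mapMultiply(-1.0));
        }
    };

    public static void main(String[] args) {
        // Tiny 2x2 example; in practice H may be 1000x1000 or larger.
        RealMatrix h = new Array2DRowRealMatrix(new double[][] { { 4.0, 1.0 }, { 1.0, 3.0 } });
        RealVector g = new ArrayRealVector(new double[] { 1.0, 2.0 });
        RealVector p = DENSE_LU.solve(h, g); // descent step p
        System.out.println(p);
    }
}

With something along these lines, nothing about the optimizer changes when someone plugs in a sparse or iterative solver instead of the dense LU.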
If you are concerned about people getting derivatives wrong, you can do what MATLAB does, which is to add an option that checks the user-supplied derivatives against finite differences when debugging.

This brings me to my second issue. There are important problems where computing the derivatives together with the actual function value is *significantly* faster than computing each alone. I am not clear why I am hitting such resistance with this. For example, you may be running some sort of simulation that can be trivially adjusted at the end to give either a derivative or a function value. A very real example of this is a Fast Multipole Method, which takes time to compute a series expansion of the function, but the resulting series expansion can be analytically differentiated. The function interfaces I suggested were just an initial quick suggestion that I would be happy for anyone to change in a way that everyone feels positive about. I just don't want my second issue to be dismissed as a non-issue. (I have put a small sketch of what I mean at the very end of this message, below the quoted thread.)

Thanks for your time,

Konstantin

On Nov 30, 2012, at 5:15 PM, Gilles Sadowski <gil...@harfang.homelinux.org> wrote:

>>> I don't know if people are confused about auto-differentiation; I
>>> think most people working in numerical analysis are very well aware
>>> of what it does. The issue here is that it is a completely separate
>>> subject from optimization. In a proper OO design you would not mix
>>> the two together. Computation of the derivatives is a separate
>>> component from optimization. Optimization only assumes that you can
>>> compute one; it shouldn't care or force you to compute one in a
>>> specific way. That's bad design. Every component should only have
>>> necessary and sufficient requirements for use. Automatic
>>> differentiation is not necessary for optimization, so it should not
>>> exist in the interface.
>>
>> You are right.
>>
>>> I would like to add that, if my function is simple enough, I am very
>>> capable of running an autodiff package a priori myself. The real need
>>> for finite differences is when you cannot analytically compute the
>>> derivative. Though I think having a wrapper function around an autodiff
>>> package is a nice tool to help developers quickly write code, if they
>>> prefer quicker development time but slightly slower code.
>>>
>>> If someone can explain to me why the auto-diff package should be
>>> directly forced into, or even provided as an alternative within, the
>>> optimization package when a simple wrapper function can do this task
>>> while maintaining a much cleaner OO design, please tell me.
>>
>> You just found the reason and explained it by yourself: it was bad
>> design and I am the one to blame for this. I was (and still am) stupid.
>
> I'm not convinced that it was bad design. Maybe I'm completely off now:
> I thought that the purpose was internal simplification. I don't think
> that it can be inherently wrong to store any sort of derivatives (be it
> gradient or Jacobian) into a structure aimed at storing... partial
> derivatives (i.e. "DerivativeStructure").
>
> Please let me know whether the problem is more in-depth than what I
> described previously (allowing the current way to provide the gradient
> and Jacobian).
>
> I would nevertheless have suggested to delete the method
>     MultivariateFunction partialDerivative(int k);
> from "DifferentiableMultivariateFunction".
>
>> So I suggest we disconnect differentiation from optimization, but in a
>> way that would let users decide how they provide the differentials. This
>> means I would not like to reintroduce the former interfaces.
>>
>> What about having the optimize() methods take two function arguments:
>> a MultivariateFunction for the value and a MultivariateVectorFunction
>> for the gradient? It would be up to the user to select how they compute
>> each function. They could use direct coding, they could use a finite
>> differences wrapper around the first function to implement the second,
>> they could use our differentiation framework and put one wrapper to
>> extract the value and another one to extract the gradient.
>
> That would also be a nice example of usage of the differentiation
> framework.
>
>> What do you think?
>
> Yes. Now I got it (I think). IIUC, we could even introduce the new
> interface before 3.1 and deprecate the older ones (as you wanted to do
> anyway, IIRC).
>
> I'll give it a try in the next days. OK?
>
>
> Gilles
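P.S. To make my second issue concrete, here is a rough sketch of the kind of wrapper I have in mind on top of the (value function, gradient function) pair proposed above. The names (CombinedFunction, ValueAndGradient, CachingAdapter) are placeholders I made up, not a proposal for the actual API; the only point is that the expensive shared work is done once per point even though the optimizer asks for the value and the gradient through separate interfaces:

import java.util.Arrays;

import org.apache.commons.math3.analysis.MultivariateFunction;
import org.apache.commons.math3.analysis.MultivariateVectorFunction;

public class CombinedEvaluationExample {

    /** Result of a single evaluation: value and gradient together. */
    public static class ValueAndGradient {
        public final double value;
        public final double[] gradient;
        public ValueAndGradient(double value, double[] gradient) {
            this.value = value;
            this.gradient = gradient;
        }
    }

    /** The user implements this once; e.g. an FMM series expansion is built once here. */
    public interface CombinedFunction {
        ValueAndGradient evaluate(double[] point);
    }

    /**
     * Adapter that caches the last evaluation, so the proposed
     * (MultivariateFunction, MultivariateVectorFunction) pair can be handed to
     * optimize() without redoing the expensive work when the optimizer asks for
     * the value and the gradient at the same point.
     */
    public static class CachingAdapter {
        private final CombinedFunction f;
        private double[] lastPoint;
        private ValueAndGradient lastResult;

        public CachingAdapter(CombinedFunction f) {
            this.f = f;
        }

        private synchronized ValueAndGradient eval(double[] point) {
            if (lastPoint == null || !Arrays.equals(lastPoint, point)) {
                lastPoint = point.clone();
                lastResult = f.evaluate(point);
            }
            return lastResult;
        }

        public MultivariateFunction valueFunction() {
            return new MultivariateFunction() {
                public double value(double[] point) {
                    return eval(point).value;
                }
            };
        }

        public MultivariateVectorFunction gradientFunction() {
            return new MultivariateVectorFunction() {
                public double[] value(double[] point) {
                    return eval(point).gradient.clone();
                }
            };
        }
    }
}

This is only one possible shape; the same idea could just as well live inside the finite-differences wrapper you mention, since a forward-difference gradient also reuses the value at the current point.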