> Assuming Peter is using standard terminology, automatic
> differentiation does *not* mean numerical differentiation with finite
> differences.  Automatic differentiation means calculating the exact
> derivative analytically, just using a computer program rather than
> doing it by hand: an AD tool differentiates your source code
> symbolically.
Yes, that is precisely what I do.
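
To make the distinction concrete, here is a minimal sketch (a toy example, not my actual code): a "dual number" carries a value together with its exact derivative through every arithmetic operation, so no finite-difference step h is ever involved. CppAD follows the same operator-overloading idea, with a tape and reverse sweeps on top (see below).

    // Toy illustration of AD by operator overloading: the derivative is
    // propagated exactly by the chain rule, not approximated.
    #include <cmath>
    #include <cstdio>

    struct Dual {
        double val;  // f(x)
        double dot;  // f'(x), propagated by the chain rule
    };

    Dual operator*(Dual a, Dual b) { return { a.val * b.val, a.dot * b.val + a.val * b.dot }; }
    Dual operator+(Dual a, Dual b) { return { a.val + b.val, a.dot + b.dot }; }
    Dual sin(Dual a)               { return { std::sin(a.val), std::cos(a.val) * a.dot }; }

    int main() {
        Dual x = { 2.0, 1.0 };        // seed dx/dx = 1
        Dual y = x * x + sin(x);      // y.dot is the exact derivative 2x + cos(x)
        std::printf("f(2) = %g, f'(2) = %g\n", y.val, y.dot);
    }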

> There is a theorem that you can always compute the gradient of f(x)
> (any function from R^n to R) with about as many additional operations
> as are required to compute f(x) **once**. The technique for doing so
> is called an "adjoint method", or "reverse accumulation".
I do use a library which allows for reverse accumulation (namely CppAD),
and I think it is quite efficient: it usually requires only a constant
factor of additional computation time compared to a single function
evaluation, and this factor is roughly 10 in my case. The overhead comes
from the fact that all arithmetic operations first have to be recorded to
a 'tape', Taylor coefficients are computed and propagated, and then an
additional reverse sweep is performed. By design, the function has to be
evaluated before any differentiation can be done.
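
The pattern looks roughly like this with CppAD (toy objective, not my actual function):

    // Record / zero-order forward / first-order reverse with CppAD.
    #include <cppad/cppad.hpp>
    #include <vector>
    #include <cstdio>

    int main() {
        using CppAD::AD;
        size_t n = 3;

        // 1. Record all arithmetic operations on the tape.
        std::vector< AD<double> > ax(n), ay(1);
        for (size_t i = 0; i < n; ++i) ax[i] = 1.0;
        CppAD::Independent(ax);
        ay[0] = ax[0] * ax[1] + sin(ax[2]);
        CppAD::ADFun<double> f(ax, ay);   // stops recording

        // 2. Zero-order forward sweep: evaluate f at the point of interest.
        std::vector<double> x(n), w(1, 1.0);
        x[0] = 1.0; x[1] = 2.0; x[2] = 0.5;
        std::vector<double> fx = f.Forward(0, x);

        // 3. A single reverse sweep yields the full gradient, whatever n is.
        std::vector<double> grad = f.Reverse(1, w);
        std::printf("f = %g, df/dx0 = %g\n", fx[0], grad[0]);
        return 0;
    }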

Furthermore, CppAD works by overloading functions and operators for a different data type, not by code generation. This is essential, since my functions call external templated libraries that rely on (large amounts of) conditional expressions. Thus, the function I differentiate actually varies with the parameters, and differentiating once and for all, which is usually the goal of AD, does not work here.
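
To illustrate the consequence, here is a sketch of the re-taping pattern with a hypothetical objective containing a value-dependent branch (the names and the branch are purely illustrative, not from my code):

    // Because the branch taken depends on x, a tape recorded at one point
    // is not valid at another, so the recording step is repeated for every
    // gradient evaluation.
    #include <cppad/cppad.hpp>
    #include <vector>

    std::vector<double> gradient_at(const std::vector<double>& x) {
        using CppAD::AD;
        size_t n = x.size();

        std::vector< AD<double> > ax(n), ay(1);
        for (size_t i = 0; i < n; ++i) ax[i] = x[i];
        CppAD::Independent(ax);                  // start a fresh tape at this x

        // A plain 'if': the branch chosen here is the only one the tape sees.
        if (x[0] > 0.0)
            ay[0] = ax[0] * ax[0] + ax[1];
        else
            ay[0] = -ax[0] + ax[1] * ax[1];

        CppAD::ADFun<double> f(ax, ay);          // stop recording

        std::vector<double> w(1, 1.0);
        f.Forward(0, x);
        return f.Reverse(1, w);                  // gradient valid for this tape only
    }

(CppAD's CondExp operations can put simple conditionals on the tape itself, but that is not an option when the branching happens inside an external library.)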

So, while a factor of 10 is great compared to the factor of 50 or even 100 (for 50 variables) you would get with finite differences, it still means that most of the time is spent computing the gradient.
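
For comparison, this is where the factor of about 50 or 100 comes from (illustrative helper, not code I actually use): a one-sided difference needs one extra evaluation of f per variable, i.e. n + 1 evaluations in total (about 51 for n = 50), and central differences need 2n.

    #include <functional>
    #include <vector>

    // One-sided finite-difference gradient: 1 + n evaluations of f.
    std::vector<double> fd_gradient(const std::function<double(const std::vector<double>&)>& f,
                                    std::vector<double> x, double h = 1e-7) {
        const double f0 = f(x);                  // 1 evaluation
        std::vector<double> g(x.size());
        for (size_t i = 0; i < x.size(); ++i) {  // + n evaluations
            const double xi = x[i];
            x[i] = xi + h;
            g[i] = (f(x) - f0) / h;
            x[i] = xi;
        }
        return g;
    }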

> Why are you using automatic
> differentiation instead of using a gradient-free algorithm? Are the
> gradient-free algorithms in NLopt really so poor for your
> application that automatic differentiation + a with-gradient
> algorithm is superior?
Actually, for most mathematical applications, there is a huge
performance difference between gradient-free algorithms and gradient-based
methods. If it is possible to obtain a gradient by any means (sometimes even finite differences), the gradient-based methods should easily beat the others. This is particularly true if you have more than, say, 5 variables, even with the overhead for differentiation. For 50 variables in a non-linear problem, gradient-free methods may well never converge.
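
Once the gradient is available, plugging it into a gradient-based NLopt algorithm is straightforward. A minimal sketch with a hand-written gradient and a toy quadratic objective (in my setup the gradient would of course come from CppAD, and the problem is not this simple):

    #include <nlopt.hpp>
    #include <vector>
    #include <cstdio>

    // NLopt calls this with a non-empty grad for gradient-based ("LD_*") algorithms.
    double objective(const std::vector<double>& x, std::vector<double>& grad, void*) {
        if (!grad.empty()) {
            grad[0] = 2.0 * (x[0] - 1.0);
            grad[1] = 2.0 * (x[1] + 0.5);
        }
        return (x[0] - 1.0) * (x[0] - 1.0) + (x[1] + 0.5) * (x[1] + 0.5);
    }

    int main() {
        nlopt::opt opt(nlopt::LD_LBFGS, 2);      // gradient-based quasi-Newton
        opt.set_min_objective(objective, nullptr);
        opt.set_xtol_rel(1e-8);

        std::vector<double> x = { 5.0, 5.0 };
        double minf;
        opt.optimize(x, minf);
        std::printf("minimum %g at (%g, %g)\n", minf, x[0], x[1]);
        return 0;
    }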

> In contrast, AD programs that use "forward mode" or "forward
> accumulation" differentiation are poor for differentiating
> multi-variate functions (R^n to R), since they have cost that grows
> proportional to n. (Forward accumulation is good for differentiating
> functions from R to R^m.)
Exactly ;-).
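
For completeness, the cost difference is easy to see on an existing CppAD tape f for a function from R^n to R (toy setup as in the sketches above): forward mode needs one first-order sweep per coordinate direction, reverse mode a single sweep for the whole gradient.

    #include <cppad/cppad.hpp>
    #include <vector>

    // Gradient via forward mode: n first-order sweeps, cost grows with n.
    std::vector<double> grad_forward(CppAD::ADFun<double>& f, const std::vector<double>& x) {
        size_t n = x.size();
        f.Forward(0, x);
        std::vector<double> g(n), dx(n, 0.0);
        for (size_t i = 0; i < n; ++i) {
            dx[i] = 1.0;
            g[i] = f.Forward(1, dx)[0];          // directional derivative along e_i
            dx[i] = 0.0;
        }
        return g;
    }

    // Gradient via reverse mode: one sweep, independent of n.
    std::vector<double> grad_reverse(CppAD::ADFun<double>& f, const std::vector<double>& x) {
        f.Forward(0, x);
        std::vector<double> w(1, 1.0);
        return f.Reverse(1, w);
    }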


Best regards,
Peter

_______________________________________________
NLopt-discuss mailing list
[email protected]
http://ab-initio.mit.edu/cgi-bin/mailman/listinfo/nlopt-discuss
