James, thank you, I will certainly give it a go in the next couple of
days, and will let you know how it works out. Most of the minimization
problems in my field (theoretical nuclear physics) are very sensitive to
some parameters and very insensitive to others, and that can "switch
around" at some points in parameter space: parameters that "used to be
important" become next to irrelevant in that region, while others pick
up in strength. Thank you very much for the link, and, as mentioned, I
will update you on the results!
James Bergstra wrote:
Two things:
1.

On Tue, Nov 29, 2005 at 10:47:17PM +0100, Max Belushkin wrote:
> Hmm, from what I read on the conjugate gradient algorithms, it's
> supposed to [basically] be a non-orthogonal version of the steepest
> descent. But that picks the direction just based on the current
> gradient, which is usually bad. Conjugate gradient picks the direction
> based on the current gradient AND the previous gradient, which means it
> "knows" about the first step direction at any step, which, I guess, is
> why it's good to restart it from time to time.

CG algorithms work by repeated line minimization in R^n. The minimizer picks a
point, your function comes back with the gradient at that point, and then the
minimizer finds a step size that minimizes the value of your function along that
line. If your function is more nonlinear in some directions than in others,
then this algorithm might not work very well.
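To see what I mean, here is a small self-contained toy (a sketch I have not
run, so treat it accordingly) that points GSL's Fletcher-Reeves CG at a
deliberately badly scaled quadratic, f(x) = x0^2 + 1000*x1^2. The curvature
along x1 is 1000 times that along x0, and in my experience that is the kind
of function where the line minimizations struggle:

  #include <stdio.h>
  #include <gsl/gsl_multimin.h>

  /* f(x) = x0^2 + 1000 x1^2: curvature differs hugely between directions. */
  static double my_f (const gsl_vector *x, void *params)
  {
    double x0 = gsl_vector_get (x, 0), x1 = gsl_vector_get (x, 1);
    return x0 * x0 + 1000.0 * x1 * x1;
  }

  static void my_df (const gsl_vector *x, void *params, gsl_vector *g)
  {
    gsl_vector_set (g, 0, 2.0 * gsl_vector_get (x, 0));
    gsl_vector_set (g, 1, 2000.0 * gsl_vector_get (x, 1));
  }

  static void my_fdf (const gsl_vector *x, void *params,
                      double *f, gsl_vector *g)
  {
    *f = my_f (x, params);
    my_df (x, params, g);
  }

  int main (void)
  {
    gsl_multimin_function_fdf func = { &my_f, &my_df, &my_fdf, 2, NULL };
    gsl_multimin_fdfminimizer *s =
      gsl_multimin_fdfminimizer_alloc (gsl_multimin_fdfminimizer_conjugate_fr, 2);
    gsl_vector *x = gsl_vector_alloc (2);
    int status, iter = 0;

    gsl_vector_set (x, 0, 5.0);
    gsl_vector_set (x, 1, 5.0);
    gsl_multimin_fdfminimizer_set (s, &func, x, 0.01, 1e-4);

    do {
      status = gsl_multimin_fdfminimizer_iterate (s);
      if (status) break;   /* e.g. GSL_ENOPROG: the line search stalled */
      printf ("%3d  f = %g\n", ++iter, s->f);
      status = gsl_multimin_test_gradient (s->gradient, 1e-6);
    } while (status == GSL_CONTINUE && iter < 200);

    gsl_vector_free (x);
    gsl_multimin_fdfminimizer_free (s);
    return 0;
  }

Printing f per iteration on something like this is a cheap way to see how a
minimizer behaves when the scales of the dimensions differ.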
2.
> However, I never heard it only varies one parameter at a time - if
> that is the case, and is the reason for the plateaus, I really need to
> go searching for a better minimization routine, as this would not be
> suitable with so many parameters, which, yes, have very different
> influence on the overall chi squared.
If this is the case with your function (or even if it is not) I'd appreciate it
if you tried out the minimization routine called
msl_multimin_fdfminimizer_deltabardelta
that is implemented in my (recently-cleaned) libmsl
http://savannah.nongnu.org/projects/libmsl
I've only ever used it on a single type of minimization (neural network gradient
descent) and I'd appreciate it if you let me know if it works well on your
function too. It is designed to simultaneously optimize over many dimensions
whose degrees of nonlinearity are very different. It does this by continually
estimating the diagonal of the Hessian, to make more efficient use of the
gradient.
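I won't paste the library code here, but the underlying rule is the
"delta-bar-delta" update (Jacobs 1988): each parameter gets its own step
size, which grows while the sign of its gradient component agrees with a
running average of past gradients, and shrinks when the sign flips. Very
roughly, something like the sketch below; the names and constants are mine
and illustrative only, not the actual libmsl interface:

  /* One delta-bar-delta step for n parameters (illustrative sketch).
     g[i]    : current gradient component
     dbar[i] : exponential moving average of past gradients
     eta[i]  : per-parameter step size
     kappa, phi, theta : tuning constants, e.g. 0.05, 0.7, 0.7 */
  static void delta_bar_delta_step (double *w, const double *g,
                                    double *dbar, double *eta, int n,
                                    double kappa, double phi, double theta)
  {
    int i;
    for (i = 0; i < n; i++) {
      if (g[i] * dbar[i] > 0.0)
        eta[i] += kappa;       /* consistent sign: grow the step additively */
      else if (g[i] * dbar[i] < 0.0)
        eta[i] *= phi;         /* sign flip: we overshot, shrink the step */
      w[i] -= eta[i] * g[i];   /* per-dimension gradient step */
      dbar[i] = (1.0 - theta) * g[i] + theta * dbar[i];
    }
  }

The per-parameter step sizes end up adapting to the curvature in each
dimension, which is the sense in which it estimates the diagonal of the
Hessian.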
Søren Christensen wrote:
Hello Max,
Regarding part two of your question:
I am no mathematician either, but could it be that when you see the
mentioned plateau it is not actually a plateau, but one (or more) of
your parameters has a very small influence on the RSQ? E.g. several
orders of magnitude smaller than the others, so that

19.9809 -> ... <keep basically idle for some 100s of iterations> -> 19.97 -> 19.5 -> ...

is really

19.9809 -> ... -> 19.9809000040 -> 19.9809000036 -> 19.9809000003 -> ... -> 19.97 -> 19.5 -> ...
I am not too much into the details of the minimizer, but I seem to
recall it minimizes over one parameter at a time.
Is that a possibility?
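If you want to check, one possibility (untested, just sketched into this
mail) is to print the gradient components every iteration and see whether
one of them is orders of magnitude smaller than the rest. Something along
these lines inside the iteration loop, with s being the
gsl_multimin_fdfminimizer from your code:

  /* Print each gradient component to see whether some parameters
     have orders-of-magnitude less influence than others. */
  const gsl_vector *g = gsl_multimin_fdfminimizer_gradient (s);
  size_t i;
  printf ("iter %d  f = %.10g ", iter, s->f);
  for (i = 0; i < g->size; i++)
    printf (" g[%zu] = %.3e", i, gsl_vector_get (g, i));
  printf ("\n");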
Cheers
Soren
From: Max Belushkin
Sent: Wed 30-11-2005 06:50
To: [email protected]
Subject: [Help-gsl] On conjugate gradient algorithms in multidimensional
minimisation problems.
Hello list,
I apologize in advance for this rather long email; it is a collected
description of a common problem that I have failed to gain an
understanding of, having tried for many months now.
Over the last couple of years, I have made some observations on
conjugate gradient algorithms that I would like to seek some advice
about. This experience relates to gsl-1.6 and, as of late, gsl-1.7.
-- Part 1 --
The typical problem at hand is a parameter space of 20 dimensions or
more. That is, a chi-squared function is built that depends on 20 or
more independent parameters passed to the minimization routine, and
this function is to be minimized.
The most common problem is the minimizer eventually bailing out with
an error of "iterations not progressing toward a solution" (the exact
message may be a bit different, I do not remember it off-hand). Thus, I
took to restarting the minimizer with
gsl_multimin_fdfminimizer_restart. However, it still does not help. Here
is what happens. If we take a toy sample from the examples and modify
it a little, we get:
do {
tryagain:
  iter++;
  status = gsl_multimin_fdfminimizer_iterate (s);
  if (status) {
    printf ("Restarting minimizer point 1\n");
    gsl_multimin_fdfminimizer_restart (s);
    goto tryagain;
  }
  status = gsl_multimin_test_gradient (s->gradient, 1e-3);
  if (status == GSL_SUCCESS) {
    printf ("converged to minimum\n");
    break;
  }
  if (status != GSL_CONTINUE) {
    gsl_multimin_fdfminimizer_restart (s);
    printf ("Restarted minimizer point 2\n");
    status = GSL_CONTINUE;
  }
} while (status == GSL_CONTINUE && iter < 10000);
What happens is that we will always sit at "Restarting minimizer point
1", and status will always be "iterations not progressing toward a
solution". Removing the goto has the same effect, only I believe
[without any proof, just a feeling] that it should also interfere with
the calls to test_gradient in some way.
Now, if we remove the restart at point 1 and break there, then take
the parameter vector we ended up with, plug it in, and start the program
all over again, it will run for another 5000 iterations or so, then bail
out with the same error at the same point. Repeat. And repeat...
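(For what it's worth, I believe what "start the program all over again from
that point" effectively does, inside one process, is re-initialize the
minimizer at the current point, i.e. something like the sketch below in the
loop; the 0.01 and 1e-4 stand in for whatever step size and tolerance were
passed to gsl_multimin_fdfminimizer_set originally. Unlike
gsl_multimin_fdfminimizer_restart, this also resets the line-search state
and the initial step size.

  /* Mimic "restart the program from the current point" without
     exiting: copy the current best position and re-initialize the
     minimizer there. func is the original gsl_multimin_function_fdf. */
  gsl_vector *xcur = gsl_vector_alloc (s->x->size);
  gsl_vector_memcpy (xcur, gsl_multimin_fdfminimizer_x (s));
  gsl_multimin_fdfminimizer_set (s, &func, xcur, 0.01, 1e-4);
  gsl_vector_free (xcur);

I have not verified that this is exactly equivalent, though.)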
At this point, I was inclined to think it might be an issue with my
parameter space. However, recently, my colleague, who has some code in
Fortran, asked me to come up with a C library employing GSL routines
that would minimize his problem, which is completely different from
mine, but the number of parameters is also large, around 15. So I came
up with a module that could be linked with a Fortran program, and the
exact same phenomenon appears there.
I must admit I am no great expert in the mathematics underlying the
methods (Fletcher-Reeves, Polak-Ribiere and
Broyden-Fletcher-Goldfarb-Shanno give the exact same results, up to
roughly a constant factor in the number of iterations), but I do know
they have to be restarted from time to time. I did things like "if (iter
% 50 == 0) gsl_multimin_fdfminimizer_restart(s);". No effect. So why does
restarting the iteration all over (restarting the program) from that
point work, and sometimes reduce the chi squared quite significantly
over several such runs? And is there any way to force the minimizer to
keep going, or to have more control over the convergence process?
-- Part 2 --
This is somewhat related to Part 1. If I look at the convergence by
the value chi squared takes per iteration, a typical [example] picture
looks like this:
50 -> 30 -> 20.1 -> 20.0 -> 19.99 -> 19.981 -> 19.9809 -> ... <keep basically
idle for some 100s of iterations> -> 19.97 -> 19.5 -> 19.0 -> 18.3 ->
17.8 -> 17.79 -> ... <hang here for another century> -> ...
The iteration process shows huge plateaus. What can this be an
indication of? This is also very common and is seen on a wide range of
problems, so I assume it is a well-known phenomenon. Perhaps someone
has a way to speed up the iteration and avoid sitting on those
plateaus where not much seems to be happening?
I apologize once again for the long email, and would be most grateful
for any help on this subject.
Thanks!
_______________________________________________
Help-gsl mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-gsl