I've been playing with an alpha beta searcher to see if it's feasible
using play-outs as an evaluation function in a program that does an
alpha beta global search.

I'm using the heavy play-out version of Lazarus and doing tests of up
to 3 ply searches on 9x9.  The old 1500 ELO rated AnchorMan is a nice
control point for this study.

The various versions I'm testing are selective: I use a technique
similar to that used in modern chess programs of aborting the search
at a given node after only a few child nodes have been tried when the
expectation remains below alpha.  I have not yet run enough tests to
know whether this is reasonable in Go, but since the technique has
proved to scale well, I am hopeful.  I'm currently running another
test without this enhancement to see whether it is actually an
improvement, but I do not have enough data to draw any reasonable
conclusions yet.
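As a rough sketch of that selective idea (my own toy version, not
Lazarus's actual code -- the abort threshold and the margin are
made-up knobs):

```python
# Hypothetical sketch of "abort a node after a few children when the
# expectation remains below alpha."  ABORT_AFTER and MARGIN are my own
# illustrative knobs, not values from Lazarus.

ABORT_AFTER = 3   # children to try before considering an abort
MARGIN = 0.10     # how far below alpha the best score must remain

def selective_search(node, depth, alpha, beta, evaluate, children):
    """Negamax alpha-beta that gives up on a node early when, after a
    few children, its best score still sits well below alpha."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    best = float("-inf")
    for i, child in enumerate(kids):
        score = -selective_search(child, depth - 1, -beta, -alpha,
                                  evaluate, children)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # ordinary beta cutoff
        # Selective abort: after ABORT_AFTER children the expectation
        # is still clearly below alpha, so stop searching this node.
        if i + 1 >= ABORT_AFTER and best < alpha - MARGIN:
            break
    return best
```

With ABORT_AFTER large enough this degenerates to plain alpha beta, so
the knobs directly trade search effort against the risk of cutting off
a node that would have recovered.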

There are many potential tricks that can speed up the search greatly
and I've barely scratched the surface, but I have done a couple of
simple basic things that appear to be working very well.

Since the evaluation function is a set of play-outs, the interesting
question is how many play-outs should I use to get a reasonable
evaluation?  The more play-outs you use, the stronger and more stable
your evaluation function is.  However, you also decrease the depth of
the search tree.  There must be a well-chosen balance, because as you
increase the number of play-outs you get diminishing returns in
quality.
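The evaluation function itself can be sketched in a few lines (a toy
version, not Lazarus: `playout` stands for any function that plays the
position to the end and returns 1.0 for a win, 0.0 for a loss):

```python
import random

# Toy sketch of play-outs as an evaluation function: the "score" of a
# position is simply its win rate over n_playouts random play-outs.
# The default of 128 is one of the values tested below.

def playout_eval(position, playout, n_playouts=128, rng=random):
    """Win rate over n_playouts play-outs from `position`."""
    wins = sum(playout(position, rng) for _ in range(n_playouts))
    return wins / n_playouts
```

Doubling n_playouts only shrinks the noise in this estimate by a
factor of sqrt(2) (standard error goes as 1/sqrt(n)), which is one way
to see where the diminishing returns come from.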

The first thing I discovered is that heavy play-outs are a substantial
advantage when the number of play-outs is small.  This is no surprise.
It's not clear to me whether heavy play-outs are still better if you
are doing a huge number of them; that is something that should be
tested.  But what is certainly clear is that if you are only doing a
few play-outs, heavy play-outs are far superior.

In order to find the "optimal" number of play-outs, I have done a
study of various versions of this alpha beta searcher doing 1, 2, and
3 ply searches.  There are 3 versions and 3 depths, giving 9
combinations.  One version does 64 play-outs, one does 128 play-outs
and one does 256 play-outs.

What I am looking for is which version plays the strongest in the
least amount of time.  You cannot compare them directly by ELO
because each has a different run time.  For instance, is the program
that plays at 1700 ELO better than the program that plays at 1600 ELO
if it is a factor of 3 slower?

Questions like this get to the heart of the issue.  This is why I've
always argued that playing strength as measured by ELO numbers is
meaningless without taking time (or CPU power) into consideration.

A very simple way to get the answer is to plot the results as a graph.
So I plotted the ELO improvement for each version as it went from 1 to
2 to 3 ply searches.  In order to get a meaningful comparison I did
the following:

    1.  The X axis is log(t) where t is the average time per game.  

    2.  The Y axis is the ELO ratings of the programs.

When you plot this way, you tend to get very straight lines since ELO
improvement is nearly linear with the logarithm of time.
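The fit behind those straight lines is an ordinary least-squares fit
of ELO against log(t).  A minimal sketch (the function name is mine,
and the numbers in the test are placeholder illustrations, not the
measured results):

```python
import math

# Fit elo ~ a + b*log(t) by least squares.  On data where ELO really
# is linear in log(time), the points fall on a straight line and the
# slope b is the ELO gained per e-fold increase in thinking time.

def fit_elo_vs_logtime(times, elos):
    """Return (a, b) for the least-squares line elo = a + b*log(t)."""
    xs = [math.log(t) for t in times]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(elos) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, elos))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b
```

Plotting each version's three depth settings this way, and comparing
the fitted lines, is exactly the "top line wins" comparison described
below.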

What I ended up with is 3 almost parallel lines.  The "top" line is a
great indicator of which version has the best time/performance
trade-off.

In this case, it turned out to be the version that does 128 play-outs.
Remember I tested 64, 128 and 256.  Of course the version that does
256 play-outs plays stronger at a given depth, but not enough to
justify the extra computing time taken.  The graph makes this clear
and obvious.

Unfortunately, I have other "knobs to turn" which could affect the
proper value.  I have a gamma value which controls how liberal I
should be about aborting the play-outs when it looks like there is no
chance of falling inside the alpha/beta window, and this has a strong
effect on both the speed of the program and the quality of the
evaluation.
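Here is one guess at how such a gamma knob might look (my own
interpretation of the mechanism, not the actual implementation):
play-outs run in small batches, and the evaluation aborts once the
running win-rate estimate strays more than gamma outside the window.

```python
import random

# Speculative sketch: batch the play-outs and give up early when the
# running estimate looks hopelessly outside [alpha, beta].  The batch
# size and gamma default are hypothetical.

def windowed_playout_eval(position, playout, alpha, beta,
                          n_playouts=128, batch=16, gamma=0.1,
                          rng=random):
    """Return (win_rate, playouts_used), aborting early once the
    estimate sits more than `gamma` outside the [alpha, beta] window."""
    wins = 0.0
    done = 0
    while done < n_playouts:
        for _ in range(min(batch, n_playouts - done)):
            wins += playout(position, rng)
            done += 1
        rate = wins / done
        # A smaller gamma aborts more liberally: we give up as soon as
        # the estimate strays just slightly outside the window.
        if rate < alpha - gamma or rate > beta + gamma:
            break
    return wins / done, done
```

The trade-off is the one described above: aborting sooner makes each
evaluation cheaper but noisier, which interacts with the best choice
of play-out count.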

It's interesting to compare this "new" program to AnchorMan.
AnchorMan is only one data point (at its default level of 5000
play-outs), but by putting it on the graph it's easy to see that it's
intrinsically "stronger" in the sense that the new alpha beta searcher
cannot beat it using the same amount of time.  However, AnchorMan does
not scale, so in most practical time controls the alpha beta searcher
will prevail.

This new alpha beta search is playing on CGOS under the name "FooBar"
and appears to be around the 1800 level doing fixed depth 3 ply
searches.  It is a straightforward alpha beta searcher with only a few
enhancements.  It does not yet even use hash tables.

It spends on average about 2/3 of its time allotment on CGOS.

So I think this approach has potential: I have barely begun
experimenting with it, and many more potential enhancements remain to
be tested.

- Don

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
