Hi Francois, welcome.

> For reference I need about 100k playouts with
> RAVE to get 50% winrate against GnuGo 3.8 L10.

Yes, that's more or less expected. At least before the
"big" improvements (yet to come ;-)

In my case I do a lot of testing at 4x10000 because
the games are around 15 seconds long and I get fast
Elo confidence intervals. At that rate (40K playouts/move)
I get about 40-42% wins against GnuGo. This is more
or less consistent with a debugged bare-bones program
without particular smarts (with RAVE, but without
progressive widening). I guess 40% at 40,000 scales to
50% near 100K, but I have not studied the exact point
where I reach 50%, as I expect it to be much lower in
the near future. (Optimistic)
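
As a rough check on that guess, the standard logistic Elo formula
turns a win rate into an Elo difference. This is only a sketch: the
function names are mine, and it says nothing about how strength
actually scales with playouts.

```python
import math

def elo_diff(win_rate):
    """Elo difference implied by a win rate under the logistic Elo model."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

def expected_win_rate(elo):
    """Inverse of elo_diff: expected win rate for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

for wr in (0.40, 0.42, 0.50):
    print(f"{wr:.0%} wins -> {elo_diff(wr):+.0f} Elo")
```

Under this model, 40% wins is about 70 Elo below the opponent, so going
from 40K to 100K playouts would have to be worth roughly that much to
reach 50%.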

> The next step is obviously to apply these to the
> playouts. I am currently testing my program with
> the ELO features in the playouts, but unfortunately
> the preliminary results don't look good.

That's exactly my experience! Although you do get
improvement from the extend-from-atari, capture and
distance-to-previous-move heuristics.

The Mogo and CrazyStone papers report improvement
"all features included" which is true because the
other ideas produce improvement, but they don't
give results for the patterns in isolation.

I got a lot of improvement from Rémi's Bradley-Terry
ideas in move prediction (although with some
overfitting, which I didn't care much about as
predicting moves is not my interest). But neither
the naive values (times played / times seen) nor the
improved Bradley-Terry values are better in playouts
than uniform random. They are 158 Elo points worse
(CI 114..202)!
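
To make the comparison concrete, here is a minimal sketch of what I
mean by using pattern values in a playout: each legal move is drawn
with probability proportional to the gamma of its pattern, and uniform
random is the special case where all gammas are equal. The names, the
data layout and the counts are illustrative assumptions, not my actual
code.

```python
import random

def naive_gammas(times_played, times_seen):
    """Naive pattern value: times played / times seen.

    Patterns that were never seen are skipped.
    """
    return {p: times_played.get(p, 0) / seen
            for p, seen in times_seen.items() if seen > 0}

def pick_move(candidates, gammas, rng=random):
    """Sample one move with probability proportional to its pattern's gamma.

    candidates: list of (move, pattern_id) pairs for the legal moves.
    gammas:     dict mapping pattern_id to a non-negative weight.
    """
    weights = [gammas[pattern] for _, pattern in candidates]
    if sum(weights) <= 0.0:
        # Degenerate case (all weights zero): fall back to uniform random.
        return rng.choice(candidates)[0]
    return rng.choices(candidates, weights=weights, k=1)[0][0]

# Hypothetical counts, for illustration only.
gammas = naive_gammas({"p1": 30, "p2": 5}, {"p1": 100, "p2": 50})
move = pick_move([("A", "p1"), ("B", "p2")], gammas, random.Random(1))
```

Bradley-Terry values would plug into the same pick_move; only the way
the gammas are estimated changes.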

That is good and bad news. Why should uniform random
be the best? Obviously it is not. But what humans
play lacks all the information about what they don't
play because it is obvious to them, and that is not
obvious to a "silly" playout policy.

How to find good values for the patterns? (What I have
tried.)

a. Use small patterns (3x3) with all non-ill-formed
patterns in the database. (Other databases have a value
for "unknown"; this one shouldn't.)

b. Classify patterns. I have done that with 40 classes.
This way you reduce the number of degrees of freedom,
so your vector of gamma values is in R^40.

c. Then what? I really don't know. I have a "sort of
genetic algorithm". I like the idea that everything
changes at random, because the gammas are not
independent and this way the expected value of the
correlation is zero even under stochastic dependence.
Then I select the "best winners", move my center of
gravity one little step in one or two classes of patterns,
and repeat the entire process. Then I test to see if there
was improvement. It is a long process. I only gained a
little in the first iterations. After that I got fake
improvement that wasn't confirmed against uniform random.
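
One iteration of the process in (c) could be sketched roughly as
below. Everything specific here is an assumption made for the sketch:
working with log-gammas so the gammas stay positive, Gaussian
perturbations, choosing the classes to move by how far the winners'
mean sits from the center, and above all the evaluate function, which
stands in for playing a batch of games and returning a win rate. The
toy evaluate at the end only exists so the code runs.

```python
import random

def search_step(center, evaluate, rng, n_candidates=20, n_best=5,
                sigma=0.1, step=0.1, n_classes_moved=2):
    """One iteration: perturb all classes, keep the winners, nudge the center.

    center:   list of log-gammas, one per pattern class (40 in my case).
    evaluate: hypothetical scoring function, higher is better.
    """
    scored = []
    for _ in range(n_candidates):
        # Every class changes at random at the same time.
        cand = [c + rng.gauss(0.0, sigma) for c in center]
        scored.append((evaluate(cand), cand))
    scored.sort(key=lambda sc: sc[0], reverse=True)
    best = [cand for _, cand in scored[:n_best]]

    # Mean of the best candidates, class by class.
    mean_best = [sum(col) / len(best) for col in zip(*best)]

    # Move only the few classes where the winners disagree most with the center.
    order = sorted(range(len(center)),
                   key=lambda i: abs(mean_best[i] - center[i]), reverse=True)
    new_center = list(center)
    for i in order[:n_classes_moved]:
        new_center[i] += step * (mean_best[i] - center[i])
    return new_center

# Toy stand-in for real games: score is closeness to a made-up target vector.
target = [1.0] + [0.0] * 39
toy_evaluate = lambda g: -sum((a - b) ** 2 for a, b in zip(g, target))

rng = random.Random(0)
center = [0.0] * 40          # all gammas equal to 1, i.e. uniform random
for _ in range(50):
    center = search_step(center, toy_evaluate, rng)
```

With real games the evaluation is noisy, which is why any apparent
gain has to be re-tested against the uniform random baseline.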

In all, about 100 Elo points: less improvement than
the damage the "human patterns" do. I guess best playouts
are a research area where there is room for improvement.


Jacques.



_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
