Re: [computer-go] Monte Carlo and refutation

Peter Drake Mon, 30 Oct 2006 22:25:17 -0800

On Oct 30, 2006, at 7:51 PM, Don Dailey wrote:

On Mon, 2006-10-30 at 19:34 -0800, Peter Drake wrote:

I'm running into a problem where my Monte Carlo program is very slow
to acknowledge that its favorite move has a strong counter.
Part of the problem is that the value of a move is based on how many
of the runs through that move have succeeded. If there were a lot of
them before the correct reply was discovered, the move has
considerable "inertia" and it takes time to recover.


The inertia effect is minor, once it finds the right move it tends to
quickly recover.

I'm getting a rather severe effect. Specifically, if the move inquestion has led to wins for hundreds of thousands of games, it takesa similar amount of time to get its win rate back down.


Has anyone else had experience with this?

The reason this works so well is that you are taking very largesamples,
100 samples wouldn't begin to be enough.


Just a number I pulled out of my hat.

A rationale (rationalization?), though, would be that since the mostrecent games were played with the most information, they're the mostreliable.

If you want to experiment, the best way to solve this problem in my
opinion is to emphasize the later moves more than the early movesbut ina very smooth way. The way I did this was to hit each move Iencounter
as well as it's siblings, with a "degrade factor".   I multiply the
scores and the moves by the same constant. Something very close to1.0
such as 0.99999.

You have to use floating point values to do this.  Consider how this
works - if you play 100 games and win half of them,  with my method it
matters which ones you win. With the normal method it doesn'tmatter.
With my method you will score much higher if you win more of the games
towards the end. Conversely, if this move is refuted and you losemost
of the later games, your score will drop much more quickly.

That sounds reasonable. Another thought is to keep a score for eachnode that is incremented when the player wins and decremented whenthe player loses, but within some bounds (say, +/- 100). The only wayto have a high positive score is to have won many recent games.


Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/

Has anyone tried only counting the most recent games through a move
(say, the last 100), rather than the entire history?

Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/



_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/


_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte Carlo and refutation

Reply via email to