On Oct 30, 2006, at 7:51 PM, Don Dailey wrote:

On Mon, 2006-10-30 at 19:34 -0800, Peter Drake wrote:
I'm running into a problem where my Monte Carlo program is very slow
to acknowledge that its favorite move has a strong counter.
Part of the problem is that the value of a move is based on how many
of the runs through that move have succeeded. If there were a lot of
them before the correct reply was discovered, the move has
considerable "inertia" and it takes time to recover.

The inertia effect is minor, once it finds the right move it tends to
quickly recover.

I'm getting a rather severe effect. Specifically, if the move in question has led to wins for hundreds of thousands of games, it takes a similar amount of time to get its win rate back down.

Has anyone else had experience with this?

The reason this works so well is that you are taking very large samples,
100 samples wouldn't begin to be enough.

Just a number I pulled out of my hat.

A rationale (rationalization?), though, would be that since the most recent games were played with the most information, they're the most reliable.

If you want to experiment, the best way to solve this problem in my
opinion is to emphasize the later moves more than the early moves but in a very smooth way. The way I did this was to hit each move I encounter
as well as it's siblings, with a "degrade factor".   I multiply the
scores and the moves by the same constant. Something very close to 1.0
such as 0.99999.

You have to use floating point values to do this.  Consider how this
works - if you play 100 games and win half of them,  with my method it
matters which ones you win. With the normal method it doesn't matter.
With my method you will score much higher if you win more of the games
towards the end. Conversely, if this move is refuted and you lose most
of the later games, your score will drop much more quickly.

That sounds reasonable. Another thought is to keep a score for each node that is incremented when the player wins and decremented when the player loses, but within some bounds (say, +/- 100). The only way to have a high positive score is to have won many recent games.

Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/






Has anyone tried only counting the most recent games through a move
(say, the last 100), rather than the entire history?

Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/



_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to