The point-differential weight is 8% at the start of a game and declines to 0%
at the end of a game.

The exact fraction is determined by the position at which the trial enters
the playout. Specifically, it is not the root position but the leaf of the
UCT tree.

The 8% would apply to a leaf with 0 stones on the board, and 0% would apply
to a leaf with 77 stones (81 points minus 2 eyes per player). Since leaf
nodes do not reach these limits in practice, the fractions actually used fall
within a narrower range.
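
A minimal Python sketch of such a schedule, assuming the fraction falls linearly with the number of stones at the UCT leaf. The message gives only the endpoints, so the linear shape, the clamping, and the name `point_diff_weight` are illustrative assumptions, not Pebbles code:

```python
def point_diff_weight(leaf_stones: int,
                      start: float = 0.08,
                      end: float = 0.0,
                      full_board: int = 77) -> float:
    """Fraction of a trial's value taken from the point-differential estimate.

    Assumption: the weight moves linearly from `start` (empty board) to `end`
    (77 stones on 9x9, i.e. 81 points minus 2 eyes for each player).
    """
    # Clamp so leaves outside the nominal range keep the endpoint values.
    progress = min(max(leaf_stones / full_board, 0.0), 1.0)
    return start + (end - start) * progress


# Weight at a few hypothetical leaf positions.
for stones in (0, 20, 40, 77):
    print(stones, round(point_diff_weight(stones), 4))
```

Because real leaves sit somewhere between the empty and the nearly full board, the weights actually applied fall between `start` and `end` rather than at them.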

I tested starting at 0%, 1%, 2%, 4%, 8%, and 16% and ending at 0%, 1%, 2%,
4%, 8%, and 16% in all possible combinations. My study played several
hundred thousand games of self-play to select candidates, and then measured
6 candidates against GnuGo, MoGo and NeuroGo to determine a winner. All of
these were blitz games that completed in just a few seconds each.
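
The grid above is 6 x 6 = 36 (start, end) schedules. A short Python sketch of the enumeration (the names `LEVELS`, `schedules`, and `constant` are hypothetical):

```python
from itertools import product

# Fractions tried at the start (empty board) and end (full board) of the schedule.
LEVELS = [0.00, 0.01, 0.02, 0.04, 0.08, 0.16]

# Every (start, end) pair: 6 x 6 = 36 candidate schedules.
schedules = list(product(LEVELS, LEVELS))

# start == end means a constant fraction, as with Fuego's 2% or Pachi's 4%;
# (0.0, 0.0) ignores point differential entirely.
constant = [(s, e) for s, e in schedules if s == e]

print(len(schedules), len(constant))
```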

Starting at 4% and declining to 0% was almost as good. Decreasing from 16%
to 0% was 3rd best. In Pebbles, the models that kept point differential at a
constant fraction of the evaluation (such as Fuego's 2% or Pachi's 4%) were
always worse. Of course, YMMV, so test everything.

The optimal parameter combination was better than the default option (which
is 0%, 0%) by about 1.2%, IIRC. I am confident that the results are better
under the conditions tested. I am not confident that they carry over to
longer time controls, and there is some evidence from CGOS testing (not
statistically significant) that they do not.

Note that the "point differential" that Pebbles is folding into the score is
actually an estimate of winning chance (i.e., a number between 0 and 1) that
is based on point differential. That is, we observe that Black won a game by
5.5 points, and we decide that the winning percentage is 62% or something.
Then we will take the actual result of the trial (1 win to Black) and the
estimate (.62 to Black) and weight them according to the declining
fraction, e.g. 94% * 1.0 + 6% * 0.62 = 0.9772.
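
A Python sketch of the blending step. The weighting follows the arithmetic above; the logistic margin-to-probability mapping, its `scale`, and both function names are assumptions for illustration only, since the message does not say how Pebbles turns 5.5 points into 62%:

```python
import math


def margin_to_win_prob(margin: float, scale: float = 10.0) -> float:
    """Hypothetical map from a final margin (Black minus White, komi included)
    to an estimated chance that Black was winning. The logistic shape and the
    default `scale` are illustrative assumptions, not Pebbles' formula."""
    return 1.0 / (1.0 + math.exp(-margin / scale))


def blended_result(black_won: bool, estimate: float, weight: float) -> float:
    """Mix the binary trial result with the margin-based estimate.

    `weight` is the point-differential fraction for the leaf (e.g. 0.06)."""
    actual = 1.0 if black_won else 0.0
    return (1.0 - weight) * actual + weight * estimate


# The worked example from the message: Black wins, estimate 0.62, weight 6%.
print(round(blended_result(True, 0.62, 0.06), 4))  # prints 0.9772
```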

From: [email protected]
[mailto:[email protected]] On Behalf Of Michael Williams
Sent: Tuesday, July 05, 2011 6:24 PM
To: [email protected]
Subject: Re: [Computer-go] MCTS and Point Differential

So what percent seems to work best in Pebbles?

On Mon, Jul 4, 2011 at 11:01 PM, Brian Sheppard <[email protected]> wrote:

Related to the "perfect endgame" thread, but different...

Fuego claims that adding a few percent of point differential to the result
of a trial results in a stronger player. Pachi later confirmed that result,
and I have confirmed it in Pebbles as well.

The standard explanation for this is that the small bias (just 2% in Fuego,
4% in Pachi) helps to avoid losses from endgame blunders. Well, that might be,
but I see something more fundamental.

When you score a game in a Win/Loss dimension, there is only one player who
can make an error: the side that is winning. For the loser, all moves are
losing. So a playout stumbles to the right conclusion if it contains an even
number of errors, and if it contains an odd number of errors then it reaches
the wrong conclusion. If you take a probability P of making an error and
model the probability of making an even number of errors, you will find
that the model is daunting. You might doubt that MCTS could ever work.
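
To make the "daunting" part concrete, assume every move is an independent error with the same probability p (a simplification). The chance of an even number of errors in n moves then has the closed form (1 + (1 - 2p)^n) / 2, which this Python sketch checks against the direct binomial sum (function names are hypothetical):

```python
from math import comb


def prob_even_errors(n_moves: int, p: float) -> float:
    """Chance of an even number of errors in `n_moves` independent moves,
    each an error with probability `p` (closed form)."""
    return (1.0 + (1.0 - 2.0 * p) ** n_moves) / 2.0


def prob_even_errors_direct(n_moves: int, p: float) -> float:
    """The same quantity, summing the binomial terms for k = 0, 2, 4, ..."""
    return sum(comb(n_moves, k) * p ** k * (1.0 - p) ** (n_moves - k)
               for k in range(0, n_moves + 1, 2))


for n in (5, 20, 60):
    print(n, round(prob_even_errors(n, 0.1), 4))
```

With p = 0.1 this gives about 0.66 for a 5-move playout but only about 0.506 for 20 moves, and it keeps approaching 1/2 as playouts get longer: under this model a long win/loss playout is close to a coin flip.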

But MCTS works quite well, and I think that it is because of point
differential.

In a point differential model, *both* players can make errors. So the point
differential takes a random walk from the leaf of the tree to the terminal
position. The trial reaches the right conclusion if the random walk crosses
zero an even number of times.

In a random walk, the probability of crossing zero depends on how far from
zero you start. So if one tree node is better (that is, has a higher point
differential) than another, a simulation trial from it is more likely to
result in a win.
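
A Python sketch of the random-walk argument, assuming each move changes the margin by +1 or -1 with equal probability (a deliberately crude model; real playout errors are neither unit-sized nor independent). The probability is computed exactly from binomial counts rather than by simulation, and `prob_ends_winning` is a hypothetical name:

```python
from math import comb


def prob_ends_winning(start: float, steps: int) -> float:
    """Chance that a +/-1 random walk starting at margin `start` > 0 ends above
    zero after `steps` steps, i.e. crosses zero an even number of times.

    Assumes each step is +1 or -1 with equal probability. Use a non-integer
    start (like a half-point komi) so the walk never ends exactly on zero."""
    favourable = 0
    for k in range(steps + 1):            # k = number of +1 steps
        final = start + k - (steps - k)
        if final > 0:
            favourable += comb(steps, k)
    return favourable / 2 ** steps


for start in (0.5, 2.5, 6.5):
    print(start, round(prob_ends_winning(start, 60), 3))
```

For a fixed number of steps the result grows with the starting margin, which is the property the argument relies on: a better leaf produces more winning trials.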

To get back to Fuego's finding: why does adding in some point differential
help? Because the larger the point differential of the terminal position,
the higher (on average) was the point differential of the leaf node.

The random walk that takes a leaf node to a terminal node is invertible, so
the same probability distribution relates the leaf and terminal positions.
Accordingly, we can use the terminal point differential to compute a
probability distribution of the leaf node, and that distribution implies a
probability that the leaf node is winning.
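
The inversion can be sketched under two assumptions that are not from the message: the playout is the same kind of symmetric +/-1 walk, and all leaf margins are equally likely before the playout is seen (a flat prior). Then the leaf margin is the terminal margin minus the walk's displacement, which gives a full distribution over leaf margins and, from it, the probability that the leaf was winning. Function names are hypothetical:

```python
from math import comb


def leaf_margin_distribution(terminal_margin: float, steps: int) -> dict:
    """Posterior over the leaf margin given one playout's terminal margin.

    Assumptions: the playout is a symmetric +/-1 walk of `steps` moves, and
    every leaf margin is equally likely a priori (flat prior). Then
    P(leaf = terminal - d) = P(displacement = d)."""
    dist = {}
    for k in range(steps + 1):            # k = number of +1 steps
        displacement = 2 * k - steps      # net change from leaf to terminal
        dist[terminal_margin - displacement] = comb(steps, k) / 2 ** steps
    return dist


def prob_leaf_winning(terminal_margin: float, steps: int) -> float:
    """Posterior probability that the leaf margin was above zero."""
    dist = leaf_margin_distribution(terminal_margin, steps)
    return sum(prob for leaf, prob in dist.items() if leaf > 0)


for t in (0.5, 5.5, 15.5):
    print(t, round(prob_leaf_winning(t, 60), 3))
```

Under these assumptions the posterior mean of the leaf margin equals the terminal margin, and the winning probability rises with it. This is one possible way to produce a margin-based winning estimate like the one Pebbles folds into the score, though the actual values depend entirely on the assumed walk length.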

So, without doubting the standard theory about how point differential could
reduce yose errors, I see point differential as a factor in opening and
middlegame play, too.

Brian

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go