In Pebbles, the point differential weight that works best is 8% at the start of a game, declining to 0% at the end of a game.
The exact fraction is determined by the position in which the trial goes into the playout. Specifically, it is not the root position, but the UCT tree leaf. The 8% would apply to a leaf that had 0 stones, and 0% would apply to a leaf that had 77 stones (81 points minus 2 eyes per player). Since these limits are not hit at leaf nodes in practice, the actual fractions fall in a narrower range.

I tested starting at 0%, 1%, 2%, 4%, 8%, and 16% and ending at 0%, 1%, 2%, 4%, 8%, and 16% in all possible combinations. My study played several hundred thousand games of self-play to select candidates, and then measured 6 candidates against GnuGo, MoGo and NeuroGo to determine a winner. All of these were blitz games that completed in just a few seconds each.

Starting point differential at 4% and declining to 0% was almost as good. Decreasing from 16% to 0% was 3rd best. In Pebbles, the models that kept point differential at a constant fraction of the evaluation (such as Fuego's 2% or Pachi's 4%) were always worse. Of course, YMMV, so test everything.

The optimal parameter combination was better than the default option (which is 0%, 0%) by about 1.2%, IIRC. I am confident that the results are better under the conditions tested. I am not confident that they carry over to longer time controls, and there is some evidence (not statistically significant) from CGOS testing that they do not.

Note that the "point differential" that Pebbles is folding into the score is actually an estimate of winning chance (i.e., a number between 0 and 1) that is based on point differential. That is, we observe that Black won a game by 5.5 points, and we decide that the winning percentage is 62% or something.
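For readers who want to experiment, here is a minimal sketch of the scheme as described in this message. It is not Pebbles' actual code. The linear interpolation over leaf stone count, the logistic mapping from margin to winning chance, its `scale` parameter, and all function names are assumptions made for illustration; the message only says that the fraction declines from the start value to the end value and that the margin is converted to a number between 0 and 1.

```python
import math

# 81 points minus 2 eyes per player, as stated in the message (9x9 board).
MAX_LEAF_STONES = 77


def differential_weight(leaf_stones, start=0.08, end=0.0):
    """Fraction of the evaluation given to the point-differential estimate.

    Assumption: the fraction is interpolated linearly in the number of
    stones on the board at the UCT leaf where the playout begins.
    """
    t = min(max(leaf_stones / MAX_LEAF_STONES, 0.0), 1.0)
    return start + (end - start) * t


def margin_to_win_estimate(margin, scale=10.0):
    """Hypothetical logistic map from a point margin to a value in (0, 1).

    Assumption: the real mapping and its scale are not given in the message.
    """
    return 1.0 / (1.0 + math.exp(-margin / scale))


def blended_result(trial_result, win_estimate, weight):
    """Mix the win/loss result of a trial with the margin-based estimate."""
    return (1.0 - weight) * trial_result + weight * win_estimate
```

With a weight of 0.06, a trial result of 1.0 and an estimate of 0.62, `blended_result` returns about 0.9772, the same arithmetic as the 94%/6% example in this message.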
Then we take the actual result of the trial (1 win to Black) and the estimate (0.62 to Black) and weight them according to the declining fraction, for example 94% * 1.0 + 6% * 0.62 = 0.9772.

From: [email protected] [mailto:[email protected]] On Behalf Of Michael Williams
Sent: Tuesday, July 05, 2011 6:24 PM
To: [email protected]
Subject: Re: [Computer-go] MCTS and Point Differential

So what percent seems to work best in Pebbles?

On Mon, Jul 4, 2011 at 11:01 PM, Brian Sheppard <[email protected]> wrote:

Related to the "perfect endgame" thread, but different...

Fuego claims that adding a few percent of point differential to the result of a trial results in a stronger player. Pachi later confirmed that result, and I have confirmed it in Pebbles as well.

The standard explanation for this is that the small bias (just 2% in Fuego, 4% in Pachi) helps to avoid losses by endgame blunders. Well, that might be, but I see something more fundamental.

When you score a game in a Win/Loss dimension, there is only one player who can make an error: the side that is winning. For the loser, all moves are losing. So a playout stumbles to the right conclusion if it contains an even number of errors, and if it contains an odd number of errors then it reaches the wrong conclusion. If you take a probability P of making an error and model the probability of making an even number of errors then you will find out that this is a daunting model. You might doubt that MCTS could ever work.

But MCTS works quite well, and I think that it is because of point differential. In a point differential model, *both* players can make errors. So the point differential takes a random walk from the leaf of the tree to the terminal position. The trial reaches the right conclusion if the random walk crosses zero an even number of times. In a random walk, the probability of crossing zero depends on how far from zero you start.
So if one tree node is better (that is, higher point differential) than another, it is more likely that a simulation trial will result in a win.

To get back to Fuego's finding: why does adding in some point differential help? Because the larger the point differential of the terminal position, the higher (on average) was the point differential of the leaf node. The random walk that takes a leaf node to a terminal node is invertible, so the same probability distribution relates the leaf and terminal positions. Accordingly, we can use the terminal point differential to compute a probability distribution of the leaf node, and that distribution implies a probability that the leaf node is winning.

So, without doubting the standard theory about how point differential could reduce yose errors, I see point differential as a factor in opening and middle play, too.

Brian
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
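The two models in the original message can be compared with a small toy experiment. This is only a sketch under strong assumptions: errors are treated as independent with a fixed probability, and the point differential is modeled as a symmetric random walk with unit steps, which real playouts are not. The function names and parameters are hypothetical.

```python
import random


def p_even_errors(p, n):
    """Win/loss model: chance of an even number of errors in n moves.

    Assumption: each move is independently an error with probability p.
    Uses the binomial parity identity (1 + (1 - 2p)^n) / 2.
    """
    return 0.5 * (1.0 + (1.0 - 2.0 * p) ** n)


def win_rate(start_margin, steps=40, trials=2000, seed=0):
    """Random-walk model: fraction of playouts that end with margin > 0.

    Assumption: the point differential moves by +1 or -1 on every move.
    Ending on the same side of zero as the start is the same as crossing
    zero an even number of times.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        margin = start_margin
        for _ in range(steps):
            margin += rng.choice((-1, 1))
        if margin > 0:
            wins += 1
    return wins / trials
```

In the win/loss model, `p_even_errors` falls toward 1/2 quickly as the number of moves grows, which is the daunting part. In the random-walk model, comparing `win_rate(1)` with `win_rate(5)` shows that a leaf starting further from zero ends up on the winning side more often, which is the effect the message argues makes MCTS work.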
