I may have been wrong when I said 0.1 was too high. It is now the value that is testing best, and it is the highest value I am testing. It is showing a 61 Elo improvement over not using the idea at all. I have only played about 160 games, so there is still a lot of statistical noise here and anything can happen.
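
To put a rough number on that noise, here is a minimal sketch that turns an Elo difference and a game count into an approximate 95% interval. It assumes the standard logistic Elo model, independent games, no draws, and a normal approximation to the binomial; the function names are made up for illustration.

```python
import math

def elo_to_score(elo_diff):
    # Expected score under the usual logistic Elo model (assumption).
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def score_to_elo(score):
    # Inverse of elo_to_score; score must be strictly between 0 and 1.
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(elo_diff, games, z=1.96):
    # Normal-approximation interval on the win rate, mapped back to Elo.
    p = elo_to_score(elo_diff)
    se = math.sqrt(p * (1.0 - p) / games)
    return score_to_elo(p - z * se), score_to_elo(p + z * se)

low, high = elo_interval(61, 160)
print(f"approx. 95% interval: {low:+.0f} to {high:+.0f} Elo")
```

Under those assumptions the interval for +61 Elo over 160 games spans roughly +7 to +118 Elo, which is consistent with "anything can happen".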
When I have checked this out well, I'll experiment with tanh().

- Don


On Thu, 2008-10-30 at 14:59 -0200, Mark Boon wrote:
> Funny, I have been playing with something very similar. Although I
> got side-tracked to something else for the moment. Intuitively I felt
> tanh() was more appropriate than a linear function. Although you may
> want to have the inverse of that, as I was trying to calculate the
> territory certainty whereas you want the territory uncertainty.
>
> Mark
>
> On 30-okt-08, at 14:21, Don Dailey wrote:
>
> > Reference bot enhancement
> > =========================
> >
> > Here is another possible enhancement to the reference bot which I am
> > currently testing. I do not yet have anything conclusive enough to
> > report, but it looks good so far with a small number of games.
> >
> > But even if this idea doesn't pan out, it will produce a much more
> > natural playing style without weakening the bot.
> >
> > Here is how it works. We will use 1000 playouts for our example:
> >
> > 1. Modify the bot to keep a "futures" table. At the end of each
> >    playout, tally the wins for white and black for each point on the
> >    board. (I tally -1 for a white win, 1 for a black win to get a
> >    final score from -1000 to 1000 for each point.)
> >
> > 2. When the 1000 playouts are complete, compute an "uncertainty value"
> >    for each point, where 1.0 is completely uncertain, and 0.0 is
> >    completely certain. A point is completely certain if at the end of
> >    each playout it was ALWAYS owned by one player or the other. It's
> >    completely uncertain if it won 50% of the time for either side.
> >
> > 3. When determining which move to play, apply an uncertainty delta to
> >    the computed score of each move. This is simply some fraction of
> >    the "uncertainty value" and the best value I've tested so far is
> >    0.025. So you get a bonus that ranges from 0.0 to 0.025.
> >
> > 4. Choose the move with the best (sc + uncertainty_delta.)
> >
> > 5. The incentive must be small; large incentives will destroy the
> >    playing strength. For instance 0.1 is too high and weakens it.
> >    The value that is testing the best for me (of the ones I've tried
> >    so far) is 0.025.
> >
> > 6. This may test at some levels better than others. I'm testing
> >    at 2000 playouts.
> >
> > The idea is to gently encourage the bot to avoid playing to points
> > that are clearly a foregone conclusion (or conversely, encourage it to
> > play where the "action" is.)
> >
> > This should make the bot play much less artificially. Near the end of
> > the game it will prefer moves to unresolved points. Earlier in the
> > game it will avoid moving to areas that are "probably" already won or
> > lost.
> >
> > My feeling is that these "incentives" should probably be calculated in
> > a non-linear way, but what I described is a good starting point. From
> > experiments in the past it seems more important to put the focus and
> > most of the weight on avoiding play to highly certain points. So I
> > will try some non-linear formula next.
> >
> > - Don
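
Steps 1 to 4 of the quoted proposal could be sketched roughly as below. This is only an illustration, not the reference bot's code. The names `tally_playout`, `uncertainty`, and `choose_move` are hypothetical, the linear mapping `1 - |tally| / playouts` is one assumed way to get 1.0 at a 50/50 split and 0.0 at full ownership, and move scores are assumed to be win rates in the range 0 to 1.

```python
def tally_playout(futures, final_owner):
    # Step 1: +1 for each point Black owns at the end of a playout, -1 for White.
    # Assumes every point is assigned to one side when the playout is scored.
    for point, owner in final_owner.items():
        futures[point] = futures.get(point, 0) + (1 if owner == "black" else -1)

def uncertainty(tally, playouts):
    # Step 2: 1.0 if the point was split 50/50, 0.0 if one side always owned it.
    # The linear form in between is an assumption.
    return 1.0 - abs(tally) / playouts

def choose_move(scores, futures, playouts, weight=0.025):
    # Steps 3 and 4: add a small bonus for uncertain points, then pick the best.
    # scores maps each candidate point to its playout score (assumed win rate).
    def adjusted(point):
        return scores[point] + weight * uncertainty(futures.get(point, 0), playouts)
    return max(scores, key=adjusted)

# Tiny example with 2 playouts: A1 is always Black's, B1 is split.
futures = {}
tally_playout(futures, {"A1": "black", "B1": "white"})
tally_playout(futures, {"A1": "black", "B1": "black"})
print(choose_move({"A1": 0.50, "B1": 0.49}, futures, playouts=2))
```

In this example the bonus tips the choice to B1, since 0.49 + 0.025 beats 0.50 + 0.0; with `weight=0.0` the raw score wins and A1 is chosen. The bonus only decides between moves whose scores are within `weight` of each other, which is why a small value like 0.025 acts as a gentle tie-breaker.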
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
