I know I'm only wading in the kiddie pool of computer go with my 1-ply bots, but I think I may have found a useful enhancement to monte carlo.
HouseBot supports three 1-ply search modes: 1plyMC - Uniform sampling 1plyShuffle - Uniform sampling with monte carlo transposition reuse 1plyUCT - Non-uniform sampling based on the UCT algorithm (AKA UCB) Obviously, 1plyMC is far inferior to 1plyUCT as everyone probably expects. What may surprise many is that 1plyShuffle defeats 1plyUCT nearly every time. I'm basic this on self-play data from CGOS. Currently, http://cgos.boardspace.net/9x9/cross/housebot-617-UCB.html shows 10 matches between housebot-617-UCB has played housebot-618-shuff. housebot-617-UCB (1plyUCT) lost every time. While tricky, it should be possible to combine UCT and MCTR for an even stronger bot. MCTR can be thought of as a low bias alternative to the AMAF heuristic. Rather than using all moves, MCTR takes only the top N moves, where N is computed based on which moves were played in the random game. >From an open board position MCTR uses about 1/3 of the moves that AMAF would. Computation of the resulting winning percentage must also be weighted based on the probabilities of duplicating results (roughly speaking, it's 1/N). As a result of using MCTR, winning rates are no longer integers as one would expect. Here's the estimated winning rates for all three algorithms when asked for a white response to black G3: 1plyMC: 781 / 1272 1plyShuffle: 140.15 / 231.75 1plyUCT: 936 / 1515 1plyShuffle is slower because of the extra work information tracking, but the variance in estimates should be far lower than the numbers would indicate. I have yet to do the computations, but a sample size of 231.75has an estimation error of around 6000 normal MC runs for that position. That is why my implementation of MCTR is defeating my (1ply) implementation of UCT.
_______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
