Sure. I'm not sure how best to answer your question, so I'll respond with mostly pseudo code. I can answer questions on theory, but I expect that to go back and forth a bit since it's typically tough to explain.
Where there's grey areas in implementation choice, I have text describing it (including basic theory). For the real source code, the logic is in search.d of HouseBot's (open) source. HouseBot's home page is http://housebot.sourceforge.net Slowdown over basic MC occurs in two ways: 1. Maintaining a list of moves whose order can be shuffled around. 2. Applying the results to all affected leaves in the search tree (3. In my current implementation, I may save too much info and memory management may be another problem) Here's some pseudo code Main thinking loop: while ( can think ){ reset board reset shuffle list shuffleMode = true foreach(legal move){ play(move) if ( shuffleMode ){ shuffleMode = transposition still true //see below if (shuffleMode || first move) insert move into shuffle list } apply result //see below } "Transposition still true" By transposition or shuffle, I mean that I can reorder the moves by each color and arrive at the same board position without violating any rules along the way... Still representing a mostly random game. Obviously, any move that's illegal in the starting board position should will violate this and breaks the move shuffling property. Illegal would be stuff like suicide plays, taking a ko, or playing on top of another stone. If you assume all captures break the move shuffling property, life is simple and strange suicide plays that occur as combinations of stones won't occur. If you're smart enough to find a way to do that efficiently (or ignore it like I did), then you must check to ensure that stones captured are not in the shuffle list. If they are, order matters and the latest move breaks the move shuffling property. My code is far different, but when I think about it, the simplest form would be : "return not (capture on current board)". I should probably try that one out... Allowing captures does have the benefit that the shuffle list gets longer, but the control logic gets more complex (simple ko's, suicide in original position or a complex reordered combination of moves that create suicides, etc...). How to apply the result: Essentially, everything in the shuffle list can consider this as a game that began with that move first. At first, I had fractional game = 1 in a nieve attempt to test out my theory and it failed horribly. The problem is that a positions with long shuffle lists can be reached in many ways by many starting moves. Because of this, sequences with long shuffle lists got counted way too much. I had to make it so that regardless of the shuffle list length, a position would not be over-counted in the final average. To do this, you should compute the probability of the position being arrived at by all candidate methods and then do a weighted sum. prob(this method) / [ prob (this method) + prob (alternate method 1) + prob( alternate method 2) + ...]. I made the naive assumption that prob(method a) == prob(method b). I think that's reasonable for uniform sampling. That value is used in the pseudo code below foreach(random game) float fractional game = 1 / number of moves in shuffle list with the same color to move foreach(move in shuffle list with same color to move){ num sims (move of interest) += fractional game if (random game is is win) num wins (move of interest) += fractional game // Note that variance of the estimate increases by (0.5 * fractional game)^2 } } On 9/14/07, Brian Slesinsky <[EMAIL PROTECTED]> wrote: > > Can you explain a bit more about how 1plyShuffle works? > > On 9/14/07, Jason House <[EMAIL PROTECTED]> wrote: > > Time management is identical. Based on quick profiling, the MCTR > version > > does about 1/6 of the simulations. Actually, since the MCTR does extra > > tracking of info, it can reuse some old simulations, so it may be more > like > > 1/4 or 1/5 of the simulations. It's just using the results more > > efficiently. > > > > > > On 9/14/07, Chris Fant <[EMAIL PROTECTED]> wrote: > > > Does it defeat it based on number of samples taken or time allotted > per > > turn? > > > > > > On 9/14/07, Jason House <[EMAIL PROTECTED]> wrote: > > > > I know I'm only wading in the kiddie pool of computer go with my > 1-ply > > bots, > > > > but I think I may have found a useful enhancement to monte carlo. > > > > > > > > HouseBot supports three 1-ply search modes: > > > > 1plyMC - Uniform sampling > > > > 1plyShuffle - Uniform sampling with monte carlo transposition > reuse > > > > 1plyUCT - Non-uniform sampling based on the UCT algorithm (AKA > UCB) > > > > > > > > Obviously, 1plyMC is far inferior to 1plyUCT as everyone probably > > expects. > > > > What may surprise many is that 1plyShuffle defeats 1plyUCT nearly > every > > > > time. I'm basic this on self-play data from CGOS. Currently, > > > > > > http://cgos.boardspace.net/9x9/cross/housebot-617-UCB.html > > > > shows 10 matches between housebot-617-UCB has played > housebot-618-shuff. > > > > housebot-617-UCB (1plyUCT) lost every time. > > > > > > > > While tricky, it should be possible to combine UCT and MCTR for an > even > > > > stronger bot. MCTR can be thought of as a low bias alternative to > the > > AMAF > > > > heuristic. Rather than using all moves, MCTR takes only the top N > > moves, > > > > where N is computed based on which moves were played in the random > game. > > > > From an open board position MCTR uses about 1/3 of the moves that > AMAF > > > > would. Computation of the resulting winning percentage must also be > > > > weighted based on the probabilities of duplicating results (roughly > > > > speaking, it's 1/N). > > > > > > > > As a result of using MCTR, winning rates are no longer integers as > one > > would > > > > expect. Here's the estimated winning rates for all three algorithms > > when > > > > asked for a white response to black G3: > > > > > > > > 1plyMC: 781 / 1272 > > > > 1plyShuffle: 140.15 / 231.75 > > > > 1plyUCT: 936 / 1515 > > > > > > > > 1plyShuffle is slower because of the extra work information > tracking, > > but > > > > the variance in estimates should be far lower than the numbers would > > > > indicate. I have yet to do the computations, but a sample size of > > 231.75 > > > > has an estimation error of around 6000 normal MC runs for that > position. > > > > That is why my implementation of MCTR is defeating my (1ply) > > implementation > > > > of UCT. > > > > > > > > > _______________________________________________ > > computer-go mailing list > > [email protected] > > http://www.computer-go.org/mailman/listinfo/computer-go/ > > > _______________________________________________ > computer-go mailing list > [email protected] > http://www.computer-go.org/mailman/listinfo/computer-go/ >
_______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
