Re: [computer-go] monte carlo transposition reuse (MCTR)

Jason House Fri, 14 Sep 2007 11:01:08 -0700

Sure.

I'm not sure how best to answer your question, so I'll respond with mostly
pseudo code. I can answer questions on theory, but I expect that to go back
and forth a bit since it's typically tough to explain.

Where there's grey areas in implementation choice, I have text describing it
(including basic theory).  For the real source code, the logic is in
search.d of HouseBot's (open) source. HouseBot's home page is
http://housebot.sourceforge.net

Slowdown over basic MC occurs in two ways:
1. Maintaining a list of moves whose order can be shuffled around.
2. Applying the results to all affected leaves in the search tree
(3. In my current implementation, I may save too much info and memory
management may be another problem)

Here's some pseudo code
Main thinking loop:
  while ( can think ){
    reset board
    reset shuffle list
    shuffleMode = true
    foreach(legal move){
      play(move)
      if ( shuffleMode ){
        shuffleMode = transposition still true  //see below
        if (shuffleMode || first move)
          insert move into shuffle list
    }
    apply result   //see below
  }

"Transposition still true"
  By transposition or shuffle, I mean that I can reorder the moves by each
color and arrive at the same board position without violating any rules
along the way... Still representing a mostly random game.
  Obviously, any move that's illegal in the starting board position should
will violate this and breaks the move shuffling property.  Illegal would be
stuff like suicide plays, taking a ko, or playing on top of another stone.
If you assume all captures break the move shuffling property, life is simple
and strange suicide plays that occur as combinations of stones won't occur.
If you're smart enough to find a way to do that efficiently (or ignore it
like I did), then you must check to ensure that stones captured are not in
the shuffle list.  If they are, order matters and the latest move breaks the
move shuffling property.
  My code is far different, but when I think about it, the simplest form
would be : "return not (capture on current board)".  I should probably try
that one out...  Allowing captures does have the benefit that the shuffle
list gets longer, but the control logic gets more complex (simple ko's,
suicide in original position or a complex reordered combination of moves
that create suicides, etc...).

How to apply the result:
  Essentially, everything in the shuffle list can consider this as a game
that began with that move first.  At first, I had fractional game = 1 in a
nieve attempt to test out my theory and it failed horribly.  The problem is
that a positions with long shuffle lists can be reached in many ways by many
starting moves.  Because of this, sequences with long shuffle lists got
counted way too much.  I had to make it so that regardless of the shuffle
list length, a position would not be over-counted in the final average.  To
do this, you should compute the probability of the position being arrived at
by all candidate methods and then do a weighted sum.  prob(this method) / [
prob (this method) + prob (alternate method 1) + prob( alternate method 2) +
...].  I made the naive assumption that prob(method a) == prob(method b).  I
think that's reasonable for uniform sampling.  That value is used in the
pseudo code below

  foreach(random game)
    float fractional game = 1 / number of moves in shuffle list with the
same color to move
    foreach(move in shuffle list with same color to move){
      num sims (move of interest) += fractional game
      if (random game is is win)
        num wins (move of interest) += fractional game
      // Note that variance of the estimate increases by (0.5 * fractional
game)^2
    }
  }

On 9/14/07, Brian Slesinsky <[EMAIL PROTECTED]> wrote:
>
> Can you explain a bit more about how 1plyShuffle works?
>
> On 9/14/07, Jason House <[EMAIL PROTECTED]> wrote:
> > Time management is identical.  Based on quick profiling, the MCTR
> version
> > does about 1/6 of the simulations.  Actually, since the MCTR does extra
> > tracking of info, it can reuse some old simulations, so it may be more
> like
> > 1/4 or 1/5 of the simulations.  It's just using the results more
> > efficiently.
> >
> >
> > On 9/14/07, Chris Fant <[EMAIL PROTECTED]> wrote:
> > > Does it defeat it based on number of samples taken or time allotted
> per
> > turn?
> > >
> > > On 9/14/07, Jason House <[EMAIL PROTECTED]> wrote:
> > > > I know I'm only wading in the kiddie pool of computer go with my
> 1-ply
> > bots,
> > > > but I think I may have found a useful enhancement to monte carlo.
> > > >
> > > > HouseBot supports three 1-ply search modes:
> > > >   1plyMC - Uniform sampling
> > > >   1plyShuffle - Uniform sampling with monte carlo transposition
> reuse
> > > >   1plyUCT - Non-uniform sampling based on the UCT algorithm (AKA
> UCB)
> > > >
> > > > Obviously, 1plyMC is far inferior to 1plyUCT as everyone probably
> > expects.
> > > > What may surprise many is that 1plyShuffle defeats 1plyUCT nearly
> every
> > > > time.  I'm basic this on self-play data from CGOS.  Currently,
> > > >
> > http://cgos.boardspace.net/9x9/cross/housebot-617-UCB.html
> > > > shows 10 matches between housebot-617-UCB has played
> housebot-618-shuff.
> > > > housebot-617-UCB (1plyUCT) lost every time.
> > > >
> > > > While tricky, it should be possible to combine UCT and MCTR for an
> even
> > > > stronger bot.  MCTR can be thought of as a low bias alternative to
> the
> > AMAF
> > > > heuristic.  Rather than using all moves, MCTR takes only the top N
> > moves,
> > > > where N is computed based on which moves were played in the random
> game.
> > > > From an open board position MCTR uses about 1/3 of the moves that
> AMAF
> > > > would.  Computation of the resulting winning percentage must also be
> > > > weighted based on the probabilities of duplicating results (roughly
> > > > speaking, it's 1/N).
> > > >
> > > > As a result of using MCTR, winning rates are no longer integers as
> one
> > would
> > > > expect.  Here's the estimated winning rates for all three algorithms
> > when
> > > > asked for a white response to black G3:
> > > >
> > > > 1plyMC:       781     / 1272
> > > > 1plyShuffle:  140.15 /  231.75
> > > > 1plyUCT:     936     / 1515
> > > >
> > > > 1plyShuffle is slower because of the extra work information
> tracking,
> > but
> > > > the variance in estimates should be far lower than the numbers would
> > > > indicate.  I have yet to do the computations, but a sample size of
> > 231.75
> > > > has an estimation error of around 6000 normal MC runs for that
> position.
> > > > That is why my implementation of MCTR is defeating my (1ply)
> > implementation
> > > > of UCT.
> > >
> >
> >
> > _______________________________________________
> > computer-go mailing list
> > [email protected]
> > http://www.computer-go.org/mailman/listinfo/computer-go/
> >
> _______________________________________________
> computer-go mailing list
> [email protected]
> http://www.computer-go.org/mailman/listinfo/computer-go/
>

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] monte carlo transposition reuse (MCTR)

Reply via email to