Re: [Computer-go] Exploration formulas for UCT

Petr Baudis Sat, 01 Jan 2011 18:57:47 -0800

  Hi!

On Sat, Jan 01, 2011 at 12:18:46PM -0800, David Fotland wrote:
> For Many Faces, it is:
> 
> (1 - beta) * (win_rate + 0.45 * sqrt(ln(parent_visits) / child_visits)) +
> beta * rave_win_rate + mfgo_bias


  Pachi:

        (1 - beta) * (win_rate) + beta * (rave_win_rate)

  "Even game prior" is essential - seeding win_rate with 0.5 weighted
as n playouts, where n can be between 7 and 40.

  On 9x9, adding an exploration term of 0.02 * sqrt(...) can be
beneficial (on the order of a few Elo, I think) in some setups, but it
seems its effect can be matched by twiddling other values.
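
  A rough sketch of the above, not Pachi's actual code: the function and
parameter names are made up for illustration, prior_n=14 is just an
arbitrary pick from the 7 to 40 range, and I am assuming the sqrt(...)
is the same sqrt(ln(parent_visits) / child_visits) as in the Many Faces
formula, added to win_rate inside the (1 - beta) factor.

```python
import math

def seeded_win_rate(wins, visits, prior_n=14, prior_value=0.5):
    """Win rate with the even game prior mixed in as prior_n virtual playouts."""
    return (wins + prior_value * prior_n) / (visits + prior_n)

def node_value(win_rate, rave_win_rate, beta,
               parent_visits=0, child_visits=0, explore_c=0.0):
    """(1 - beta) * win_rate + beta * rave_win_rate, with an optional
    exploration term (explore_c = 0.0 disables it)."""
    explore = 0.0
    if explore_c > 0.0 and parent_visits > 1 and child_visits > 0:
        explore = explore_c * math.sqrt(math.log(parent_visits) / child_visits)
    return (1.0 - beta) * (win_rate + explore) + beta * rave_win_rate
```

  With no real playouts the seeded win rate is exactly 0.5, and the
prior's influence fades as real visits accumulate.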

> beta is the old Mogo formula of sqrt(500/(500 + 3 * parent_visits))

  We use the Silver formula:

        rave_visits / (rave_visits + real_visits
                       + rave_visits * real_visits / 3000)

The figure of 3000 is surprisingly resilient. Even with radically
different heuristics and playouts, it stays the empirical optimum.
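
  For comparison, both beta schedules as a small sketch. Function names
and the zero-visit guard are my own illustration. I take the 3000 as a
divisor of the product term (a multiplier would drive beta to nearly
zero after the first real visit), so that beta is 1/3 when both visit
counts reach 3000.

```python
import math

def beta_silver(rave_visits, real_visits, equiv=3000.0):
    """Silver-style beta; equiv is the 3000 constant from the post."""
    if rave_visits == 0:
        return 0.0  # assumption: no RAVE data means no RAVE weight
    return rave_visits / (rave_visits + real_visits
                          + rave_visits * real_visits / equiv)

def beta_mogo(parent_visits, k=500.0):
    """Old Mogo beta as quoted: sqrt(500 / (500 + 3 * parent_visits))."""
    return math.sqrt(k / (k + 3.0 * parent_visits))
```

  Both start at 1.0 (pure RAVE) and decay towards 0 as real playouts
accumulate; the Silver variant depends on the child's own visit counts
rather than on the parent's.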

> A child with no visits has a win_rate of 1.1.  Otherwise there is no win_rate 
> bias.
> 
> rave wins and visits are strongly biased when moves are generated using 
> various rules and information from the mfgo move generator (in a range of 10% 
> to 90% win rate, with hundreds to thousands of visits).
> 
> mfgo_bias is unchanging, per move, within a range of about +-2%, based on 
> mfgo’s move generator’s estimate of the quality of the move.

  We call our bias a "prior" and simply seed either the win_rate
or rave_win_rate with it upon node expansion - it makes almost
no difference if we use normal or rave win_rate, surprisingly.

  Of course we have many sources of priors. Each prior contributes
a certain winrate value (usually either 0.0 or 1.0) weighted with
a certain number of visits - again between 7 and 40.
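
  A minimal sketch of how several such priors could be folded into the
seed of a freshly expanded node. This is an illustration, not Pachi's
code: the names, the example weights and the 0.5 fallback for a node
with no data are all assumptions.

```python
def seed_from_priors(priors):
    """priors: list of (winrate_value, visits_weight) pairs.
    Returns the (wins, visits) pair a new node would start with."""
    visits = sum(weight for _, weight in priors)
    wins = sum(value * weight for value, weight in priors)
    return wins, visits

def win_rate(seed, real_wins, real_visits):
    """Win rate once real playouts are added on top of the seed."""
    seed_wins, seed_visits = seed
    total = seed_visits + real_visits
    if total == 0:
        return 0.5  # assumption: treat a node with no data as even
    return (seed_wins + real_wins) / total

# Example: even game prior plus one favourable and one unfavourable hint.
seed = seed_from_priors([(0.5, 14), (1.0, 20), (0.0, 10)])
```

  The same seed could be applied to the rave statistics instead of the
normal ones; only the counters it is added to would change.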

  Aside from the even game prior (which is essential for the search simply
to work at all in our setup - and it is also another way to compensate
for the missing exploration term), the most important prior is the
playout policy hinter, using the same heuristics (and code) as the
playout policy to pick good tree moves. On 19x19, the line height prior
(1st line 0.0, 3rd line 1.0) and especially the CFG-to-last-move prior
are essential.

-- 
                                Petr "Pasky" Baudis
Computer science education cannot make an expert programmer any more
than studying brushes and pigment can make an expert painter. --esr
_______________________________________________
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
