[Computer-go] Exploration formulas for UCT

David Fotland Sat, 01 Jan 2011 12:19:05 -0800

It would be interesting to see the actual formulas used for choosing the move 
to try in the tree part of the search.


 

For Many Faces, it is:

 

(1 - beta) * (win_rate + 0.45 * sqrt(ln(parent_visits) / child_visits)) +
beta * rave_win_rate + mfgo_bias

 

beta is the old Mogo formula of sqrt(500/(500 + 3 * parent_visits))

 

A child with no visits has a win_rate of 1.1.  Otherwise there is no win_rate 
bias.

 

When moves are generated, their rave wins and visits are strongly biased up 
front, using various rules and information from the mfgo move generator (in a 
range of 10% to 90% win rate, with hundreds to thousands of visits).

 

mfgo_bias is unchanging, per move, within a range of about +-2%, based on 
mfgo’s move generator’s estimate of the quality of the move.
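
As a rough illustration, the formula above can be sketched in Python. This is 
only a sketch of the description in this message, not the Many Faces source: 
the function and constant names are made up, rave_win_rate is assumed to 
already include the biased rave wins and visits, and dropping the exploration 
term for a child with no visits (to avoid dividing by zero) is an assumption.

```python
import math

EXPLORATION = 0.45         # exploration constant from the formula above
RAVE_K = 500.0             # constant in the beta schedule
UNVISITED_WIN_RATE = 1.1   # win_rate given to a child with no visits


def beta(parent_visits):
    """RAVE weight: 1.0 at zero parent visits, decaying as visits grow."""
    return math.sqrt(RAVE_K / (RAVE_K + 3.0 * parent_visits))


def child_value(parent_visits, child_visits, child_wins,
                rave_win_rate, mfgo_bias):
    """Selection value of one child; the search tries the highest one."""
    if child_visits == 0:
        # Assumption: no exploration term here, since it would divide by zero.
        uct = UNVISITED_WIN_RATE
    else:
        win_rate = child_wins / child_visits
        # max(..., 1) keeps ln() defined if the parent count is still zero.
        explore = math.sqrt(math.log(max(parent_visits, 1)) / child_visits)
        uct = win_rate + EXPLORATION * explore
    b = beta(parent_visits)
    return (1.0 - b) * uct + b * rave_win_rate + mfgo_bias


if __name__ == "__main__":
    # Hypothetical numbers, only to show the call.
    print(child_value(parent_visits=500, child_visits=100, child_wins=60,
                      rave_win_rate=0.5, mfgo_bias=0.01))
```

With these constants beta is 1.0 at zero parent visits and 0.5 at 500 parent 
visits, so the rave term dominates early and the win_rate term takes over as 
the parent is visited more.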

 

Does anyone else want to share?

 

David

 

From: computer-go-boun...@dvandva.org [mailto:computer-go-boun...@dvandva.org] 
On Behalf Of Fuming Wang
Sent: Saturday, January 01, 2011 9:00 AM
To: Aja; computer-go@dvandva.org
Subject: Re: [Computer-go] Fwd: News on Tromp-Cook ?

 

Hi Aja,



On Sun, Jan 2, 2011 at 12:16 AM, Aja <ajahu...@gmail.com> wrote:

Hi Fuming,

 

Most of the current strong programs are using UCT combined with RAVE (a kind of 
AMAF). The formula is like this (there are many variants),

 

C*RAVE+(1-C)*UCT


This has been my understanding. However, I am surprised to find that people 
have been setting C close to one, according to Petr and Oliver's postings, 
which makes the search essentially AMAF. MF apparently is doing something 
different.
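
To make the point about C concrete, here is a small Python sketch of the 
blend C*RAVE+(1-C)*UCT with a fixed C. The function name and the sample 
values are hypothetical, chosen only for illustration.

```python
def blend(c, rave_value, uct_value):
    """Fixed-weight mix of a RAVE (AMAF) value and a UCT value."""
    return c * rave_value + (1.0 - c) * uct_value


if __name__ == "__main__":
    rave_value, uct_value = 0.60, 0.40   # hypothetical values for one move
    for c in (0.5, 0.95, 1.0):
        print(c, blend(c, rave_value, uct_value))
```

With C = 0.95 the result is about 0.59, almost the RAVE value of 0.60, and 
with C = 1.0 the UCT value has no influence at all.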

Fuming
