> It will not continue indefinitely, since as the number of trials (in root,
> then in b1) goes up, the number of tests (for a1 and a2) stays fixed and
> therefore the upper confidence bound of a1 and a2 keeps increasing.
> Basically, with the ratio M/N going down, we accept that a given move might,
> in the best case, have a higher and higher winning probability.


Thank you very much for your answer.

More questions, if I may.


1. How does UCT work for other games (chess, for example, where a game can 
end in a draw)?

Why does the winrate take into account only wins?
Winrate := Wins/Visits

Which is better:

3 wins, 0 draws, 7 losses
or
2 wins, 6 draws, 2 losses?
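
To make the comparison concrete, here is a minimal sketch that scores both 
records two ways. Counting a draw as half a win (draw_value=0.5) is my own 
assumption for illustration, not something I know any program to use, and the 
function names are hypothetical.

```python
def win_rate(wins, draws, losses):
    """Wins / Visits: draws count the same as losses."""
    visits = wins + draws + losses
    return wins / visits


def mean_score(wins, draws, losses, draw_value=0.5):
    """Average result per visit, with a draw worth draw_value (assumed 0.5)."""
    visits = wins + draws + losses
    return (wins + draw_value * draws) / visits


record_a = (3, 0, 7)  # 3 wins, 0 draws, 7 losses
record_b = (2, 6, 2)  # 2 wins, 6 draws, 2 losses

print(win_rate(*record_a), mean_score(*record_a))  # 0.3 0.3
print(win_rate(*record_b), mean_score(*record_b))  # 0.2 0.5
```

So the first record looks better by Wins/Visits, while the second looks better 
as soon as a draw is worth anything more than 1/6 of a win.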



2. After we play all simulations, which move should the program choose? The 
one with the maximum Winrate (Wins/Visits), or the one with the maximum 
Winrate + SQRT(ln(...))?

Or (Wins + Draws/2)/Visits?
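
The three criteria can disagree. Here is a small sketch with made-up root 
statistics (the numbers, the move names A/B/C and the exploration constant 
c=1.0 are all assumptions chosen for illustration) in which each criterion 
picks a different move:

```python
import math

# Made-up statistics for three root moves: (wins, draws, losses).
stats = {
    "A": (330, 0, 270),   # heavily visited, decent winrate
    "B": (2, 0, 3),       # barely visited
    "C": (150, 200, 45),  # many draws
}
parent_visits = sum(sum(s) for s in stats.values())  # 1000 in total


def visits(s):
    return sum(s)


def win_rate(s):
    return s[0] / visits(s)


def ucb(s, c=1.0):
    # Winrate plus an exploration term; c = 1.0 is an arbitrary choice.
    return win_rate(s) + c * math.sqrt(math.log(parent_visits) / visits(s))


def draw_adjusted(s):
    # (Wins + Draws/2) / Visits
    return (s[0] + s[1] / 2) / visits(s)


def best(criterion):
    return max(stats, key=lambda move: criterion(stats[move]))


print(best(win_rate), best(ucb), best(draw_adjusted))  # A B C
```

With these numbers the exploration term dominates for B, which has only 5 
visits, so the Winrate + SQRT(...) criterion picks the move we know least 
about.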



3. Why are there so many variations in how the UCT value is calculated 
(especially in what goes under the square root)?

Valkyria (22 September 2006) uses this:
uct := UCTK*Sqrt(ln(n.Visits)/(5*next.Visits))

Orego uses this:
sqrt(2 * logParentRunCount / node.getRuns(move))

and Fuego uses this:
m_biasTermConstant * sqrt(logPosCount / (moveCount + 1));
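
To compare them side by side, here is a sketch that evaluates the three terms 
for the same visit counts. I do not know the actual values of UCTK or 
m_biasTermConstant, so both are set to 1.0 as placeholders; the function and 
parameter names are mine, not taken from the programs.

```python
import math


def valkyria_term(parent_visits, child_visits, uctk=1.0):
    # UCTK * sqrt(ln(N) / (5 * n)); uctk = 1.0 is a placeholder value.
    return uctk * math.sqrt(math.log(parent_visits) / (5 * child_visits))


def orego_term(parent_visits, child_visits):
    # sqrt(2 * ln(N) / n)
    return math.sqrt(2 * math.log(parent_visits) / child_visits)


def fuego_term(parent_visits, child_visits, bias_constant=1.0):
    # c * sqrt(ln(N) / (n + 1)); bias_constant = 1.0 is a placeholder value.
    return bias_constant * math.sqrt(math.log(parent_visits) / (child_visits + 1))


parent = 1000
for child in (1, 10, 100):
    print(child,
          round(valkyria_term(parent, child), 3),
          round(orego_term(parent, child), 3),
          round(fuego_term(parent, child), 3))
```

Written this way, the Valkyria and Orego terms differ only by a constant 
factor (sqrt(10) when UCTK is 1), and the "+ 1" in the Fuego term keeps it 
finite for a move with zero visits.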

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
