Hi!
On Wed, Oct 27, 2010 at 07:10:24AM +0400, Петр Смолов wrote:
> 1. How does UCT work for other games? (chess, for example, where the players
> can make a draw)
>
> Why winrate takes into account only wins?
> Winrate := Wins/Visits
>
> What is better:
>
> 3 wins, 0 draws, 7 loses
> or
> 2 wins, 6 draws, 2 loses ?
I cannot answer this personally, I never investigated UCT usage in any
scenario where draws are possible.
> 2. After we play all simulations, which move the program should choose? The
> one with maximim Winrate (Wins/Visits) or the one with maximum Winrate +
> SQRT(ln(...))?
>
> Or (Wins+Draw/2)/Visits ?
The common strategy is to pick the move with largest number of visits
- it is the best explored move which has the surest winrate and
asymptotically, UCT is guaranteed to visit the most the best node. If
you choose based on winrate, you can hit some unlucky "spikes" where a
winrate of a node has shot up quickly for some rather nonsensical move
and would be disproven quickly, if only you did not stop the search just
then.
Not so uncommon way to reconcile this is to check if the most visited
node has also the best winrate - if that's not the case, spend some
extra time searching to try to reconcile the situation and bring the two
values in sync again.
> 3. Why there are so many variations of calculation of UCT value (especially
> what is under a square root)?
>
> Valkyria (22 September 2006) uses this:
> uct := UCTK*Sqrt(ln(n.Visits)/(5*next.Visits))
>
> Orego uses this:
> sqrt(2 * logParentRunCount / node.getRuns(move))
>
> and Fuego uses this:
> m_biasTermConstant * sqrt(logPosCount / (moveCount + 1));
If you look closely, the only difference is in the constant term (what
I labelled the c-term). The exact value of this term needs to be
determined experimentally for each particular program since it quite
depends on the particular mix of heuristics in use.
With uniformly distributed playouts, it would be something around
c=0.2 in sqrt(c*ln(N)/M), with much more sophisticated heuristics and
good prior biasing of the node values and then RAVE, c will approach 0
as the need for UCB-driven exploration will decrease.
--
Petr "Pasky" Baudis
The true meaning of life is to plant a tree under whose shade
you will never sit.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go