It looks like your example is using average win rate instead of an upper bound. 
Rather than M/N UCT uses M/N + c*log(N1+N2+N3+...)/sqrt(N)

where Ni is the total simulations of the ith move. That extra term means moves 
with initially bad results will be revisited. Stronger go programs add in some 
other terms and set c=0, but that isn't UCT.

Sent from my iPhone

On Oct 26, 2010, at 6:43 AM, Петр Смолов <[email protected]> wrote:

> Hello all!
> I would much appreciate if someone could explain me what exactly "upper 
> confidence bounds applied to trees" is.
> 
> I'm still not sure that UCT works good.
> 
> For example, I have a position with two possible moves (a1 and b1):
> 
>    O
>  /   \
> (a1) (b1)
> 
> 
> First random game (using a1 as first move) returns 0, second (b1) returns 1:
> 
>    O
>  /   \
> (a1) (b1)
> (0/1)(1/1)
> 
> 
> Ok, let's try to explore b1 move (the program likes it, because uct of a1 = 
> 0):
> 
>    O
>  /   \
> (a1) (b1)
> (0/1)(1/1)
>     / | \
>   a2 b2 c2
> 
> 
> a2 returns 0, b2 and c2 returns 1:
> 
>    O
>  /   \
> (a1) (b1)
> (0/1)(3/4)
>     / | \
>   a2 b2 c2
>   0   1  1
> 
> 
> So the program will explore b2 and c2 (starting from b1), but maybe a2 is the 
> only refuting move. Maybe a1 is better, but the program will continue to 
> explore b1, because utc of a1 is 0 and utc of b1 is more than 0.
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Reply via email to