Петр Смолов wrote:
> I'm still not sure that UCT works good.
>
> For example, I have a position with two possible moves (a1 and b1):
>
>     O
>   /   \
> (a1) (b1)
>
>
> First random game (using a1 as first move) returns 0, second (b1) returns 1:
>
>     O
>   /   \
> (a1) (b1)
> (0/1)(1/1)
>
> Ok, let's try to explore b1 move (the program likes it, because uct of
> a1 = 0):
>
>     O
>   /   \
> (a1) (b1)
> (0/1)(1/1)
>      / | \
>    a2 b2 c2
>
>
> a2 returns 0, b2 and c2 returns 1:
>
>     O
>   /   \
> (a1) (b1)
> (0/1)(3/4)
>      / | \
>    a2 b2 c2
>    0   1  1
>

> So the program will explore b2 and c2 (starting from b1), but maybe a2 is
> the only refuting move. Maybe a1 is better, but the program will continue to
> explore b1, because uct of a1 is 0 and uct of b1 is more than 0.

There are two things that make the program likely to give a1 another
chance:

 - eventually the UCT exploration term of a rarely visited move gets high
   enough that the move is considered again (as others have explained);
   this applies to a1 at the root and to a2 under b1

 - instead of treating newly created moves as having 0 wins and 0 games,
   it's a good idea to start them with N wins and 2*N games, so that a
   single early loss doesn't have so much effect.

(or instead of N and 2*N, you could pick numbers whose ratio is a little
over 1/2 to encourage exploration, or whose ratio is close to the
'expected' win rate of the move according to some guess)
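
A rough Python sketch of both points, using the numbers from the example
above (a1 at 0/1, b1 at 3/4). Several things here are assumptions made for
illustration and are not taken from any particular program: the UCB1-style
formula, the constant c, the names uct_score and visits_until_a1_retried,
and the idealization that b1 keeps winning exactly 3 out of 4 playouts
(counted as fractional wins).

```python
import math


def uct_score(wins, visits, parent_visits, c=1.0, prior_n=0):
    """Win rate plus a UCB1-style exploration bonus.

    prior_n adds prior_n virtual wins and 2 * prior_n virtual games,
    so a new move starts near a 50% win rate (hypothetical parameter).
    """
    w = wins + prior_n
    n = visits + 2 * prior_n
    if n == 0:
        return math.inf  # unvisited move with no prior: try it first
    bonus = c * math.sqrt(math.log(max(parent_visits, 1)) / n)
    return w / n + bonus


def visits_until_a1_retried(c=1.0, b1_rate=0.75, limit=10_000):
    """Count extra visits b1 receives before a1 (0 wins / 1 game) outscores it.

    Assumes b1 keeps winning at b1_rate (an idealization).
    Returns None if a1 is not retried within `limit` extra visits.
    """
    b1_wins, b1_visits = 3.0, 4
    for extra in range(limit):
        total = 1 + b1_visits  # a1 has exactly one game
        if uct_score(0, 1, total, c) > uct_score(b1_wins, b1_visits, total, c):
            return extra
        b1_wins += b1_rate
        b1_visits += 1
    return None


if __name__ == "__main__":
    # Point 1: with an exploration term, a1 is retried after finitely many visits.
    print("extra b1 visits before a1 is retried:", visits_until_a1_retried(c=1.0))
    # Without the exploration term (c = 0), a1 stays at 0 and is never retried.
    print("with c = 0:", visits_until_a1_retried(c=0.0, limit=1000))
    # Point 2: a prior of N = 5 wins out of 10 games softens the single loss.
    print("a1 rate, no prior:", uct_score(0, 1, 5, c=0.0))
    print("a1 rate, N = 5:   ", uct_score(0, 1, 5, c=0.0, prior_n=5))
```

Note that the prior alone only dampens the early loss: with c = 0, a1 at
5/11 still scores below b1 at 8/14, so it is the exploration term that
actually brings a1 back.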

-M-