Петр Смолов wrote:
> I'm still not sure that UCT works good.
>
> For example, I have a position with two possible moves (a1 and b1):
>
>        O
>       / \
>    (a1) (b1)
>
>
> First random game (using a1 as first move) returns 0, second (b1) returns 1:
>
>        O
>       / \
>    (a1) (b1)
>   (0/1)(1/1)
>
> Ok, let's try to explore b1 move (the program likes it, because uct of
> a1 = 0):
>
>        O
>       / \
>    (a1) (b1)
>   (0/1)(1/1)
>        / | \
>       a2 b2 c2
>
>
> a2 returns 0, b2 and c2 returns 1:
>
>        O
>       / \
>    (a1) (b1)
>   (0/1)(3/4)
>        / | \
>       a2 b2 c2
>       0  1  1
>
> So the program will explore b2 and c2 (starting from b1), but maybe a2 is
> the only refuting move. Maybe a1 is better, but the program will continue to
> explore b1, because uct of a1 is 0 and uct of b1 is more than 0.

There are two things that make the program likely to give a1 another chance:

- eventually the UCT exploration term will get high enough that a2 is
  considered (as others have explained)

- instead of treating newly created moves as having 0 wins and 0 games,
  it's a good idea to start them with N wins and 2*N games, so that a
  single early loss doesn't have so much effect. (Instead of N and 2*N,
  you could also pick numbers whose ratio is a little over 1/2 to
  encourage exploration, or whose ratio is close to the 'expected' win
  rate of the move according to some guess.)

-M-
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
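
The two points in the reply can be sketched in a few lines of Python. This is only an illustration, not code from any particular program: the function names, the constant c = 1.0, the plain UCB1 form of the exploration term, and the choice to count the N / 2*N prior games in both the mean and the exploration term are all assumptions made for the example.

```python
import math


def win_rate(wins, games, prior_n=0):
    """Mean result of a move, seeded with prior_n wins in 2*prior_n games."""
    w = wins + prior_n
    n = games + 2 * prior_n
    # With no games and no prior, fall back to an even chance (assumption).
    return w / n if n > 0 else 0.5


def uct_score(wins, games, parent_games, c=1.0, prior_n=0):
    """UCB1-style score: mean result plus an exploration term."""
    n = games + 2 * prior_n
    if n == 0:
        return math.inf  # never tried and no prior: try it first
    explore = c * math.sqrt(math.log(max(parent_games, 1)) / n)
    return win_rate(wins, games, prior_n) + explore


def games_until_a1_retried(c=1.0, limit=10_000):
    """Worst case for a1: it lost its only game and b1 wins every game after.

    Returns the total number of games at the root when a1's score first
    exceeds b1's, or None if that does not happen within `limit` games.
    """
    total = 2  # one game through a1, one through b1
    while total < limit:
        a1 = uct_score(0, 1, total, c)
        b1 = uct_score(total - 1, total - 1, total, c)
        if a1 > b1:
            return total
        total += 1
    return None


if __name__ == "__main__":
    print("a1 mean after one loss, no prior:", win_rate(0, 1))
    print("a1 mean after one loss, N = 2   :", win_rate(0, 1, prior_n=2))
    print("games before a1 is retried      :", games_until_a1_retried())
```

The first function shows why a single early loss hurts less with a prior: a1's mean is 2/5 rather than 0/1 when N = 2. The last function shows the exploration term at work: a1's own game count stays at 1 while the parent count grows, so its score keeps rising until it overtakes b1. Note that under the assumption made here, the prior games also enlarge the denominator of the exploration term, so a large N slows down re-exploration as well as smoothing the mean.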
