---------- Forwarded message ----------
From: Rémi Coulom <[email protected]>
Date: Tue, Jul 25, 2006 at 8:22 AM
Subject: [computer-go] Experiments with UCT
To: computer-go <[email protected]>


Hi,

I mentioned UCT in one of my previous messages to the list:
http://zaphod.aml.sztaki.hu/papers/ecml06.pdf

I tried it in Crazy Stone. I found that the algorithm described in the paper
does not work well, but I managed to improve it a lot with a small change: I
used 1/sqrt(20) instead of 1/sqrt(2) for the C_p constant. It now seems to
work very well.
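The change of constant can be checked against the selection rule below. Assuming the UCT exploration bonus has the UCB1-style form 2 C_p sqrt(ln t / n), and that log denotes the natural logarithm, the two values of C_p give:

```latex
\[
  C_p = \tfrac{1}{\sqrt{2}}:\quad
  2 C_p \sqrt{\frac{\ln t}{n}} = \sqrt{2}\,\sqrt{\frac{\ln t}{n}} = \sqrt{\frac{2 \ln t}{n}}
\]
\[
  C_p = \tfrac{1}{\sqrt{20}}:\quad
  2 C_p \sqrt{\frac{\ln t}{n}} = \sqrt{\frac{4}{20}\cdot\frac{\ln t}{n}} = \sqrt{\frac{2 \ln t}{10\,n}}
\]
```

So the factor 10 in the denominator of the formula corresponds to shrinking the exploration bonus by a factor of sqrt(10) relative to the paper's default.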

Here is a summary of how it works:
 - Use probability of winning as score, not territory
 - Use the average outcome as position value
 - Select the move that maximizes v + sqrt((2*log(t))/(10*n))

v is the value of the move (its average outcome, between 0 and 1), n is the
number of simulations of this move, and t is the total number of simulations
at the current position. A move with n = 0 is selected first.
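A minimal Python sketch of the selection rule, assuming log is the natural logarithm and that t is computed as the sum of the children's visit counts. Each simulation is scored 1 for a win and 0 for a loss, so v is the average outcome. The names MoveStats, ucb_score, select_move and record are hypothetical and are not taken from Crazy Stone.

```python
import math
from dataclasses import dataclass


@dataclass
class MoveStats:
    """Hypothetical per-move statistics kept at one position."""
    wins: float = 0.0  # sum of outcomes: 1 for a win, 0 for a loss
    visits: int = 0    # n: number of simulations of this move

    @property
    def value(self) -> float:
        # v: average outcome, between 0 and 1
        return self.wins / self.visits


def ucb_score(stats: MoveStats, total: int) -> float:
    """v + sqrt((2 * log(t)) / (10 * n)); a move with n = 0 scores +inf."""
    if stats.visits == 0:
        return math.inf  # unvisited moves are selected first
    bonus = math.sqrt((2.0 * math.log(total)) / (10.0 * stats.visits))
    return stats.value + bonus


def select_move(children: dict[str, MoveStats]) -> str:
    """Return the move that maximizes the score (ties go to the first move)."""
    total = sum(s.visits for s in children.values())  # t
    return max(children, key=lambda m: ucb_score(children[m], total))


def record(stats: MoveStats, won: bool) -> None:
    """Back up one simulation result: probability of winning, not territory."""
    stats.visits += 1
    stats.wins += 1.0 if won else 0.0
```

For example, with `{"a": MoveStats(wins=3, visits=4), "b": MoveStats(wins=1, visits=4), "c": MoveStats()}`, `select_move` returns `"c"` because it has never been tried; once every move has been visited, moves with equal n receive the same bonus and the higher average outcome wins.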

Here are experimental results with Crazy Stone. At each time control (time per
game), 170 games are played against GNU Go 3.6 at level 10, from 85 different
starting positions with colors alternated, on 1 CPU at 2.2 GHz. The
percentages are Crazy Stone's winning rates.

Time/game  version 0005  UCT
 2 min  40%           46.7%
 4 min  48.2%         56.6%
 8 min  52.9%         64.7%
16 min  57.4%         67.6%
32 min  66.6%         71.6%
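To get a rough sense of what counts as a significant difference at this sample size, a binomial standard error can be computed for a 170-game winning rate. This is a sketch under the assumption that games are independent, which is only approximately true here since games are paired by starting position; the function name win_rate_stderr is hypothetical.

```python
import math


def win_rate_stderr(p: float, games: int) -> float:
    """Binomial standard error of a winning rate p over independent games."""
    return math.sqrt(p * (1.0 - p) / games)


GAMES = 170  # games per time control in the table above
for p in (0.40, 0.50, 0.716):
    se = win_rate_stderr(p, GAMES)
    # Report in percentage points; 1.96 standard errors gives ~95% coverage.
    print(f"p = {p:.3f}: standard error = {100 * se:.1f} points, "
          f"~95% interval = +/- {196 * se:.1f} points")
```

At p = 0.5 this gives a standard error of about 3.8 percentage points, so differences of a few points between two 170-game samples are hard to distinguish from noise.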

I have tried hard to improve it, but it seems very difficult. Using a more
clever backup operator may help, but I have not managed to measure a
significant difference yet.

I thank Yizao for letting me know about UCT. His program, MoGo, seems to be
doing very well on CGOS. Maybe Yizao can tell us more about his experiments.

Rémi

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go