---------- Forwarded message ---------- From: Rémi Coulom <[email protected]> Date: Tue, Jul 25, 2006 at 8:22 AM Subject: [computer-go] Experiments with UCT To: computer-go <[email protected]>
Hi, I mentioned UCT in one of my previous messages to the list: http://zaphod.aml.sztaki.hu/papers/ecml06.pdf I tried it in Crazy Stone. I found that the algorithm described in the paper does not work well, but I managed to improve it a lot with a small change: I used 1/sqrt(20) instead of 1/sqrt(2) for the C_p constant. It now seems to work very well. Here is a summary of how it works: - Use probability of winning as score, not territory - Use the average outcome as position value - Select the move that maximizes v + sqrt((2*log(t))/(10*n)) v is the value of the move (average outcome, between 0 and 1), n the number of simulations of this move, and t the total number of simulations at the current position. In case a move has n = 0, it is selected first. Here are experiment results with Crazy Stone. 170 games are played against GNU Go 3.6 at level 10, from 85 different starting positions, alternating colors, at various time control (time per game), 1 CPU at 2.2 GHz. version 0005 UCT 2 min 40% 46.7% 4 min 48.2% 56.6% 8 min 52.9% 64.7% 16 min 57.4% 67.6% 32 min 66.6% 71.6% I have tried hard to improve it, but it seems very difficult. Using a more clever backup operator may help, but I have not managed to measure a significant difference yet. I thank Yizao for letting me know about UCT. His program, MoGo, seems to be doing very well on CGOS. Maybe Yizao can tell us more about his experiments. Rémi _______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
