Hi, I have just put the paper about time management that Aja wrote with me on my web page: http://remi.coulom.free.fr/Publications/TimeManagement.pdf
Abstract: Monte-Carlo tree search (MCTS) is a new technique that has produced a huge leap forward in the strength of Go-playing programs. An interesting aspect of MCTS that has rarely been studied in the past is the problem of time management. This paper presents the effect on playing strength of a variety of time-management heuristics for $19\times19$ Go. Results indicate that clever time management can have a very significant effect on playing strength. Experiments demonstrate that the most basic algorithm for sudden-death time controls (dividing the remaining time by a constant) produces a winning rate of 43.2$\pm$2.2\% against GNU Go 3.8 Level 2, whereas our most efficient time-allocation strategy can reach a winning rate of 60$\pm$2.2\% without pondering and 67.4$\pm$2.1\% with pondering.

Also, I think we did not yet advertise the paper Lukasz wrote with me last year:
http://www.mimuw.edu.pl/~lew/files/Simulation-based%20Search%20of%20Combinatorial%20Games.pdf

Abstract: Monte-Carlo Tree Search is a very successful game-playing algorithm. Unfortunately it suffers from the horizon effect: some important tactical sequences may be delayed beyond the depth of the search tree, causing evaluation errors. Temporal-difference search with function approximation is a method that was proposed to overcome these weaknesses by adaptively changing the simulation policy outside the tree. In this paper we present experimental evidence demonstrating that temporal-difference search may fail to find an optimal policy, even in very simple game positions. Classical temporal-difference algorithms try to evaluate a local situation with a numerical value, but, as it appears, a single number is not enough to model the dynamics of a partial two-player game state. As a solution, we propose to replace numerical values with approximate thermographs. With this richer representation of partial states, reinforcement-learning algorithms converge and accurately represent the dynamics of states, allowing an optimal policy to be found.

Maybe if we give him enough encouragement, Lukasz will finish writing his thesis one day. I am looking forward to reading it.

Rémi

On 10 May 2011, at 10:51, Aja wrote:
> Hello Ingo,
>
> Time management algorithms, such as pondering, are one of the major topics in my
> thesis. I have shown by experiments, though with fast time settings, that
> pondering brings a lot of strength.
>
> Aja

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
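[Editor's note: the first abstract mentions the most basic sudden-death heuristic, dividing the remaining time by a constant. A minimal sketch of that baseline, with the constant C and the function name chosen here for illustration (the paper's actual values and interface are not given in this message):

```python
def allocate_time(remaining_seconds: float, c: float = 30.0) -> float:
    """Baseline sudden-death time allocation: think for remaining/C seconds.

    `c` is a hypothetical tuning constant (an implicit estimate of how
    many moves are left in the game); the paper's actual value is not
    stated in this announcement.
    """
    if remaining_seconds <= 0:
        return 0.0  # out of time: move immediately
    return remaining_seconds / c

# Example: with 300 s on the clock and C = 30, budget 10 s for this move.
budget = allocate_time(300.0)
```

As the abstract notes, this baseline reached 43.2% against GNU Go 3.8 Level 2, and smarter allocation strategies improved substantially on it.]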
