Hi, I have just put the paper about time management that Aja wrote with me on my web page: http://remi.coulom.free.fr/Publications/TimeManagement.pdf
Abstract: Monte-Carlo tree search (MCTS) is a new technique that has produced a huge leap forward in the strength of Go-playing programs. An interesting aspect of MCTS that has rarely been studied in the past is the problem of time management. This paper presents the effect on playing strength of a variety of time-management heuristics for $19\times19$ Go. Results indicate that clever time management can have a very significant effect on playing strength. Experiments demonstrate that the most basic algorithm for sudden-death time controls (dividing the remaining time by a constant) produces a winning rate of 43.2$\pm$2.2\% against GNU Go 3.8 Level 2, whereas our most efficient time-allocation strategy can reach a winning rate of 60$\pm$2.2\% without pondering and 67.4$\pm$2.1\% with pondering.

Also, I think we did not yet advertise the paper Lukasz wrote with me last year:
http://www.mimuw.edu.pl/~lew/files/Simulation-based%20Search%20of%20Combinatorial%20Games.pdf

Abstract: Monte-Carlo Tree Search is a very successful game-playing algorithm. Unfortunately it suffers from the horizon effect: some important tactical sequences may be delayed beyond the depth of the search tree, causing evaluation errors. Temporal-difference search with function approximation is a method that was proposed to overcome these weaknesses by adaptively changing the simulation policy outside the tree. In this paper we present experimental evidence demonstrating that temporal-difference search may fail to find an optimal policy, even in very simple game positions. Classical temporal-difference algorithms try to evaluate a local situation with a numerical value, but, as it appears, a single number is not enough to model the dynamics of a partial two-player game state. As a solution, we propose to replace numerical values with approximate thermographs. With this richer representation of partial states, reinforcement-learning algorithms converge and accurately represent the dynamics of states, allowing an optimal policy to be found.

Maybe if we give him enough encouragement, Lukasz will finish writing his thesis one day. I am looking forward to reading it.

Rémi

On 10 May 2011, at 10:51, Aja wrote:
> Hello Ingo,
>
> Time management algorithms, such as pondering, are one of the major topics in my
> thesis. I have shown by experiments, though with fast time settings, that
> pondering brings a lot of strength.
>
> Aja

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
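[Editor's note: the first abstract mentions the most basic sudden-death heuristic, dividing the remaining time by a constant. A minimal sketch of that baseline, with the constant C and the function name chosen here for illustration (the paper's actual values and interface are not given in this message):

```python
def allocate_time(remaining_seconds: float, c: float = 30.0) -> float:
    """Baseline sudden-death time allocation: think for remaining/C seconds.

    `c` is a hypothetical tuning constant (an implicit estimate of how
    many moves are left in the game); the paper's actual value is not
    stated in this announcement.
    """
    if remaining_seconds <= 0:
        return 0.0  # out of time: move immediately
    return remaining_seconds / c

# Example: with 300 s on the clock and C = 30, budget 10 s for this move.
budget = allocate_time(300.0)
```

As the abstract notes, this baseline reached 43.2% against GNU Go 3.8 Level 2, and smarter allocation strategies improved substantially on it.]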
