[computer-go] Proposed UCT / transposition table implementation

Peter Drake Mon, 04 Dec 2006 10:00:22 -0800

I've noticed that Orego has done very poorly in the last couple of competitions. This is partly due to the improvements in others' programs, but I think the fact that Orego currently doesn't have a transposition table is crippling. Since I'm rewriting this stuff in Java, I'm thinking about how to handle this, and would like to offer up the following plan for all of you to poke holes in. I'm not sure if I'm reinventing the CrazyStone structure here, but I think this one is slightly different.


DATA STRUCTURE

The "tree" is an enormous, preallocated array of Node objects. (Because this array is also being used as a hash table, the size of the table should be a prime number.) This is really a directed acyclic graph (DAG) rather than a tree, because a Node may be a child of more than one parent.


Each Node contains the following information:

A+1 "pointers" (in some sense) to child Nodes, where A is the area of the board. (The extra pointer is for the pass move.) Some of these pointers (initially, all of them) may be the special value UNEXPLORED.


The number of runs through this Node so far.

The number of black wins through this Node so far.

A Zobrist hash of the position represented by this Node. This must incorporate, in addition to the locations of stones on the board, the turn number (e.g., zero for the beginning of the game) and the number of consecutive passes.


The turn number, stored explicitly.

A forced leaf turn number, to be explained below.

(Note that, in contrast to the current Orego structure, we won't need tails or a free list for this one.)
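The fields listed above could be laid out roughly as follows. This is only a sketch: the class names Node and NodeTable, the use of int table indices as the child "pointers", and the -1 sentinels for UNEXPLORED, for a never-used turn number, and for "no forced leaf" are all my assumptions, not settled parts of the plan.

```java
// Illustrative sketch only; every identifier and sentinel value here is hypothetical.
final class Node {
    static final int UNEXPLORED = -1;   // "pointer" value for a child not yet linked

    final int[] children;    // A+1 table indices; the last slot is the pass move
    int runs;                // number of runs through this Node
    int blackWins;           // number of those runs that black won
    long hash;               // Zobrist hash (stones, turn number, consecutive passes)
    int turn;                // turn number, stored explicitly
    int forcedLeafTurn;      // forced leaf turn number

    Node(int boardArea) {
        children = new int[boardArea + 1];
        reset(0L, -1);       // assumption: turn -1 marks a never-used Node
    }

    /** Reinitializes this Node in place for a new position. */
    void reset(long hash, int turn) {
        java.util.Arrays.fill(children, UNEXPLORED);
        runs = 0;
        blackWins = 0;
        this.hash = hash;
        this.turn = turn;
        forcedLeafTurn = -1; // assumption: -1 means "not forced"
    }
}

final class NodeTable {
    final Node[] nodes;

    NodeTable(int primeSize, int boardArea) {
        nodes = new Node[primeSize];
        for (int i = 0; i < primeSize; i++) {
            nodes[i] = new Node(boardArea);   // preallocate everything up front
        }
    }

    /** Maps a 64-bit hash to a table index in [0, nodes.length). */
    int indexFor(long hash) {
        return (int) Math.floorMod(hash, (long) nodes.length);
    }
}
```

For example, new NodeTable(10007, 81) would give a deliberately tiny 9x9 table; a real table would be far larger. Math.floorMod is used rather than % so that negative hash values still map to a valid index.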


GENERATING A MONTE CARLO RUN

I want to separate the process of generating a Monte Carlo run from the process of modifying the tree for two reasons. First, this will make multithreading easier. Second, I will be able to incorporate recorded games into the tree by pretending that the program had played them.

To generate a Monte Carlo run, I start at the root Node. Each move is chosen using UCT, based on UCB1-TUNED as described on p. 5 of the recent Gelly et al. tech report on MoGo. As in the current Orego, this is done with a double-hashing scheme and in a way that always chooses an untried move if one exists.

There is one complication here. The number of runs through the children might exceed the number of runs through the parent because of transpositions. In Gelly's notation, this means Tj(n) may exceed n. I THINK I can ignore this, as it will just give "oversampled" moves unusually narrow confidence windows.
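A rough sketch of the selection rule, assuming win/loss (0/1) rewards so that the mean of the squared rewards equals the mean. The class and method names are hypothetical, and for simplicity it takes the first untried move in index order rather than using the double-hashing scheme. The win counts passed in are assumed to be from the point of view of the player to move (for white, runs minus black wins).

```java
// Illustrative sketch only; names are hypothetical.
final class Ucb1Tuned {
    /** UCB1-TUNED value of a child with 0/1 rewards; requires childRuns >= 1 and parentRuns >= 1. */
    static double value(int wins, int childRuns, int parentRuns) {
        double mean = (double) wins / childRuns;
        double logN = Math.log(parentRuns);
        // Variance bound: (mean of squares - mean^2) + sqrt(2 ln n / Tj).
        // With 0/1 rewards the mean of squares is just the mean.
        double v = mean - mean * mean + Math.sqrt(2.0 * logN / childRuns);
        return mean + Math.sqrt(logN / childRuns * Math.min(0.25, v));
    }

    /** Returns the index of the move to play; an untried move (runs == 0) is always taken first. */
    static int select(int[] wins, int[] runs, int parentRuns) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < runs.length; i++) {
            if (runs[i] == 0) {
                return i;   // simplification: first untried move, not double hashing
            }
            double val = value(wins[i], runs[i], parentRuns);
            if (val > bestValue) {
                bestValue = val;
                best = i;
            }
        }
        return best;
    }
}
```

Nothing in value() requires childRuns <= parentRuns. When a transposition makes Tj(n) larger than n, the term ln n / Tj simply gets smaller, which is the narrower confidence window mentioned above.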

If we are not at a Node (because we're in an unexplored region of the search space), if the current Node has a forced leaf turn number greater than or equal to the root turn number, or if any child is found that is not at the correct turn number (because of a hash collision), the move is chosen randomly.
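The three conditions above could be bundled into a single test along these lines. This is a sketch with hypothetical names, and it assumes that the "correct" turn number for a child is the parent's turn number plus one.

```java
// Illustrative sketch only; names and the parameter layout are hypothetical.
final class PlayoutPolicy {
    /**
     * Returns true when the next move should be random rather than chosen by UCT.
     * linkedChildTurns holds the stored turn numbers of the children already linked
     * from the current Node (UNEXPLORED children are simply left out).
     */
    static boolean mustPlayRandomly(boolean atNode, int nodeTurn, int forcedLeafTurn,
                                    int rootTurn, int[] linkedChildTurns) {
        if (!atNode) {
            return true;                  // unexplored region of the search space
        }
        if (forcedLeafTurn >= rootTurn) {
            return true;                  // this Node is currently a forced leaf
        }
        for (int childTurn : linkedChildTurns) {
            if (childTurn != nodeTurn + 1) {
                return true;              // hash collision: child is at the wrong turn
            }
        }
        return false;
    }
}
```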


INCORPORATING A GAME INTO THE TREE

Again, we start at the root Node. We work down the tree, updating the run and black win count of each node encountered. If the child pointer we would be following is UNEXPLORED, we use the aforementioned Zobrist hash (modulo the table size) to find a Node to use as the child. This may be a Node that has already been initialized, a "fresh" Node (i.e., one that we can overwrite), or a Node that we can't overwrite.

If the child has the correct turn number, either we have been down this path before or we have found a transposition. In either case, continue down the tree.

If the child has a turn number that is lower than that of the root, it is an old node and can be overwritten. We reinitialize this one node, update its run and black win counts, and stop. Thus, every run adds at most one node to the tree.

If the offending child is not so old that it can be overwritten, we must leave it alone lest it mess up another part of the tree. At this point, the parent's forced leaf turn number is set to the child's turn number; all moves from that parent will be random until a later point in the game when it is safe to overwrite the offending child.
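The three cases above can be sketched as follows. All names are hypothetical, and the sketch assumes that the caller supplies the moves of the run together with the Zobrist hash of the position after each move, that the root entry already carries the root turn number, and that a never-used entry has turn -1 so it always counts as old. As in the description, only the turn number is compared; a stricter version might also compare the stored hash.

```java
// Illustrative sketch only; every identifier and sentinel value here is hypothetical.
final class TreeSketch {
    static final int UNEXPLORED = -1;

    static final class Entry {
        final int[] children;
        int runs, blackWins;
        int turn = -1;            // assumption: -1 marks a never-used entry
        int forcedLeafTurn = -1;  // assumption: -1 means "not forced"
        long hash;

        Entry(int moveCount) {
            children = new int[moveCount];
            java.util.Arrays.fill(children, UNEXPLORED);
        }
    }

    final Entry[] table;

    TreeSketch(int primeSize, int moveCount) {
        table = new Entry[primeSize];
        for (int i = 0; i < primeSize; i++) table[i] = new Entry(moveCount);
    }

    int slot(long hash) { return (int) Math.floorMod(hash, (long) table.length); }

    /** moves[i] is the move at ply i from the root; hashes[i] is the position hash after it. */
    void incorporate(int rootIndex, int rootTurn, int[] moves, long[] hashes, boolean blackWon) {
        Entry node = table[rootIndex];
        count(node, blackWon);
        for (int i = 0; i < moves.length; i++) {
            int childTurn = rootTurn + i + 1;
            int childIndex = node.children[moves[i]];
            if (childIndex == UNEXPLORED) childIndex = slot(hashes[i]);
            Entry child = table[childIndex];
            if (child.turn == childTurn) {
                // Seen before, or a transposition: link it and keep descending.
                node.children[moves[i]] = childIndex;
                count(child, blackWon);
                node = child;
            } else if (child.turn < rootTurn) {
                // Old entry: reinitialize it, count this run, and stop.
                java.util.Arrays.fill(child.children, UNEXPLORED);
                child.runs = 0;
                child.blackWins = 0;
                child.hash = hashes[i];
                child.turn = childTurn;
                child.forcedLeafTurn = -1;
                node.children[moves[i]] = childIndex;
                count(child, blackWon);
                return;
            } else {
                // Still in use elsewhere: leave it alone and mark the parent.
                node.forcedLeafTurn = child.turn;
                return;
            }
        }
    }

    private static void count(Entry e, boolean blackWon) {
        e.runs++;
        if (blackWon) e.blackWins++;
    }
}
```

Because the overwrite case returns immediately, each call adds at most one entry to the table, matching the description above.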



I welcome your input,

Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/



