I'm trying to come up with a general form (an interface, in Java
terms) for heuristics in UCT. As I see it, there are three phases of
each MC run:
1) When choosing a move from a tree node from which all moves have
been tried, use UCT and no heuristics.
2) When choosing a move from a tree node from which some moves are
unexplored, heuristics may come into play. This case is vastly more
common than case 1; almost all tree nodes are leaves.
3) When choosing a move beyond the tree, heuristics may come into
play. This case is even more common, but heuristics here must be fast
and nondeterministic.
I therefore suggest that a heuristic suggests a move given a board
configuration (including things like the location of the last move)
and potentially a set of allowable moves. It is helpful to maintain a
weighted set of heuristics.
When choosing a move for case 2 above, we consider each move in the
allowed set (that is, the moves which have not already been explored
from this node). In a heuristic set, each move is given a value that
is the weighted sum of the values given by the individual heuristics.
In case 3, we choose one of the heuristics at random, with each
heuristic's weight being its probability of being chosen. The move
suggested by that heuristic, from among all legal moves, is chosen.
Heuristics that could be written in this framework include:
Play a capturing move
Play near the last move
Play the move that, when played at any point, has led to the highest
proportion of wins in past MC runs
Play the move that, when played as a response to the previous move,
has led to the highest proportion of wins in past MC runs
Play the move that matches the most patterns from some pattern database
There are, of course, many more. Is anyone using heuristics that
don't fit into this framework?
Peter Drake
http://www.lclark.edu/~drake/
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/