I'm trying to come up with a general form (an interface, in Java terms) for heuristics in UCT. As I see it, there are three phases of each MC run:

1) When choosing a move from a tree node from which all moves have been tried, use UCT and no heuristics.

2) When choosing a move from a tree node from which some moves are unexplored, heuristics may come into play. This case is vastly more common than case 1; almost all tree nodes are leaves.

3) When choosing a move beyond the tree, heuristics may come into play. This case is even more common, but heuristics here must be fast and nondeterministic.

I therefore suggest that a heuristic suggests a move given a board configuration (including things like the location of the last move) and potentially a set of allowable moves. It is helpful to maintain a weighted set of heuristics.

When choosing a move for case 2 above, we consider each move in the allowed set (that is, the moves which have not already been explored from this node). In a heuristic set, each move is given a value that is the weighted sum of the values given by the individual heuristics.

In case 3, we choose one of the heuristics at random, with each heuristic's weight being its probability of being chosen. The move suggested by that heuristic, from among all legal moves, is chosen.

Heuristics that could be written in this framework include:

Play a capturing move
Play near the last move
Play the move that, when played at any point, has led to the highest proportion of wins in past MC runs Play the move that, when played as a response to the previous move, has led to the highest proportion of wins in past MC runs
Play the move that matches the most patterns from some pattern database

There are, of course, many more. Is anyone using heuristics that don't fit into this framework?


Peter Drake
http://www.lclark.edu/~drake/



_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to