On Jun 29, 2011, at 10:17 AM, Brian Sheppard wrote:

Why is a classifier better than having a lookup table indexed by OurLastMove, OppLastMove, ProposedNextMove that returns the Wins / Trials experienced when ProposedNextMove is played after the sequence OurLastMove, OppLastMove?

The advantage here is that we combine information from several piles:

- All times this move was played.
- All times this move was played in response to previous move X.
- All times this move was played in response to penultimate move Y.

The scheme you propose only gathers:

- All times this move was played in response to previous move X and penultimate move Y.

This information is more accurate, but it accumulates more slowly, because each (X, Y, move) entry is updated only when that exact sequence occurs. (See the "Power of Forgetting" paper for more discussion of this.)
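
To make the trade-off concrete, here is a minimal sketch of the two bookkeeping schemes. The `Pile` class, the `record` and `pooled_rate` helpers, and the move labels are all hypothetical. Pooling the three piles by summing wins and trials is only a stand-in for whatever the classifier actually learns, not a description of it.

```python
from collections import defaultdict


class Pile:
    """Wins / trials counters keyed by a context tuple (hypothetical helper)."""

    def __init__(self):
        self.wins = defaultdict(int)
        self.trials = defaultdict(int)

    def update(self, key, won):
        self.trials[key] += 1
        self.wins[key] += int(won)

    def counts(self, key):
        # .get avoids creating empty entries for unseen keys
        return self.wins.get(key, 0), self.trials.get(key, 0)


by_move = Pile()    # all times this move was played
by_prev = Pile()    # ... in response to previous move X
by_penult = Pile()  # ... in response to penultimate move Y
joint = Pile()      # the lookup table: keyed on (Y, X, move) together


def record(penult, prev, move, won):
    by_move.update((move,), won)
    by_prev.update((prev, move), won)
    by_penult.update((penult, move), won)
    joint.update((penult, prev, move), won)


def pooled_rate(penult, prev, move):
    """Assumed stand-in for the classifier: pool wins and trials of the three piles."""
    keyed = [(by_move, (move,)), (by_prev, (prev, move)), (by_penult, (penult, move))]
    wins = sum(p.counts(k)[0] for p, k in keyed)
    trials = sum(p.counts(k)[1] for p, k in keyed)
    return wins / trials if trials else None


# Three made-up observations of move "C" in different contexts.
record("A", "B", "C", True)
record("D", "B", "C", True)
record("A", "E", "C", False)

# The context (penult="D", prev="E") has never occurred as a pair:
joint_trials = joint.counts(("D", "E", "C"))[1]
pooled = pooled_rate("D", "E", "C")
```

Here the joint table has no trials at all for the sequence D, E, C, while each of the three separate piles already holds at least one sample that bears on it.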

Are the training cases for your classifier selected from only the UCT nodes, or also from playout nodes?

From the entire playout.
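
As a rough illustration, training cases could be cut from a whole playout along the following lines. The function name, the tuple layout, and the labeling rule (a case counts as a win if the side that played the move went on to win the playout) are assumptions made for this sketch, not details stated above.

```python
def training_cases(moves, black_won):
    """Turn one playout into (penultimate, previous, move, won) cases.

    moves: move labels in playing order; Black is assumed to move first
           and the colors are assumed to alternate strictly.
    black_won: outcome of the playout.
    """
    cases = []
    for i, move in enumerate(moves):
        prev = moves[i - 1] if i >= 1 else None    # opponent's last move
        penult = moves[i - 2] if i >= 2 else None  # mover's own last move
        mover_is_black = (i % 2 == 0)
        # Assumed label: did the side making this move win the playout?
        cases.append((penult, prev, move, mover_is_black == black_won))
    return cases


cases = training_cases(["D4", "Q16", "C3", "R17"], black_won=True)
```

Every move of the playout yields one case, so a single playout contributes as many cases as it has moves.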

Is the output of your classifier used to initialize the Wins / Trials values for legal moves in new UCT nodes? Is that done by assuming a fixed number of trials (how many?) and setting Wins = ClassifierOutput * Trials?

There is no tree in this system. The primary policy (used for the first 10 moves of each playout) is to choose the (legal) move that the classifier rates highest.
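
In code, that policy might look like the sketch below. The `rate` callable stands in for the classifier, and the uniformly random fallback after move 10 is an assumption, since nothing above says what the playout does once the primary policy stops.

```python
import random

PRIMARY_MOVES = 10  # the primary policy covers the first 10 moves of a playout


def choose_move(move_index, legal_moves, rate, rng=random):
    """Pick the next playout move.

    move_index: 0-based index of the move within the playout.
    legal_moves: non-empty list of legal moves.
    rate: hypothetical callable mapping a move to the classifier's rating.
    """
    if move_index < PRIMARY_MOVES:
        # Primary policy: the legal move the classifier rates highest.
        return max(legal_moves, key=rate)
    # Assumed fallback: uniformly random among legal moves.
    return rng.choice(legal_moves)


ratings = {"A": 0.2, "B": 0.7, "C": 0.5}  # made-up ratings
early = choose_move(0, ["A", "B", "C"], ratings.get)
late = choose_move(10, ["A", "B", "C"], ratings.get, rng=random.Random(0))
```

Restricting `max` to `legal_moves` is what keeps an illegal move from being chosen even when the classifier would rate it highly.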

Is that the only use of the classifier in the system?

The above is the only use of the classifier.

Peter Drake
http://www.lclark.edu/~drake/



