On Jun 29, 2011, at 10:17 AM, Brian Sheppard wrote:
> Why is a classifier better than having a lookup table indexed by
> OurLastMove, OppLastMove, ProposedNextMove that returns the Wins /
> Trials experienced when ProposedNextMove is played after the
> sequence OurLastMove, OppLastMove?
The advantage here is that we combine information from several piles:
- All times this move was played.
- All times this move was played in response to previous move X.
- All times this move was played in response to penultimate move Y.
The scheme you propose only gathers:
- All times this move was played in response to previous move X and
penultimate move Y.
This information is more accurate, but accumulates more slowly. (See
the Power of Forgetting paper for more discussion on this.)
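To make the contrast between the two schemes concrete, here is a minimal sketch. The email does not say what kind of classifier is used, so the FeatureClassifier below is a hypothetical stand-in: a logistic (linear) model with one weight table per "pile" (the move alone, the move after previous move X, the move after penultimate move Y), trained online by stochastic gradient descent. JointTable is the single Wins / Trials table from the question, keyed by (penultimate, previous, proposed) move. Class names, move strings, and the learning rate are all illustrative assumptions.

```python
import math
from collections import defaultdict


class JointTable:
    """The scheme from the question: Wins / Trials keyed by the full triple
    (penultimate move Y, previous move X, proposed move)."""

    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0])  # key -> [wins, trials]

    def update(self, penult, prev, move, won):
        entry = self.stats[(penult, prev, move)]
        entry[0] += int(won)
        entry[1] += 1

    def rate(self, penult, prev, move):
        wins, trials = self.stats.get((penult, prev, move), (0, 0))
        # No data for this exact triple: the table has nothing to offer.
        return wins / trials if trials else None


class FeatureClassifier:
    """Hypothetical logistic classifier that pools three piles of evidence.
    Each pile contributes one additive weight to the logit."""

    def __init__(self, learning_rate=0.1):
        self.lr = learning_rate
        self.w_move = defaultdict(float)    # all times this move was played
        self.w_prev = defaultdict(float)    # ... in response to previous move X
        self.w_penult = defaultdict(float)  # ... in response to penultimate move Y

    def _logit(self, penult, prev, move):
        return (self.w_move.get(move, 0.0)
                + self.w_prev.get((prev, move), 0.0)
                + self.w_penult.get((penult, move), 0.0))

    def rate(self, penult, prev, move):
        """Estimated probability that playing `move` leads to a win."""
        return 1.0 / (1.0 + math.exp(-self._logit(penult, prev, move)))

    def update(self, penult, prev, move, won):
        # One SGD step on the log loss; every active feature has value 1,
        # so each of the three weights moves by lr * (target - prediction).
        step = self.lr * (float(won) - self.rate(penult, prev, move))
        self.w_move[move] += step
        self.w_prev[(prev, move)] += step
        self.w_penult[(penult, move)] += step


table = JointTable()
clf = FeatureClassifier()
# "D4" keeps winning when played after "C3", with "Q16" as the penultimate move.
for _ in range(20):
    table.update("Q16", "C3", "D4", won=True)
    clf.update("Q16", "C3", "D4", won=True)

# A penultimate move never seen before: the joint table has no entry,
# but the classifier still draws on the "D4" and "C3 -> D4" piles.
print(table.rate("R4", "C3", "D4"))      # None
print(clf.rate("R4", "C3", "D4") > 0.5)  # True
```

The trade-off described above shows up directly: the joint table's estimate for a triple is exact once that triple has been seen often, but it is empty until then, while the pooled model generalizes immediately at the cost of assuming the piles combine additively.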
> Are the training cases for your classifier selected from only the
> UCT nodes, or also from playout nodes?
From the entire playout.
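Taking training cases "from the entire playout" might look something like the sketch below: every move in the playout yields one (penultimate, previous, move, won) case, labelled from the point of view of the player who made that move. The function name, the alternating-colour assumption, the black_won flag, and the choice to ignore moves made before the playout started (None stands in for a missing previous move) are all hypothetical, not details from the email.

```python
def training_cases(moves, black_won, black_to_play_first=True):
    """Turn one finished playout into (penult, prev, move, won) tuples.

    Assumes colours strictly alternate starting with the given player and
    that `won` should be judged from the perspective of the player who
    made each move. Moves before the playout began are not considered.
    """
    cases = []
    for i, move in enumerate(moves):
        prev = moves[i - 1] if i >= 1 else None
        penult = moves[i - 2] if i >= 2 else None
        mover_is_black = (i % 2 == 0) == black_to_play_first
        won = black_won if mover_is_black else not black_won
        cases.append((penult, prev, move, won))
    return cases


cases = training_cases(["D4", "Q16", "C3"], black_won=True)
# [(None, None, 'D4', True), (None, 'D4', 'Q16', False), ('D4', 'Q16', 'C3', True)]
```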
> Is the output of your classifier used to initialize the Wins /
> Trials values for legal moves in new UCT nodes? Is that done by
> assuming a fixed number of trials (how many?) and setting Wins =
> ClassifierOutput * Trials?
There is no tree in this system. The primary policy (used for the
first 10 moves of each playout) is to choose the (legal) move that the
classifier rates highest.
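A minimal sketch of the primary policy as described: for the first 10 moves of a playout, play the legal move with the highest classifier rating. The email does not say what policy takes over after move 10, so `fallback` is a hypothetical parameter (uniformly random choice in the example), as are the `rate` callback signature and the convention of returning None (a pass) when no legal move exists.

```python
import random

PRIMARY_POLICY_MOVES = 10  # from the text: first 10 moves of each playout


def choose_move(move_number, legal_moves, penult, prev, rate, fallback):
    """Pick a playout move.

    move_number counts from the start of the playout (0-based). For the
    first PRIMARY_POLICY_MOVES moves, return the legal move that
    rate(penult, prev, move) scores highest; afterwards defer to the
    hypothetical `fallback` policy.
    """
    if not legal_moves:
        return None  # assumption: the caller treats None as a pass
    if move_number < PRIMARY_POLICY_MOVES:
        return max(legal_moves, key=lambda m: rate(penult, prev, m))
    return fallback(legal_moves)


scores = {"C3": 0.4, "D4": 0.7, "Q16": 0.55}  # made-up classifier outputs


def rate(penult, prev, move):
    return scores[move]


print(choose_move(0, ["C3", "D4", "Q16"], None, None, rate, random.choice))  # D4
```

Because the primary policy is a pure argmax, it is deterministic for a given position; any variety between playouts has to come from the moves chosen after the first 10.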
> Is that the only use of the classifier in the system?
The above is the only use of the classifier.
Peter Drake
http://www.lclark.edu/~drake/
_______________________________________________
Computer-go mailing list
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go