On Jun 29, 2011, at 5:09 PM, Imran Hendley wrote:
> Thanks for the detailed explanation of the paper.
>
> Would it make sense to vary the number of moves generated by the
> classifier as you run more playouts? Have you tried this? It seems
> like the classifier would return garbage initially and slowly give
> better moves deeper down the sequence, analogous to descending the
> tree in MCTS.
We tried this briefly, setting the "cutoff" (number of moves generated
by the classifier) to 1 + (growth * #playouts), where growth is a
parameter such as 0.002. This didn't help, but it's conceivable that
some other schedule might.
> You mentioned that adding more than two previous moves as (linearly
> independent) input terms does worse. What happens when you start
> combining moves into a single feature? Have you tried just one
> feature with a 1 at each of the two previous move locations? Or a 1
> and a c < 1? Or what about using this as a third term, like y[i] =
> w1[i]*m1 + w2[i]*m2 + w12[i]*m12 + b[i]?
We haven't really tried this; that table would be very large (board
area squared), but it could be done.
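To make the size concern concrete, here is a sketch of what the joint term might look like; the table names and shapes are our own assumptions, not code from the paper. The joint table needs board-area-squared entries per output point:

```python
import numpy as np

BOARD_AREA = 81  # 9x9 board, points indexed 0..80

# Hypothetical weight tables for the suggested model:
#   y[i] = w1[i]*m1 + w2[i]*m2 + w12[i]*m12 + b[i]
w1 = np.zeros((BOARD_AREA, BOARD_AREA))               # effect of last move m1 on point i
w2 = np.zeros((BOARD_AREA, BOARD_AREA))               # effect of second-to-last move m2
w12 = np.zeros((BOARD_AREA, BOARD_AREA, BOARD_AREA))  # joint (m1, m2) term: 81*81 entries per point
b = np.zeros(BOARD_AREA)                              # per-point bias

def scores(m1, m2):
    """Score every board point given the locations of the two previous moves."""
    return w1[:, m1] + w2[:, m2] + w12[:, m1, m2] + b
```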
> In the paper you say you only consider local moves, which is natural
> because your input vectors represent the last two moves, which we
> already know are very important for predicting local moves.
I can't find the word "local" in the paper. Can you find the statement
you're referring to?
> What steps can we take to try to learn from other features of the
> game? One way to add patterns to the classifier might be to have
> input vectors for 3x3 patterns. Instead of a 1, you could have some
> small value at the location of each stone in the 3x3 pattern, and
> zero elsewhere. So the output for some square would look like
> y[i] = w1[i]*m1 + w2[i]*m2 + w3[i]*p[i]. Or maybe you don't even
> need the m1 and m2 terms for non-local moves. You could add other
> types of features too (atari, capture, extend, etc.) by putting
> small values in input vectors.
We tried looking at local patterns and at board locations in 3x3 or
large-knight's-move neighborhoods. Disappointingly, neither of these
things helped.
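For anyone who wants to replicate the local-pattern experiment, one common way to index 3x3 patterns is to hash the neighborhood into a small integer; this encoding is our own illustration, not necessarily what we used:

```python
SIZE = 9
EMPTY, BLACK, WHITE, OFF_BOARD = 0, 1, 2, 3

def pattern_index(board, row, col):
    """Encode the 3x3 neighborhood around (row, col) as a base-4 integer,
    one digit per surrounding point, giving 4**8 = 65536 possible patterns."""
    index = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the center point itself
            r, c = row + dr, col + dc
            color = board[r][c] if 0 <= r < SIZE and 0 <= c < SIZE else OFF_BOARD
            index = index * 4 + color
    return index
```

A weight table indexed by such pattern numbers is how the pattern term could feed into the classifier's score for each point.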
> And this is where offline learning from game records could come in
> handy, for initializing the p[i]'s, etc.
We tried pre-initializing the weights to bias the system in favor of
playing (a) near the center (in 9x9 games) and (b) near the two recent
moves. Again, no improvement.
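The center bias can take many forms; one plausible sketch (assumed form, not our exact code) is a per-point bias that falls off linearly with Manhattan distance from the center:

```python
SIZE = 9
CENTER = (SIZE - 1) / 2.0
MAX_DIST = 2 * CENTER  # largest Manhattan distance from the center (a corner)

def center_biased_init(strength=0.1):
    """Initial bias vector b[i]: largest at the center of the board,
    falling off linearly toward the edges and corners."""
    b = [0.0] * (SIZE * SIZE)
    for r in range(SIZE):
        for c in range(SIZE):
            dist = abs(r - CENTER) + abs(c - CENTER)
            b[r * SIZE + c] = strength * (MAX_DIST - dist)
    return b
```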
Of course, it's possible that one of these ideas is valid and we just
did it wrong. We welcome experiments by others!
Thanks,
Peter Drake
http://www.lclark.edu/~drake/
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go