Thanks for the detailed explanation of the paper.

Would it make sense to vary the number of moves generated by the classifier
as you run more playouts? Have you tried this? It seems like the classifier
would return garbage initially and slowly give better moves deeper down the
sequence, analogous to descending the tree in MCTS.
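To make the question concrete, here is a minimal sketch of one way the candidate count could grow with the playout count at a node (a power-law schedule in the spirit of progressive widening; the function name and constants are made up):

```python
import math

def num_candidate_moves(playouts, base=2, alpha=0.5):
    """Hypothetical schedule: ask the classifier for more candidate
    moves as the playout count at a node grows.  Power-law growth,
    as in progressive widening; base and alpha are placeholders."""
    return base + int(math.floor(playouts ** alpha))

# The candidate list widens slowly with visits:
# 0 playouts -> 2 moves, 100 -> 12, 10000 -> 102.
```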

You mentioned that adding more than two previous moves as (linearly
independent) input terms does worse. What happens when you start combining
moves into a single feature? Have you tried just one feature with a 1 at
each of the two previous move locations? Or a 1 and a c < 1? Or what about
using this as a third term, like y[i] = w1[i]*m1 + w2[i]*m2 + w12[i]*m12 +
b[i]?
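For what it's worth, here is how I picture the combined-feature variant in code. Everything here is a placeholder (9x9 board, random weights, c = 0.5); the point is only the shape of the computation:

```python
import numpy as np

N = 81  # hypothetical 9x9 board, flattened

rng = np.random.default_rng(0)
# Hypothetical weight matrices: one row of weights per output square.
W1, W2, W12 = (rng.normal(scale=0.01, size=(N, N)) for _ in range(3))
b = np.zeros(N)

def combined_feature(last, second_last, c=0.5):
    """Single input vector with a 1 at the last move and c < 1 at the
    move before it (one of the combinations suggested above)."""
    m12 = np.zeros(N)
    m12[last] = 1.0
    m12[second_last] = c
    return m12

def score_moves(last, second_last):
    # y[i] = w1[i].m1 + w2[i].m2 + w12[i].m12 + b[i]
    m1 = np.zeros(N); m1[last] = 1.0
    m2 = np.zeros(N); m2[second_last] = 1.0
    m12 = combined_feature(last, second_last)
    return W1 @ m1 + W2 @ m2 + W12 @ m12 + b

y = score_moves(40, 41)  # one score per board square
```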

In the paper you say you only consider local moves, which is natural because
your input vectors represent the last two moves, which we already know are
very important for predicting local moves.

What steps can we take to try to learn from other features of the game? One
way to add patterns to the classifier might be to have input vectors for 3x3
patterns. Instead of a 1, you could put some small value at the location of
each square matching a 3x3 pattern, and zero elsewhere. The output for a
given square would then look like y[i] = w1[i]*m1 + w2[i]*m2 + w3[i]*p[i]. Or
maybe you don't even need the m1 and m2 terms for non-local moves. You could
add other types of features too (atari, capture, extend, etc.) by putting
small values in the input vectors.
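A rough sketch of the pattern-term idea, with a stand-in match test (any adjacent stone) where a real 3x3 pattern lookup would go; board size and pattern_weight are invented:

```python
import numpy as np

SIZE = 9                 # hypothetical 9x9 board, flattened row-major
N = SIZE * SIZE

def pattern_input(board, pattern_weight=0.1):
    """Build the p vector: a small value at every empty square whose
    3x3 neighbourhood matches a pattern, zero elsewhere.  The match
    test here is a stand-in (any adjacent stone); a real classifier
    would look the neighbourhood up in a pattern table."""
    p = np.zeros(N)
    for r in range(SIZE):
        for c in range(SIZE):
            if board[r][c] != 0:
                continue            # only empty squares can be moves
            neigh = [board[rr][cc]
                     for rr in range(max(0, r - 1), min(SIZE, r + 2))
                     for cc in range(max(0, c - 1), min(SIZE, c + 2))]
            if any(v != 0 for v in neigh):
                p[r * SIZE + c] = pattern_weight
    return p

# The third term then just adds W3 @ p to the existing
# y = W1 @ m1 + W2 @ m2 score.
board = [[0] * SIZE for _ in range(SIZE)]
board[4][4] = 1                     # a single stone in the centre
p = pattern_input(board)
```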

And this is where offline learning from game records could come in handy,
for initializing the p[i]'s, etc.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go