On Jun 29, 2011, at 5:09 PM, Imran Hendley wrote:

> Thanks for the detailed explanation of the paper.

> Would it make sense to vary the number of moves generated by the classifier as you run more playouts? Have you tried this? It seems like the classifier would return garbage initially and slowly give better moves deeper down the sequence, analogous to descending the tree in MCTS.

We tried this briefly, setting the "cutoff" (number of moves generated by the classifier) to 1 + (growth * #playouts), where growth is a parameter such as 0.002. This didn't help, but it's conceivable that some other schedule might.
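A minimal sketch of that schedule, for concreteness. Only the formula 1 + growth * #playouts comes from the message; the function name and the optional clamp are illustrative assumptions.

```python
# Hedged sketch of the cutoff schedule described above. Only the formula
# 1 + growth * #playouts is from the message; names and the clamp are
# illustrative assumptions.

def classifier_cutoff(num_playouts, growth=0.002, max_cutoff=None):
    """Number of moves the classifier is asked to generate, grown
    linearly with the number of playouts run so far."""
    cutoff = 1 + int(growth * num_playouts)
    if max_cutoff is not None:
        cutoff = min(cutoff, max_cutoff)
    return cutoff
```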

> You mentioned that adding more than two previous moves as (linearly independent) input terms does worse. What happens when you start combining moves into a single feature? Have you tried just one feature with a 1 at each of the two previous move locations? Or a 1 and a c<1? Or what about using this as a third term, like y[i] = w1[i]*m1 + w2[i]*m2 + w12[i]*m12 + b[i]?

We haven't really tried this; that table would be very large (board area squared), but it could be done.
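To make the table size concrete, here is one way the combined feature could be indexed. All names are hypothetical; only the board-area-squared table size is from the message.

```python
# Illustrative sketch of the combined-feature idea: index a single weight
# table by the *pair* of previous moves, so it has board-area-squared
# rows (81 * 81 = 6561 on 9x9). All names here are hypothetical.

BOARD_AREA = 81  # 9x9 board, points numbered 0..80

def pair_index(m1, m2):
    """Flatten the two previous moves into one combined-feature index."""
    return m1 * BOARD_AREA + m2

def score(i, m1, m2, w1, w2, w12, b):
    """y[i] = w1[i].m1 + w2[i].m2 + w12[i].m12 + b[i], where m1 and m2
    are one-hot over points and m12 is one-hot over move pairs."""
    return w1[i][m1] + w2[i][m2] + w12[i][pair_index(m1, m2)] + b[i]
```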

> In the paper you say you only consider local moves, which is natural because your input vectors represent the last two moves, which we already know are very important for predicting local moves.

I can't find the word "local" in the paper. Can you find the statement you're referring to?

> What steps can we take to try and learn from other features of the game? One way to add patterns to the classifier might be to have input vectors for 3x3 patterns. Instead of a 1 at the location of all the stones in the 3x3 pattern you could have some small value, and zero elsewhere. So the output for some square would look like y[i] = w1[i]*m1 + w2[i]*m2 + w3[i]*p[i]. Or maybe you don't even need the m1 and m2 terms for non-local moves. You could add other types of features too (atari, capture, extend, etc.) by putting small values in input vectors.
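A minimal sketch of that scoring rule, under assumed names: one-hot entries for the two previous moves plus a small value p[i] at points matching a known 3x3 pattern. The pattern value and all identifiers are illustrative choices, not from the message.

```python
# Sketch of y[i] = w1[i]*m1 + w2[i]*m2 + w3[i]*p[i], with one-hot m1, m2
# and a small pattern value p[i]. Names and the default pattern value
# are illustrative assumptions.

def score_point(i, m1, m2, pattern_hits, w1, w2, w3, pattern_value=0.1):
    """Linear score for point i; pattern_hits is the set of points whose
    3x3 neighborhood matches some stored pattern."""
    p_i = pattern_value if i in pattern_hits else 0.0
    return w1[i][m1] + w2[i][m2] + w3[i] * p_i
```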

We tried looking at local patterns and at board locations in 3x3 or large-knight's-move neighborhoods. Disappointingly, neither of these things helped.

> And this is where offline learning from game records could come in handy, for initializing the p[i]'s, etc.


We tried pre-initializing the weights to bias the system in favor of playing (a) near the center (in 9x9 games) and (b) near the two recent moves. Again, no improvement.

Of course, it's possible that one of these ideas is valid and we just did it wrong. We welcome experiments by others!

Thanks,

Peter Drake
http://www.lclark.edu/~drake/



_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go