Hi Aja,

57.5% is still a 50 Elo improvement, so I'm not unhappy to hear this. Did you take the reduced playouts per second into account in your experiments? How many games did you play? As far as I understand, you tried LGRF-1 conditioned on 3x3 patterns. What are your results for plain LGRF-1 without patterns, and did you try LGRF-2 at all?

As for your question, the behavior of playout policies in an MCTS context is of course always difficult to interpret. In addition, you use a softmax framework, while Orego plays deterministically (with the exception of a random fallback policy if no capture, escape, or matching pattern is found). It is possible that canned responses in LGRF fashion have a certain expected quality that does not change much with the quality of the underlying policy. In that case, they could lead to big improvements, to no effect, or even to a degradation of playing strength, depending on how strong your program already is pre-LGRF. Of course, this also depends on how you prioritize LGRF and on how successful you are at replacing the low-quality moves of your default policy with its suggestions while leaving the high-quality moves alone. We would need to study this systematically.
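For readers following along, a minimal sketch of the LGRF-1 mechanism under discussion: replies are keyed only on the opponent's last move, stored after won playouts and forgotten after lost ones. The class name, move representation, and the convention that our moves sit at odd indices of the playout are simplifying assumptions of this sketch, not anyone's actual implementation.

```python
class LGRF1:
    """Sketch of a Last-Good-Reply-with-Forgetting (LGRF-1) table."""

    def __init__(self):
        self.reply = {}  # opponent's last move -> stored reply

    def select(self, last_opponent_move, legal_moves, fallback):
        """Play the stored reply if it is legal, else use the default policy."""
        move = self.reply.get(last_opponent_move)
        if move in legal_moves:
            return move
        return fallback(legal_moves)

    def update(self, moves, we_won):
        """After a playout: store the winner's replies, forget failed ones.

        `moves` is the playout's move sequence; our moves are assumed
        to sit at odd indices (a simplification for this sketch).
        """
        for i in range(1, len(moves), 2):
            prev, reply = moves[i - 1], moves[i]
            if we_won:
                self.reply[prev] = reply      # remember this good reply
            elif self.reply.get(prev) == reply:
                del self.reply[prev]          # forget the failed reply
```

LGRF-2 extends this by keying replies on the last two moves instead of one, which is what makes conditioning on additional context (such as 3x3 patterns) a natural further step.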

Hendrik

Hi Hendrik,

I look forward to seeing your contributions on adaptive playouts in the future.

I have tested LGRF with 3x3 patterns at 20k playouts. It's a bit strange
that the best performance is only around 57.5%, even though I tuned the
probability offset very carefully. LGRF with 3x3 patterns also slows Erica
down by almost 15%. Overall, this improvement is much weaker than the one in
your paper. From my past experiments, incrementally updating larger patterns
is too costly, so I don't plan to try it. But I am wondering why Orego can
get such a BIG improvement while Erica can't.

Aja

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
