Hi Aja,
57.5% is still a 50 Elo improvement, so I'm not unhappy to hear this.
Did you take the reduced playouts per second into account in your
experiments? How many games did you play?
As far as I understand, you tried LGRF-1 conditioned on 3x3 patterns.
What are your results for plain LGRF-1 without patterns, and did you try
LGRF-2 at all?
As for your question, the behavior of playout policies in an MCTS
context is of course always difficult to interpret. In addition, you use
a softmax framework while Orego plays deterministically (with the
exception of a random fallback policy if no capture, escape or matching
pattern is found).
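The contrast between the two policy styles could be sketched roughly as
follows (hypothetical Python; the heuristic names and the scoring scheme
are illustrative assumptions, not code from Orego or Erica):

```python
import math
import random

def deterministic_policy(heuristics, legal_moves, rng=random):
    """Orego-style ordering as described above: try each heuristic
    (e.g. capture, escape, 3x3 pattern) in priority order; if none
    suggests a move, fall back to a uniformly random legal move."""
    for h in heuristics:
        m = h(legal_moves)
        if m is not None:
            return m
    return rng.choice(sorted(legal_moves))

def softmax_policy(scores, legal_moves, rng=random):
    """Erica-style softmax: sample a legal move with probability
    proportional to exp(score), so every move keeps some chance."""
    moves = sorted(legal_moves)
    weights = [math.exp(scores.get(m, 0.0)) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

In the deterministic version an LGRF suggestion either fires or it does
not; in the softmax version a probability offset only shifts the
distribution, which may be one source of the differing results.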
It is possible that canned responses in LGRF fashion have a certain
expected quality that does not change much with the quality of the
underlying policy. In that case, they could lead to big improvements,
to no effect, or even to a degradation of playing strength, depending
on how strong your program already is pre-LGRF. Of course this also
depends on how you prioritize LGRF and on how successful you are in
replacing the low-quality moves, but not the high-quality moves, of
your default policy with its suggestions. We would need to study this
systematically.
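For reference, the LGRF-1 bookkeeping under discussion could be sketched
like this (hypothetical Python based on the published description of
last-good-reply with forgetting, not on either program's actual code):

```python
def choose_move(last_move, legal_moves, reply, default_policy):
    """Prefer the stored reply to the opponent's last move; otherwise
    fall back to the default playout policy."""
    m = reply.get(last_move)
    if m is not None and m in legal_moves:
        return m
    return default_policy(legal_moves)

def update_replies(moves, winner_is_black, reply):
    """After a playout: store the winner's replies, and forget stored
    replies the loser actually played (the forgetting in LGRF)."""
    for i in range(1, len(moves)):
        prev, cur = moves[i - 1], moves[i]
        mover_is_black = (i % 2 == 0)  # assume black played moves[0]
        if mover_is_black == winner_is_black:
            reply[prev] = cur              # remember good reply
        elif reply.get(prev) == cur:
            del reply[prev]                # forget failed reply
```

The quality of the table then hinges on the point above: whether the
stored replies displace the default policy's weak moves or its strong
ones.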
Hendrik
Hi Hendrik,
I look forward to seeing your contributions on adaptive playouts in the future.
I have tested LGRF with 3x3 patterns at 20k playouts. It's a bit strange
that the best performance is only around 57.5%, even though I tuned the
probability offset very hard. LGRF with 3x3 patterns also slows Erica down
by almost 15%. Overall, this improvement is much weaker than the one in
your paper.
From my past experiments, incrementally updating larger patterns is too
costly, so I don't plan to try it. But I am wondering why Orego can get
such a BIG improvement while Erica can't.
Aja
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go