57.5% is still a 50 Elo improvement, so I'm not unhappy to hear this. Did you take the reduced playouts per second into account in your experiments? How many games did you play?
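
(For reference, under the standard logistic Elo model a win rate p corresponds to a rating difference of

    D = 400 * log10(p / (1 - p)),

so p = 0.575 gives D = 400 * log10(0.575 / 0.425), roughly 52 Elo.)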

I was planning to test with fixed time per game once I saw a 100 Elo improvement at a fixed 20k playouts per game. But I was a bit disappointed and stopped the test at 450 games with 57.5%, which is the same as my result for LGR-1.
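
(A quick sanity check on the sample size: at 450 games, a 57.5% score has a standard error of about sqrt(0.575 * 0.425 / 450) ≈ 2.3%, so the 95% interval is roughly 53%-62%, i.e. anywhere from about 20 to 85 Elo. A 100 Elo gain, about 64%, falls just outside that interval, but the true gain is not pinned down very tightly either.)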

As far as I understand, you tried LGRF-1 conditioned on 3x3 patterns. What are your results for plain LGRF-1 without patterns, and did you try LGRF-2 at all?

Yes, this test was LGRF-2 with 3x3 patterns checked around the reply, the last move, and the second-last move. A reply is played only if all three patterns match. I haven't tested LGRF-2 without 3x3 patterns, because I thought the patterns should help a lot. But it looks like I had too much confidence. :(
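
To make the matching rule concrete, here is a minimal sketch of such a table lookup (the names Board, pattern3x3, LgrfEntry and so on are invented for illustration; Erica's actual data structures will differ):

    #include <cstdint>
    #include <unordered_map>

    // Hypothetical board interface; pattern3x3(p) hashes the 3x3
    // neighbourhood around point p in the current position.
    struct Board {
        uint32_t pattern3x3(int) const { return 0; }  // stub
        bool is_legal(int) const { return true; }     // stub
    };

    // One stored reply, plus the 3x3 patterns seen around the reply,
    // the last move and the second-last move when it was stored.
    struct LgrfEntry {
        int reply;
        uint32_t reply_pat, last_pat, second_pat;
    };

    // LGRF-2 table keyed by (second-last move, last move).
    static std::unordered_map<uint64_t, LgrfEntry> lgrf2;

    static uint64_t key(int second_last, int last) {
        return (uint64_t(uint32_t(second_last)) << 32) | uint32_t(last);
    }

    // Return the stored reply only if all three patterns still match;
    // -1 means "no usable reply, fall back to the default policy".
    int lgrf2_reply(const Board& b, int second_last, int last) {
        auto it = lgrf2.find(key(second_last, last));
        if (it == lgrf2.end()) return -1;
        const LgrfEntry& e = it->second;
        if (!b.is_legal(e.reply)) return -1;
        if (b.pattern3x3(e.reply)     != e.reply_pat)  return -1;
        if (b.pattern3x3(last)        != e.last_pat)   return -1;
        if (b.pattern3x3(second_last) != e.second_pat) return -1;
        return e.reply;  // all three patterns match: play the reply
    }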

It is possible that canned responses in the LGRF fashion have a certain expected quality that does not change much with the quality of the underlying policy. In that case, they could lead to big improvements, to no effect, or even to a degradation of playing strength, depending on how strong your program already is pre-LGRF. Of course, this also depends on how you prioritize LGRF and on how successful you are at replacing the low-quality moves, but not the high-quality moves, of your default policy with its suggestions. We would need to study this systematically.
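
On the prioritization point: the usual arrangement in Baier and Drake's LGRF work is to try the two-move reply first, then the one-move reply, and only then the default policy, so the tables can only ever override default-policy moves. A sketch, reusing the hypothetical lgrf2_reply above and assuming analogous lgrf1_reply and default_policy_move helpers:

    int lgrf1_reply(const Board& b, int last);  // analogous one-move table
    int default_policy_move(const Board& b);    // e.g. the softmax policy

    // Playout move selection with LGRF in front of the default policy.
    int select_playout_move(const Board& b, int second_last, int last) {
        int m = lgrf2_reply(b, second_last, last);  // LGRF-2 first
        if (m >= 0) return m;
        m = lgrf1_reply(b, last);                   // then LGRF-1
        if (m >= 0) return m;
        return default_policy_move(b);              // finally the default
    }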

My target was 100 Elo, or at least 70 Elo. 50 Elo at a 15% speed cost is not very satisfying (for me). Agreed, we would need systematic experiments to adapt LGRF-2 to a softmax playout policy. And I believe that developing an algorithm to automatically learn the feature weights in combination with LGRF is a very good research direction.
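
One way to read "adapt LGRF-2 to softmax" is to treat "this move is the stored last-good reply" as one more binary feature whose weight is learned together with the pattern weights (for example by Coulom-style minorization-maximization), instead of playing the reply deterministically. A hedged sketch, with invented names:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Unnormalised softmax weight of a move from its feature values.
    double gamma(const std::vector<double>& w, const std::vector<double>& f) {
        double s = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) s += w[i] * f[i];
        return std::exp(s);
    }

    // Probability of picking move m among all candidates, where
    // feats[m] might hold e.g. a 3x3-pattern feature and a binary
    // "is the stored LGRF reply" feature with a learned weight.
    double softmax_prob(const std::vector<double>& w,
                        const std::vector<std::vector<double>>& feats,
                        std::size_t m) {
        double total = 0.0;
        for (const auto& f : feats) total += gamma(w, f);
        return gamma(w, feats[m]) / total;
    }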

Aja


