57.5% is still a 50 Elo improvement, so I'm not unhappy to hear this. Did you take the reduced playouts per second into account in your experiments? How many games did you play?
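
(For reference, under the standard logistic Elo model a win rate p corresponds to a rating difference of

    D = 400 * log10(p / (1 - p)),

so p = 0.575 gives D = 400 * log10(0.575 / 0.425), roughly 52 Elo.)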

I was planning to test with fixed time per game once I saw a 100 Elo improvement at a fixed 20k playouts per game. But I was a bit disappointed and stopped the test at 450 games with 57.5%, which is the same as my result for LGR-1.
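
(A quick sanity check on the sample size: at 450 games, a 57.5% score has a standard error of about sqrt(0.575 * 0.425 / 450) ≈ 2.3%, so the 95% interval is roughly 53%-62%, i.e. anywhere from about 20 to 85 Elo. A 100 Elo gain, about 64%, falls just outside that interval, but the true gain is not pinned down very tightly either.)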

As far as I understand, you tried LGRF-1 conditioned on 3x3 patterns. What are your results for plain LGRF-1 without patterns, and did you try LGRF-2 at all?

Yes, this test was LGRF-2 with 3x3 patterns checked around the reply, the last move, and the second-last move. A reply is played only if all three patterns match. I haven't tested LGRF-2 without 3x3 patterns, because I thought the patterns should help a lot. But it looks like I had too much confidence. :(
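
To make the matching rule concrete, here is a minimal sketch of such a table lookup (the names Board, pattern3x3, LgrfEntry and so on are invented for illustration; Erica's actual data structures will differ):

    #include <cstdint>
    #include <unordered_map>

    // Hypothetical board interface; pattern3x3(p) hashes the 3x3
    // neighbourhood around point p in the current position.
    struct Board {
        uint32_t pattern3x3(int) const { return 0; }  // stub
        bool is_legal(int) const { return true; }     // stub
    };

    // One stored reply, plus the 3x3 patterns seen around the reply,
    // the last move and the second-last move when it was stored.
    struct LgrfEntry {
        int reply;
        uint32_t reply_pat, last_pat, second_pat;
    };

    // LGRF-2 table keyed by (second-last move, last move).
    static std::unordered_map<uint64_t, LgrfEntry> lgrf2;

    static uint64_t key(int second_last, int last) {
        return (uint64_t(uint32_t(second_last)) << 32) | uint32_t(last);
    }

    // Return the stored reply only if all three patterns still match;
    // -1 means "no usable reply, fall back to the default policy".
    int lgrf2_reply(const Board& b, int second_last, int last) {
        auto it = lgrf2.find(key(second_last, last));
        if (it == lgrf2.end()) return -1;
        const LgrfEntry& e = it->second;
        if (!b.is_legal(e.reply)) return -1;
        if (b.pattern3x3(e.reply)     != e.reply_pat)  return -1;
        if (b.pattern3x3(last)        != e.last_pat)   return -1;
        if (b.pattern3x3(second_last) != e.second_pat) return -1;
        return e.reply;  // all three patterns match: play the reply
    }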

It is possible that canned responses in the LGRF fashion have a certain expected quality that does not change much with the quality of the underlying policy. In that case, they could lead to big improvements, to no effect, or even to a degradation of playing strength, depending on how strong your program already is pre-LGRF. Of course, this also depends on how you prioritize LGRF and on how successful you are at replacing the low-quality moves, but not the high-quality moves, of your default policy with its suggestions. We would need to study this systematically.
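
On the prioritization point: the usual arrangement in Baier and Drake's LGRF work is to try the two-move reply first, then the one-move reply, and only then the default policy, so the tables can only ever override default-policy moves. A sketch, reusing the hypothetical lgrf2_reply above and assuming analogous lgrf1_reply and default_policy_move helpers:

    int lgrf1_reply(const Board& b, int last);  // analogous one-move table
    int default_policy_move(const Board& b);    // e.g. the softmax policy

    // Playout move selection with LGRF in front of the default policy.
    int select_playout_move(const Board& b, int second_last, int last) {
        int m = lgrf2_reply(b, second_last, last);  // LGRF-2 first
        if (m >= 0) return m;
        m = lgrf1_reply(b, last);                   // then LGRF-1
        if (m >= 0) return m;
        return default_policy_move(b);              // finally the default
    }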

My target was 100 Elo, or at least 70 Elo. 50 Elo at a 15% speed cost is not very satisfying (for me). Agreed, we would need systematic experiments to adapt LGRF-2 to a softmax playout policy. And I believe that developing an algorithm to automatically learn the feature weights in combination with LGRF is a very good research direction.
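
One way to read "adapt LGRF-2 to softmax" is to treat "this move is the stored last-good reply" as one more binary feature whose weight is learned together with the pattern weights (for example by Coulom-style minorization-maximization), instead of playing the reply deterministically. A hedged sketch, with invented names:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Unnormalised softmax weight of a move from its feature values.
    double gamma(const std::vector<double>& w, const std::vector<double>& f) {
        double s = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) s += w[i] * f[i];
        return std::exp(s);
    }

    // Probability of picking move m among all candidates, where
    // feats[m] might hold e.g. a 3x3-pattern feature and a binary
    // "is the stored LGRF reply" feature with a learned weight.
    double softmax_prob(const std::vector<double>& w,
                        const std::vector<std::vector<double>>& feats,
                        std::size_t m) {
        double total = 0.0;
        for (const auto& f : feats) total += gamma(w, f);
        return gamma(w, feats[m]) / total;
    }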

Aja


