In future papers they should avoid using a strong authority like Fuego to generate the training targets, and instead force the method to learn from a naive uniform-random playout policy (compensating with 100x or 1000x more playouts), then build on that iteratively (as was suggested in the paper).
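To make the iterative version concrete, here is a rough Python sketch of how I imagine the bootstrapping loop would go; the helper names and the playout/training stubs are mine, not anything from the paper:

import random

def uniform_random_playout(position):
    # Stub: a real implementation would play uniformly random moves to the
    # end of the game and return the result (+1 win, -1 loss).
    return random.choice([1, -1])

def estimate_target(position, n_playouts):
    # Brute-force value estimate: average many cheap uniform-random playouts.
    return sum(uniform_random_playout(position)
               for _ in range(n_playouts)) / n_playouts

def train_balanced_policy(targets, policy):
    # Stub: one round of simulation balancing against the given targets.
    return policy

positions = list(range(100))   # placeholder for the training positions
policy = None                  # placeholder for the playout policy parameters

# Round 0: targets come from uniform-random playouts, compensating for their
# weakness with 100x-1000x more samples than a strong engine would need.
targets = {p: estimate_target(p, n_playouts=100000) for p in positions}
policy = train_balanced_policy(targets, policy)

# Later rounds: re-estimate the targets with the newly balanced policy
# (fewer playouts should suffice now), train again, and repeat.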

I also had another thought. Since they are training the policy to maximize balance rather than winrate, couldn't you extract more information from each trial by using the final score instead of the binary game result? The usual pitfalls of training on score rather than the game result should not apply here, since the policy is being fitted to a target rather than optimized to win.
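For what it's worth, here is a rough sketch of what I mean, loosely following the two-pass policy-gradient balancing update but with each playout returning a final score margin instead of +1/-1, so the target becomes an expected score rather than a win probability. The helpers (run_playout, grad_log_policy) are stubs of mine:

import numpy as np

def run_playout(position, theta):
    # Stub: play one simulation under the policy with parameters theta and
    # return (final score margin, list of (state, move) pairs visited).
    return float(np.random.randn()), []

def grad_log_policy(state, move, theta):
    # Stub: gradient of log pi(move | state; theta).
    return np.zeros_like(theta)

def balancing_update(position, target_score, theta, n_sims=100, alpha=0.01):
    # First pass: bias between the target score and the mean simulated score.
    scores = [run_playout(position, theta)[0] for _ in range(n_sims)]
    bias = target_score - np.mean(scores)

    # Second pass: score-weighted sum of log-policy gradients along each playout.
    g = np.zeros_like(theta)
    for _ in range(n_sims):
        score, trace = run_playout(position, theta)
        step = np.zeros_like(theta)
        for state, move in trace:
            step += grad_log_policy(state, move, theta)
        g += score * step
    g /= n_sims

    # Move the parameters in the direction that reduces the bias.
    return theta + alpha * bias * g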



Isaac Deutsch wrote:
I admit I had trouble understanding the details of the paper. What I think is the biggest problem for applying this to bigger (up to 19x19) games is that you somehow need access to the "true" value of a move, i.e. whether it is a win or a loss. On the 5x5 board they used, this might be approximated pretty well, but there is no chance of doing so on 19x19.


On 13.08.2009 at 05:14, Michael Williams wrote:

After about the 5th reading, I'm concluding that this is an excellent paper. Is anyone (besides the authors) doing research based on this? There is a lot to do.


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

