On Sun, Dec 11, 2016 at 8:44 PM, Detlef Schmicker <[email protected]> wrote:
> Hi Erik,
>
> as far as I understood it, it was 250 Elo in the policy network alone ...
Two problems: (1) it is a self-play result, and (2) the policy was tested as a stand-alone player. A policy trained to win games will beat a policy trained to predict moves, so what? That just confirms the expected result.

BTW, if you read a bit further, it says that the SL policy performed better inside AlphaGo (AG). This is consistent with earlier reported work. E.g., as a student David used RL to train lots of strong stand-alone policies, but they never worked well when combined with MCTS. As far as I can tell, this one was no different, except that they found an indirect use for it: generating training data for the value network.

Erik
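P.S. For anyone wondering how a move-prediction policy gets "combined with MCTS" at all: it typically enters as a prior in the selection rule, not as the move chooser itself. Below is a minimal PUCT-style sketch of that idea; it is illustrative only, not AlphaGo's actual code, and the Node fields and the c_puct constant are my own assumptions.

import math

class Node:
    """One MCTS node edge-set: children keyed by move, each child carries
    a policy prior P(s,a), a visit count N(s,a), and a total value W(s,a)."""
    def __init__(self, prior):
        self.prior = prior      # P(s, a) taken from the policy network
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a)
        self.children = {}      # move -> Node

    def q(self):
        # Mean action value Q(s, a); zero for unvisited children.
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """PUCT-style selection: exploit Q, explore in proportion to the prior.
    The prior steers search effort toward moves the policy deems plausible."""
    total_visits = sum(child.visits for child in node.children.values())
    best_move, best_score, best_child = None, -float("inf"), None
    for move, child in node.children.items():
        u = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visits)
        score = child.q() + u
        if score > best_score:
            best_move, best_score, best_child = move, score, child
    return best_move, best_child

The point is that the prior's job here is to spread search effort over plausible moves, which is a different job from being a strong stand-alone player; that difference is one way to read why the SL policy can work better than the RL policy inside the search.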
