On Sun, Dec 11, 2016 at 8:44 PM, Detlef Schmicker <[email protected]> wrote:
> Hi Erik,
>
> as far as I understood it, it was 250 Elo in the policy network alone ...

Two problems: (1) it is a self-play result, and (2) the policy was tested
as a stand-alone player.

A policy trained to win games will beat a policy trained to predict
human moves; so what? That just confirms the expected result.

BTW, if you read a bit further, it says that the SL policy performed
better in AlphaGo than the RL one. This is consistent with earlier
reported work. E.g., as a student, David used RL to train lots of
strong stand-alone policies, but they never worked well when combined
with MCTS. As far as I can tell, this one was no different, except that
they found an indirect use for it: generating training data for the
value network.
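
To make "combined with MCTS" concrete: in AlphaGo-style search the
policy network only supplies prior probabilities for the selection
rule; it never picks moves directly, so a policy that plays better
stand-alone is not automatically a better prior. A rough Python sketch
of the usual PUCT-style selection (the names Node, select_child and
c_puct are mine, not anything from the paper):

    import math

    class Node:
        def __init__(self, prior):
            self.prior = prior       # P(s, a) from the policy network
            self.visit_count = 0     # N(s, a)
            self.value_sum = 0.0     # W(s, a)

        def q(self):
            # Mean action value Q(s, a); 0 for unvisited moves.
            return self.value_sum / self.visit_count if self.visit_count else 0.0

    def select_child(children, c_puct=1.0):
        # children: dict mapping move -> Node.
        # The prior only biases exploration via the U term; the value
        # estimate Q still comes from rollouts / the value network.
        total = sum(c.visit_count for c in children.values())
        def puct(item):
            move, child = item
            u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visit_count)
            return child.q() + u
        return max(children.items(), key=puct)

The prior mainly controls which moves get explored at all, which is a
different job from playing well greedily.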

Erik
