A few more words
*) Pushing this idea to the extreme, one might want to build a "Tree
Network" whose output tries to fit the whole Monte-Carlo search tree
(including all the win/loss counts, etc.) for the board position; as we
know, a deep network can fit anything. The structure of the
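One scaled-down, well-established form of this idea is to distill just the root statistics of the search tree into the network, as AlphaGo Zero does: train the policy head toward the MCTS visit-count distribution and the value head toward the searched win rate. A minimal sketch, assuming PyTorch; the tensor names and shapes are illustrative, not from the original post:

    import torch.nn.functional as F

    def distill_root_stats_loss(policy_logits, value_pred, visit_counts, search_value):
        """AlphaGo Zero-style distillation of MCTS root statistics.

        policy_logits: (B, 361) raw move logits from the network
        value_pred:    (B, 1)   network win-rate estimate in [-1, 1]
        visit_counts:  (B, 361) MCTS visit counts at the root
        search_value:  (B, 1)   win rate estimated by the search
        """
        # Normalize visit counts into a target move distribution.
        pi = visit_counts / visit_counts.sum(dim=1, keepdim=True)
        # Cross-entropy between the search policy and the network policy.
        policy_loss = -(pi * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
        # Squared error between the predicted and searched win rate.
        value_loss = F.mse_loss(value_pred, search_value)
        return policy_loss + value_loss

Fitting the deeper structure of the tree (statistics at every node rather than just the root) would additionally require some serialized or graph-structured output, which is what makes the full "Tree Network" idea extreme.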
Training a policy network is simple, and I have found that a Residual
Network with Batch Normalization works very well. However, training a
value network is far more challenging: I have found it very easy to
overfit, unless one uses the final territory as an additional prediction
target. Even
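For concreteness, here is a minimal sketch of the kind of architecture described: a Batch-Normalized residual tower shared by a policy head and a value head, with an extra per-point territory head serving as the auxiliary target. PyTorch is assumed, and every size below (input planes, channels, block count) is an illustrative guess, not the original configuration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN, plus a skip connection."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            y = F.relu(self.bn1(self.conv1(x)))
            y = self.bn2(self.conv2(y))
            return F.relu(x + y)

    class PolicyValueTerritoryNet(nn.Module):
        """Shared residual tower with policy, value, and territory heads."""
        def __init__(self, in_planes=18, channels=128, blocks=10, board=19):
            super().__init__()
            self.stem = nn.Sequential(
                nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(),
            )
            self.tower = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
            self.policy = nn.Conv2d(channels, 1, 1)  # one logit per board point
            self.value = nn.Sequential(
                nn.Conv2d(channels, 1, 1),
                nn.Flatten(),
                nn.Linear(board * board, 256),
                nn.ReLU(),
                nn.Linear(256, 1),
                nn.Tanh(),
            )
            # Auxiliary head: predicted final ownership of each point in [-1, 1].
            self.territory = nn.Conv2d(channels, 1, 1)

        def forward(self, x):
            h = self.tower(self.stem(x))
            return (
                self.policy(h).flatten(1),      # (B, board*board) move logits
                self.value(h),                  # (B, 1) win-rate estimate
                torch.tanh(self.territory(h)),  # (B, 1, board, board) ownership
            )

A combined loss would then add, say, a squared-error term between the territory head and the game's final ownership map to the usual policy and value losses. The territory target is dense (one label per board point per position, instead of a single win/loss bit per game), which is plausibly why it regularizes the value network so effectively.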