A few more words
*) Pushing this idea to the extreme, one might want to build a "Tree
Network" whose output tries to fit the whole Monte-Carlo search tree
(including all the win/loss counts, etc.) for the board position; as we
know, a deep network can fit anything. The structure of the
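One scaled-down, well-established form of this idea is to distill just the root statistics of the search tree into the network, as AlphaGo Zero does: train the policy head toward the MCTS visit-count distribution and the value head toward the searched win rate. A minimal sketch, assuming PyTorch; the tensor names and shapes are illustrative, not from the original post:

    import torch.nn.functional as F

    def distill_root_stats_loss(policy_logits, value_pred, visit_counts, search_value):
        """AlphaGo Zero-style distillation of MCTS root statistics.

        policy_logits: (B, 361) raw move logits from the network
        value_pred:    (B, 1)   network win-rate estimate in [-1, 1]
        visit_counts:  (B, 361) MCTS visit counts at the root
        search_value:  (B, 1)   win rate estimated by the search
        """
        # Normalize visit counts into a target move distribution.
        pi = visit_counts / visit_counts.sum(dim=1, keepdim=True)
        # Cross-entropy between the search policy and the network policy.
        policy_loss = -(pi * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
        # Squared error between the predicted and searched win rate.
        value_loss = F.mse_loss(value_pred, search_value)
        return policy_loss + value_loss

Fitting the deeper structure of the tree (statistics at every node rather than just the root) would additionally require some serialized or graph-structured output, which is what makes the full "Tree Network" idea extreme.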
Training a policy network is simple, and I have found that a Residual
Network with Batch Normalization works very well. However, training a
value network is far more challenging: I have found it very easy to
overfit, unless one uses the final territory as an additional prediction
target. Even
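For concreteness, here is a minimal sketch of the kind of architecture described: a Batch-Normalized residual tower shared by a policy head and a value head, with an extra per-point territory head serving as the auxiliary target. PyTorch is assumed, and every size below (input planes, channels, block count) is an illustrative guess, not the original configuration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN, plus a skip connection."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            y = F.relu(self.bn1(self.conv1(x)))
            y = self.bn2(self.conv2(y))
            return F.relu(x + y)

    class PolicyValueTerritoryNet(nn.Module):
        """Shared residual tower with policy, value, and territory heads."""
        def __init__(self, in_planes=18, channels=128, blocks=10, board=19):
            super().__init__()
            self.stem = nn.Sequential(
                nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(),
            )
            self.tower = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
            self.policy = nn.Conv2d(channels, 1, 1)  # one logit per board point
            self.value = nn.Sequential(
                nn.Conv2d(channels, 1, 1),
                nn.Flatten(),
                nn.Linear(board * board, 256),
                nn.ReLU(),
                nn.Linear(256, 1),
                nn.Tanh(),
            )
            # Auxiliary head: predicted final ownership of each point in [-1, 1].
            self.territory = nn.Conv2d(channels, 1, 1)

        def forward(self, x):
            h = self.tower(self.stem(x))
            return (
                self.policy(h).flatten(1),      # (B, board*board) move logits
                self.value(h),                  # (B, 1) win-rate estimate
                torch.tanh(self.territory(h)),  # (B, 1, board, board) ownership
            )

A combined loss would then add, say, a squared-error term between the territory head and the game's final ownership map to the usual policy and value losses. The territory target is dense (one label per board point per position, instead of a single win/loss bit per game), which is plausibly why it regularizes the value network so effectively.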