I thought it might be fun to have some games from the early stages of learning from nearly zero knowledge.
I did not turn off the (relatively weak) playouts; they are mixed into the result from the value network with a weight of 30%. I started from a randomly initialized neural net (a small one, about 4 ms on a GTX 970) and use a relatively wide search for MC (much, much wider than I use for good playing strength, unpruning about 5-6 moves) with 100 playouts, expanding every 3 playouts, thus 33 network evaluations per move. Additionally, I add Gaussian random numbers with a standard deviation of 0.02 to the policy network.

With this setup I play 1000 games and then do a reinforcement learning cycle with them. One cycle takes me about 5 hours.

For the first 2 days I did not archive games; then I noticed it might be fun to have games from the training history, so now I always archive one game per cycle.

Here are some games:
http://physik.de/games_during_learning/

I will probably add more games as I get them, and I will try to measure on CGOS how strong the bot is with exactly this (weak, broad search) configuration but with a net pretrained on 4d+ KGS games.

Detlef
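
P.S. A rough sketch in Python of the numbers in the setup above: the 30% playout mix, the Gaussian noise with standard deviation 0.02, and the 33 evaluations per move. The function names are hypothetical, and applying the noise to the policy output probabilities, then clamping and renormalizing them, is an assumption made for illustration and not necessarily what the bot does.

```python
import random

PLAYOUT_WEIGHT = 0.3        # playouts are mixed in with 30%
NOISE_STDDEV = 0.02         # standard deviation of the Gaussian policy noise
PLAYOUTS_PER_MOVE = 100     # playouts per move
PLAYOUTS_PER_EXPANSION = 3  # a node is expanded every 3 playouts


def mixed_value(value_net_estimate, playout_result, playout_weight=PLAYOUT_WEIGHT):
    """Blend the value network estimate with the playout result (both in [0, 1])."""
    return (1.0 - playout_weight) * value_net_estimate + playout_weight * playout_result


def noisy_policy(priors, stddev=NOISE_STDDEV, rng=random):
    """Add Gaussian noise to each move prior.

    Assumption: negative values are clamped to zero and the result is
    renormalized to sum to 1; the post does not say how this is handled.
    """
    noisy = [max(0.0, p + rng.gauss(0.0, stddev)) for p in priors]
    total = sum(noisy)
    if total == 0.0:
        # Every prior was pushed to zero; fall back to the unperturbed priors.
        return list(priors)
    return [p / total for p in noisy]


def network_evaluations(playouts=PLAYOUTS_PER_MOVE, per_expansion=PLAYOUTS_PER_EXPANSION):
    """Network evaluations per move when one expansion happens every few playouts."""
    return playouts // per_expansion


if __name__ == "__main__":
    print(mixed_value(0.6, 1.0))
    print(noisy_policy([0.5, 0.3, 0.2], rng=random.Random(0)))
    print(network_evaluations())
```

With a fixed weight like this, the playout result can move the blended value by at most 0.3, so even weak playouts cannot completely override the value network.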