Detlef, I'm not sure I understand your last sentence. Do you mean that eventually you'll put a subset of these nets on CGOS to measure how quickly their strength is improving?
s.

On Nov 6, 2017 4:54 PM, "Detlef Schmicker" <[email protected]> wrote:
> I thought it might be fun to have some games from the early stages of
> learning, starting from nearly Zero knowledge.
>
> I did not turn off the (relatively weak) playouts; I mix them at 30%
> into the result from the value network. I started from an initial
> random neural net (a small one, about 4 ms on a GTX970) and use a
> relatively wide search for MC (much, much wider than I do for good
> playing strength, unpruning about 5-6 moves), with 100 playouts
> expanding every 3 playouts, thus 33 network evaluations per move.
>
> Additionally, I add Gaussian random numbers with a standard deviation
> of 0.02 to the policy network output.
>
> With this setup I play 1000 games and do a reinforcement learning
> cycle with them. One cycle takes me about 5 hours.
>
> For the first 2 days I did not archive games; then I noticed it might
> be fun to have games from the training history, so now I always
> archive one game per cycle.
>
> Here are some games ...
>
> http://physik.de/games_during_learning/
>
> I will probably add some more games if I have them, and will try to
> measure on CGOS how strong the bot is with exactly this (weak, broad
> search) configuration but with a net pretrained on 4d+ KGS games...
>
> Detlef

_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
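For anyone following along, here is a minimal Python sketch of how I read the numbers in Detlef's post. The constants come from his description; the function names and structure are my own illustration and assumption, not his actual code:

    import numpy as np

    PLAYOUT_WEIGHT = 0.3       # playouts mixed at 30% into the value-net result
    POLICY_NOISE_SIGMA = 0.02  # stddev of Gaussian noise on the policy output
    PLAYOUTS_PER_MOVE = 100
    EXPAND_INTERVAL = 3        # expand a node every 3 playouts
    # -> roughly 100 / 3 = 33 network evaluations per move, as stated
    NET_EVALS_PER_MOVE = PLAYOUTS_PER_MOVE // EXPAND_INTERVAL

    def leaf_value(value_net_out: float, playout_result: float) -> float:
        """Blend the (weak) rollout result with the value network's estimate."""
        return (1.0 - PLAYOUT_WEIGHT) * value_net_out + PLAYOUT_WEIGHT * playout_result

    def noisy_policy(policy: np.ndarray, rng: np.random.Generator) -> np.ndarray:
        """Add Gaussian exploration noise to the raw policy output."""
        noisy = policy + rng.normal(0.0, POLICY_NOISE_SIGMA, size=policy.shape)
        return np.clip(noisy, 0.0, None)  # keep move priors non-negative

    # Usage (illustrative):
    # rng = np.random.default_rng(0)
    # priors = noisy_policy(raw_policy, rng)  # then unprune the ~5-6 widest moves

The noise plus the wide unpruning would keep the random initial net from collapsing onto a few moves, which seems to be the point of the broad, shallow search during early self-play.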
