  A few open questions I currently have, comments welcome:

  - there is no input representing the number of captures; is this
    information somehow implicit, or can the learned winrate predictor
    never truly approximate the true values because of this?
  - what ballpark values for c_{puct} are reasonable?

  - why is the Dirichlet noise applied only at the root node, if it's
    useful?

  - the training process is quite lazy - it's not like the network sees
    each game immediately and adjusts; it looks at the last 500k games
    and samples 1000*2048 positions, meaning about 4 positions per game
    (if I understood this right) - I wonder what would happen if we
    trained it more aggressively, and what AlphaGo does during the
    initial 500k games; currently, I'm training on all positions
    immediately, I guess I should at least shuffle them ;)

On Fri, Oct 20, 2017 at 03:23:49PM +0200, Petr Baudis wrote:
>   I tried to reimplement the system - in a simplified way, trying to
> find the minimum that learns to play 5x5 in a few thousand self-play
> games.  Turns out there are several components which are important
> to avoid some obvious attractors (like the network predicting black
> loses on every move from its second game on):
>
>   - disabling resignation in a portion of games is essential not just
>     for tuning the resignation threshold (if you even want to do
>     that), but also to correct the prediction signal by actual
>     scoring, rather than starting to always resign early in the game
>
>   - Dirichlet (or other) noise is essential to keep the network from
>     getting looped into the same game - which is also self-reinforcing
>
>   - I have my doubts about the idea of high temperature move choices
>     at the beginning, especially with T=1 ... maybe that's just bad
>     very early in the training
>
> On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
> >   The order of magnitude matches my parameter numbers.  (My attempt to
> > reproduce a simplified version of this is currently evolving at
> > https://github.com/pasky/michi/tree/nnet but the code is a mess right
> > now.)
>
> --
> 				Petr Baudis, Rossum
> 	Run before you walk! Fly before you crawl! Keep moving forward!
> 	If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go

--
				Petr Baudis, Rossum
	Run before you walk! Fly before you crawl! Keep moving forward!
	If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
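
P.S. To make the c_{puct} and root-noise questions concrete, here is a
minimal Python sketch of the two pieces as I read them from the AlphaGo
Zero paper: the selection rule argmax_a Q(s,a) + U(s,a) with
U(s,a) = c_{puct} * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)), and the
root prior mix (1 - eps) * p + eps * Dir(alpha). This is not the code in
the michi nnet branch; the function names are made up for illustration,
eps=0.25 and alpha=0.03 are the values the paper reports for 19x19, and
c_puct is deliberately left as an argument, since a reasonable value is
exactly the open question.

```python
import math
import random


def dirichlet(alpha, n, rng=random):
    """Sample a symmetric Dirichlet(alpha) over n outcomes (normalized gammas)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total <= 0.0:
        # Extremely unlikely underflow for tiny alpha: fall back to one-hot.
        draws = [0.0] * n
        draws[rng.randrange(n)] = 1.0
        total = 1.0
    return [d / total for d in draws]


def add_root_noise(priors, eps=0.25, alpha=0.03, rng=random):
    """Mix Dirichlet noise into the root priors: (1 - eps) * p + eps * eta."""
    noise = dirichlet(alpha, len(priors), rng)
    return [(1.0 - eps) * p + eps * eta for p, eta in zip(priors, noise)]


def puct_select(priors, visits, values, c_puct):
    """Return the index of the child maximizing Q + U.

    priors[a] = P(s,a), visits[a] = N(s,a), values[a] = Q(s,a).
    Note: with zero total visits every U is 0, so the first child with
    the best Q wins; some implementations add 1 under the square root.
    """
    sqrt_total = math.sqrt(sum(visits))
    best, best_score = 0, -math.inf
    for a, (p, n, q) in enumerate(zip(priors, visits, values)):
        score = q + c_puct * p * sqrt_total / (1 + n)
        if score > best_score:
            best, best_score = a, score
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    root_priors = add_root_noise([0.5, 0.3, 0.2], rng=rng)
    print(root_priors, sum(root_priors))
    # Unvisited child with a high prior beats a well-visited one at equal Q.
    print(puct_select([0.1, 0.9], [10, 0], [0.0, 0.0], c_puct=1.0))
```

With c_puct = 0 the rule degenerates to picking the best Q, and larger
values weight the prior and low visit counts more heavily, so the
sensible range depends on how Q is scaled (here assumed to be in [-1, 1]).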