On Fri, Oct 20, 2017 at 08:02:02PM +0000, Gian-Carlo Pascutto wrote:
> On Fri, Oct 20, 2017, 21:48 Petr Baudis <pa...@ucw.cz> wrote:
> > Few open questions I currently have, comments welcome:
> >
> >   - there is no input representing the number of captures; is this
> >     information somehow implicit or can the learned winrate predictor
> >     never truly approximate the true values because of this?
>
> They are using Chinese rules, so prisoners don't matter. There are simply
> less stones of one color on the board.
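To make the point concrete: under Chinese (area) scoring, each side's score is its stones on the board plus its surrounded empty points, so prisoners never enter the formula - a capture simply leaves fewer opponent stones, which the stone input planes already encode. A minimal sketch (board representation and names are mine, not from any particular engine):

```python
# Area scoring on a small board: score = own stones + own territory.
# Captured prisoners need no separate accounting - they are just absent
# from the board.
from collections import deque

EMPTY, BLACK, WHITE = '.', 'X', 'O'

def area_score(board):
    """Return (black_points, white_points) under area scoring.

    Each stone counts one point for its color; each empty region
    bordered by stones of only one color counts for that color.
    `board` is a list of equal-length strings."""
    n = len(board)
    black = sum(row.count(BLACK) for row in board)
    white = sum(row.count(WHITE) for row in board)
    seen = set()
    for y in range(n):
        for x in range(n):
            if board[y][x] != EMPTY or (y, x) in seen:
                continue
            # Flood-fill this empty region, noting which colors border it.
            size, borders = 0, set()
            queue = deque([(y, x)])
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < n and 0 <= nx < len(board[ny]):
                        c = board[ny][nx]
                        if c == EMPTY and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                        elif c != EMPTY:
                            borders.add(c)
            if borders == {BLACK}:
                black += size
            elif borders == {WHITE}:
                white += size
    return black, white
```

E.g. `area_score(["X.X", "XXX", "OOO"])` gives `(6, 3)`: five black stones plus one point of territory against three white stones, regardless of how many prisoners were taken along the way.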
Right! No idea what I was thinking.

> >   - what ballpark values for c_{puct} are reasonable?
>
> The original paper has the value they used. But this likely needs tuning. I
> would tune with a supervised network to get started, but you need games for
> that. Does it even matter much early on? The network is random :)

The network actually adapts quite rapidly initially, in my experience.
(That doesn't mean it improves - it adapts within the local optima of the
few games it has played so far.)

> >   - why is the dirichlet noise applied only at the root node, if it's
> >     useful?
>
> It's only used to get some randomness in the move selection, no? It's not
> actually useful for anything besides that.

Yes, but why wouldn't you want that randomness in the second or third
move as well?

> >   - the training process is quite lazy - it's not like the network sees
> >     each game immediately and adjusts; it looks at the last 500k games
> >     and samples 1000*2048 positions, meaning about 4 positions per game
> >     (if I understood this right) - I wonder what would happen if we
> >     trained it more aggressively, and what AlphaGo does during the
> >     initial 500k games; currently, I'm training on all positions
> >     immediately - I guess I should at least shuffle them ;)
>
> I think the laziness may be related to the concern that reinforcement
> methods can easily "forget" things they had learned before. The value
> network training also likes positions from distinct games.

That makes sense. I still hope that with a much more aggressive training
schedule we could train a reasonable Go player, perhaps at the expense of
worse scaling at very high Elo ratings... (At least I feel optimistic
after discovering a stupid bug in my code.)

--
				Petr Baudis, Rossum
	Run before you walk! Fly before you crawl! Keep moving forward!
	If we fail, I'd rather fail really hugely.
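For reference, the two mechanisms being debated above - PUCT selection with c_puct, and Dirichlet noise mixed into the priors at the root only - look roughly like this. The noise parameters (eps = 0.25, Dir(0.03)) are the AlphaGo Zero paper's published values for 19x19; the c_puct value and all names here are placeholders to tune, not anything from the paper:

```python
# Sketch of PUCT selection and root-only Dirichlet noise.
# C_PUCT is a made-up starting point - the thread's conclusion is that
# it needs tuning.
import math
import numpy as np

C_PUCT = 1.5           # placeholder; tune this
EPS, DIR_ALPHA = 0.25, 0.03  # AlphaGo Zero's published mixing constants

def add_root_noise(priors, rng=None):
    """Mix Dirichlet noise into the root priors: (1-eps)*p + eps*eta."""
    rng = rng or np.random.default_rng()
    eta = rng.dirichlet([DIR_ALPHA] * len(priors))
    return (1 - EPS) * np.asarray(priors) + EPS * eta

def select_child(q, n, priors):
    """Pick argmax_a of  Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).

    q, n, priors are per-child lists of mean value, visit count, prior."""
    total = math.sqrt(sum(n))
    scores = [qa + C_PUCT * pa * total / (1 + na)
              for qa, na, pa in zip(q, n, priors)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Applying `add_root_noise` at every node rather than just the root would randomize the whole search, not only the move actually played - which is exactly the trade-off the "why only at the root?" question is probing.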
		-- Moist von Lipwig
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go