On Fri, Oct 20, 2017 at 08:02:02PM +0000, Gian-Carlo Pascutto wrote:
> On Fri, Oct 20, 2017, 21:48 Petr Baudis <[email protected]> wrote:
>
> > Few open questions I currently have, comments welcome:
> >
> > - there is no input representing the number of captures; is this
> > information somehow implicit or can the learned winrate predictor
> > never truly approximate the true values because of this?
> >
>
> They are using Chinese rules, so prisoners don't matter. There are simply
> fewer stones of one color on the board.
Right! No idea what I was thinking.
> > - what ballpark values for c_{puct} are reasonable?
> >
>
> The original paper has the value they used. But this likely needs tuning. I
> would tune with a supervised network to get started, but you need games for
> that. Does it even matter much early on? The network is random :)
The network actually adapts quite rapidly initially, in my experience.
(That doesn't mean it improves - it adapts within local optima of the few
games it has played so far.)
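For the record, here is a rough Python sketch of where c_{puct} enters the
PUCT selection rule, so it's clear what the constant trades off (the node
fields and the default value are just placeholders - the value itself is
exactly what needs tuning):

import math

def puct_select(node, c_puct=1.5):
    # Pick the child maximizing Q(s,a) + U(s,a), where
    #   U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))
    # `children`, `visits`, `value_sum` and `prior` are assumed node fields.
    sqrt_total = math.sqrt(sum(child.visits for child in node.children))

    def score(child):
        q = child.value_sum / child.visits if child.visits else 0.0
        u = c_puct * child.prior * sqrt_total / (1 + child.visits)
        return q + u

    return max(node.children, key=score)

A large c_{puct} keeps following the priors longer, a small one trusts the
observed values sooner, so the right ballpark probably depends on how good
the network already is.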
> > - why is the Dirichlet noise applied only at the root node, if it's
> > useful?
> >
>
> It's only used to get some randomness in the move selection, no? It's not
> actually useful for anything besides that.
Yes, but why wouldn't you want that randomness in the second or third
move?
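For concreteness, the root-noise step from the paper is just a mix of the
priors with a Dirichlet sample (the paper gives eps = 0.25 and alpha = 0.03
for Go); a minimal sketch, with the node fields as placeholders:

import numpy as np

def add_root_noise(root, epsilon=0.25, alpha=0.03):
    # Mix Dirichlet noise into the root priors only:
    #   P(s,a) <- (1 - eps) * prior_a + eps * eta_a,  with eta ~ Dir(alpha)
    # `children` and `prior` are assumed node fields.
    noise = np.random.dirichlet([alpha] * len(root.children))
    for child, eta in zip(root.children, noise):
        child.prior = (1 - epsilon) * child.prior + epsilon * eta

Doing the same at deeper nodes would be cheap, so the question stands.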
> > - the training process is quite lazy - it's not like the network sees
> > each game immediately and adjusts; it looks at the last 500k games and
> > samples 1000*2048 positions, meaning about 4 positions per game (if
> > I understood this right) - I wonder what would happen if we trained
> > it more aggressively, and what AlphaGo does during the initial 500k
> > games; currently, I'm training on all positions immediately, I guess
> > I should at least shuffle them ;)
> >
>
> I think the laziness may be related to the concern that reinforcement
> methods can easily "forget" things they had learned before. The value
> network training also likes positions from distinct games.
That makes sense. I still hope that with a much more aggressive
training schedule we could train a reasonable Go player, perhaps at the
expense of worse scaling at very high Elo ratings... (At least I feel
optimistic after discovering a stupid bug in my code.)
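For comparison, the lazy scheme above amounts to a sliding window over
recent games with minibatches sampled across many distinct games; a rough
sketch (only the 500k-game window and the 2048-position batch size come
from the description above, the rest is illustrative):

import random
from collections import deque

class ReplayBuffer:
    # Sliding window over the most recent self-play games; each minibatch is
    # drawn across many distinct games, so a single game contributes only a
    # few positions per training step.
    def __init__(self, max_games=500_000):
        self.games = deque(maxlen=max_games)  # each entry: list of positions

    def add_game(self, positions):
        self.games.append(positions)

    def sample_batch(self, batch_size=2048):
        # Pick a random game, then a random position within it; this is only
        # roughly uniform over positions (short games are slightly favoured),
        # but it shows the idea.
        return [random.choice(random.choice(self.games))
                for _ in range(batch_size)]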
--
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely. -- Moist von Lipwig
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go