A few open questions I currently have; comments welcome:
- there is no input representing the number of captures; is this
information somehow implicit, or can the learned winrate predictor
never truly approximate the true values because of this?
- what ballpark values for c_{puct} are reasonable?
- if the Dirichlet noise is useful, why is it applied only at the
root node?
- the training process is quite lazy: the network doesn't see each
game immediately and adjust; instead it looks at the last 500k games
and samples 1000*2048 positions, i.e. about 4 positions per game (if
I understood this right). I wonder what would happen if we trained
it more aggressively, and what AlphaGo does during the initial 500k
games. Currently I'm training on all positions immediately; I guess
I should at least shuffle them ;)
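To make the c_{puct} and root-noise questions concrete, here is a rough Python sketch of where those knobs enter: PUCT child selection, Dirichlet noise mixed into the root priors, and temperature-based move choice from visit counts. All names are hypothetical (not AlphaGo Zero's or michi's code). NOISE_EPS = 0.25 and NOISE_ALPHA = 0.03 are the 19x19 values reported in the AlphaGo Zero paper; C_PUCT = 1.0 is only a placeholder assumption, since its ballpark is exactly what I'm asking about.

```python
import math
import random

# Hypothetical constants. EPS/ALPHA are the 19x19 values from the AlphaGo Zero
# paper; a 5x5 board may want a larger ALPHA. C_PUCT is a placeholder, not advice.
C_PUCT = 1.0
NOISE_EPS = 0.25
NOISE_ALPHA = 0.03


def dirichlet(alpha, n, rng=random):
    """Sample a symmetric Dirichlet(alpha) vector of length n from Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:  # very small alpha can underflow every draw; fall back to uniform
        return [1.0 / n] * n
    return [d / total for d in draws]


def add_root_noise(priors, eps=NOISE_EPS, alpha=NOISE_ALPHA, rng=random):
    """Mix noise into the root priors: P(s,a) = (1 - eps) * p_a + eps * eta_a."""
    eta = dirichlet(alpha, len(priors), rng)
    return [(1 - eps) * p + eps * e for p, e in zip(priors, eta)]


def puct_select(priors, visits, q_values, c_puct=C_PUCT):
    """Return the child index maximizing Q(s,a) + U(s,a), where
    U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
    With no visits yet every U term is 0, so ties go to the first child."""
    sqrt_total = math.sqrt(sum(visits))
    best, best_score = 0, -math.inf
    for a, (p, n, q) in enumerate(zip(priors, visits, q_values)):
        score = q + c_puct * p * sqrt_total / (1 + n)
        if score > best_score:
            best, best_score = a, score
    return best


def sample_move(visits, temperature, rng=random):
    """Pick a root move with probability proportional to N(a)^(1/T).
    T=1 samples in proportion to visit counts; T -> 0 approaches argmax."""
    if temperature < 1e-3:
        return max(range(len(visits)), key=visits.__getitem__)
    max_n = max(visits)
    if max_n == 0:
        return rng.randrange(len(visits))
    # Dividing by max_n keeps n ** (1/T) from overflowing at small T.
    weights = [(n / max_n) ** (1.0 / temperature) for n in visits]
    return rng.choices(range(len(visits)), weights=weights, k=1)[0]
```

In this sketch the noise is applied once, to the root priors before the simulations start, while deeper nodes keep the raw network priors; the larger c_puct is, the more the prior term U outweighs the observed Q.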
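A minimal sketch of the "lazy" training data pipeline as I read it: keep only the most recent N games in a window and draw each mini-batch uniformly over all positions in that window (which also takes care of the shuffling). `ReplayWindow` and `samples_per_game` are hypothetical names for illustration; the real pipeline may well differ.

```python
import random
from collections import deque


class ReplayWindow:
    """Hold only the most recent `max_games` games; sample positions uniformly."""

    def __init__(self, max_games):
        # A bounded deque drops the oldest game automatically when full.
        self.games = deque(maxlen=max_games)

    def add_game(self, positions):
        self.games.append(list(positions))

    def sample_batch(self, batch_size, rng=random):
        # Choosing a game with probability proportional to its length, then a
        # uniform position inside it, makes every position equally likely.
        games = list(self.games)
        weights = [len(g) for g in games]
        chosen = rng.choices(games, weights=weights, k=batch_size)
        return [rng.choice(g) for g in chosen]


def samples_per_game(steps, batch_size, window_games):
    """Average number of sampled positions per game in the window over `steps` steps."""
    return steps * batch_size / window_games
```

With the numbers above, `samples_per_game(1000, 2048, 500_000)` gives about 4.1, which is where the "about 4 positions per game" estimate comes from.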
On Fri, Oct 20, 2017 at 03:23:49PM +0200, Petr Baudis wrote:
> I tried to reimplement the system in a simplified way, trying to
> find the minimum that learns to play 5x5 in a few thousand
> self-plays. It turns out there are several components that are important
> to avoid some obvious attractors (like the network predicting that black
> loses on every move from its second game on):
>
> - disabling resignation in a portion of games is essential not just
> for tuning the resignation threshold (if you even want to do that), but
> also to correct the prediction signal with actual scoring, rather than
> the network starting to always resign early in the game
>
> - Dirichlet (or other) noise is essential to keep the network from
> getting looped into the same game, which is also self-reinforcing
>
> - I have my doubts about the idea of high-temperature move choices
> at the beginning, especially with T=1 ... maybe that's just bad
> very early in the training
>
> On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
> > The order of magnitude matches my parameter numbers. (My attempt to
> > reproduce a simplified version of this is currently evolving at
> > https://github.com/pasky/michi/tree/nnet but the code is a mess right
> > now.)
>
> --
> Petr Baudis, Rossum
> Run before you walk! Fly before you crawl! Keep moving forward!
> If we fail, I'd rather fail really hugely. -- Moist von Lipwig
> _______________________________________________
> Computer-go mailing list
> http://computer-go.org/mailman/listinfo/computer-go