Re: [Computer-go] AlphaGo Zero

Petr Baudis Fri, 20 Oct 2017 12:57:13 -0700

  A few open questions I currently have, comments welcome:

  - there is no input representing the number of captures; is this
    information somehow implicit or can the learned winrate predictor
    never truly approximate the true values because of this?


  - what ballpark values for c_{puct} are reasonable?

  - why is the Dirichlet noise applied only at the root node, if it's
    useful?

  - the training process is quite lazy: the network does not see each
    game immediately and adjust.  Instead, it looks at the last 500k
    games and samples 1000*2048 positions, i.e. about 4 positions per
    game (if I understood this right).  I wonder what would happen if
    we trained it more aggressively, and what AlphaGo does during the
    initial 500k games.  Currently, I'm training on all positions
    immediately; I guess I should at least shuffle them ;)
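
  To make the sampling arithmetic above concrete, here is a minimal
  sketch of a sliding window over recent games, as I read the scheme.
  The ReplayWindow class and its method names are made up for
  illustration, and picking a game first and then a position within it
  is a simplification (it is not exactly uniform over positions when
  game lengths differ).

```python
import random
from collections import deque

# Figures as quoted above: 1000 steps of 2048-position mini-batches,
# drawn from a window of the most recent 500k games.
WINDOW_GAMES = 500_000
STEPS = 1000
BATCH = 2048


def positions_per_game(window_games=WINDOW_GAMES, steps=STEPS, batch=BATCH):
    """Average number of sampled positions per game in the window."""
    return steps * batch / window_games


class ReplayWindow:
    """Hypothetical sliding window over the most recent self-play games."""

    def __init__(self, max_games):
        # A bounded deque drops the oldest game once the window is full.
        self.games = deque(maxlen=max_games)

    def add_game(self, positions):
        self.games.append(list(positions))

    def sample_batch(self, batch_size, rng=random):
        # Simplification: choose a game uniformly, then a position in it.
        batch = []
        for _ in range(batch_size):
            game = rng.choice(self.games)
            batch.append(rng.choice(game))
        return batch


if __name__ == "__main__":
    print(positions_per_game())  # 1000 * 2048 / 500000
```

  Training "immediately on all positions" corresponds to a window of
  one game with no sampling at all, which is the opposite extreme.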

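  For reference on the c_{puct} and Dirichlet questions, a minimal
  sketch of PUCT selection and root noise as I understand them from the
  paper.  The function names are my own, alpha=0.03 and eps=0.25 are the
  values I recall from the paper, and c_puct is deliberately left as a
  required parameter since a reasonable value is exactly what I'm
  asking about.

```python
import math
import random


def puct_select(priors, visits, q_values, c_puct):
    """Return the index maximising Q + U, with
    U = c_puct * P * sqrt(sum of visits) / (1 + N)."""
    total = sum(visits)

    def score(a):
        u = c_puct * priors[a] * math.sqrt(total) / (1 + visits[a])
        return q_values[a] + u

    return max(range(len(priors)), key=score)


def add_root_noise(priors, alpha=0.03, eps=0.25, rng=random):
    """Mix Dirichlet(alpha) noise into root priors: (1 - eps) * p + eps * eta."""
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalised.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:
        # Guard against underflow for tiny alpha: fall back to uniform noise.
        eta = [1.0 / len(priors)] * len(priors)
    else:
        eta = [g / total for g in gammas]
    return [(1 - eps) * p + eps * n for p, n in zip(priors, eta)]
```

  In this sketch add_root_noise() would be called once on the root's
  priors before the search starts, and puct_select() at every node
  during descent.
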
On Fri, Oct 20, 2017 at 03:23:49PM +0200, Petr Baudis wrote:
>   I tried to reimplement the system in a simplified way, trying to
> find the minimum that learns to play 5x5 in a few thousand self-play
> games.  It turns out several components are important to avoid some
> obvious attractors (like the network predicting that black loses on
> every move from its second game on):
> 
>   - disabling resignation in a portion of games is essential not just
>     for tuning the resignation threshold (if you even want to do that),
>     but also to correct the prediction signal by actual scoring, so the
>     network does not start to always resign early in the game
> 
>   - Dirichlet (or other) noise is essential to keep the network from
>     getting looped into the same game, which is also self-reinforcing
> 
>   - I have my doubts about the idea of high-temperature move choices
>     at the beginning, especially with T=1 ... maybe that's just bad
>     very early in the training
> 
> On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
> >   The order of magnitude matches my parameter numbers.  (My attempt to
> > reproduce a simplified version of this is currently evolving at
> > https://github.com/pasky/michi/tree/nnet but the code is a mess right
> > now.)
> 
> -- 
>                                       Petr Baudis, Rossum
>       Run before you walk! Fly before you crawl! Keep moving forward!
>       If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://computer-go.org/mailman/listinfo/computer-go

-- 
                                        Petr Baudis, Rossum
        Run before you walk! Fly before you crawl! Keep moving forward!
        If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
