I tried to reimplement the system in a simplified way, trying to
find the minimum that learns to play 5x5 in a few thousand
self-play games.  It turns out several components are important
for avoiding some obvious attractors (like the network predicting that
black loses on every move from its second game on):

  - disabling resignation in a portion of games is essential not just
    for tuning the resignation threshold (if you even want to do that),
    but simply to correct the prediction signal by actual scoring,
    rather than having the network start to always resign early in
    the game

  - Dirichlet (or other) noise is essential to keep the network from
    getting looped into the same game, which is also self-reinforcing

  - I have my doubts about the idea of high-temperature move choices
    at the beginning, especially with T=1 ... maybe that's just bad
    very early in the training
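
To make the three knobs above concrete, here is a minimal Python sketch
of how they might look in a self-play loop.  This is not the code from
my repository; all function names and default values (alpha, eps,
no_resign_fraction, threshold) are hypothetical placeholders chosen for
illustration, and it uses only the standard library.

```python
import random


def resignation_allowed(no_resign_fraction=0.1, rng=random):
    """Decide once per game whether resigning is permitted at all.

    no_resign_fraction is an assumed value: that share of games is
    always played out and scored, so value targets stay grounded.
    """
    return rng.random() >= no_resign_fraction


def should_resign(value, threshold=-0.9, allowed=True):
    """Resign only if this game allows it and the predicted value
    (from the side to move, in [-1, 1]) is below the threshold."""
    return allowed and value < threshold


def add_dirichlet_noise(priors, alpha=0.3, eps=0.25, rng=random):
    """Mix Dirichlet(alpha) noise into the root move priors.

    A Dirichlet sample is a vector of independent Gamma(alpha, 1)
    draws divided by their sum.  alpha and eps are assumptions here.
    """
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:  # extremely unlikely underflow; leave priors alone
        return list(priors)
    return [(1.0 - eps) * p + eps * g / total
            for p, g in zip(priors, gammas)]


def pick_move(visit_counts, temperature, rng=random):
    """Choose a move index from root visit counts.

    temperature <= 0 means greedy (most visited move); otherwise the
    move is sampled with probability proportional to N ** (1 / T).
    Assumes at least one move has a nonzero visit count.
    """
    moves = range(len(visit_counts))
    if temperature <= 0:
        return max(moves, key=lambda i: visit_counts[i])
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    return rng.choices(moves, weights=weights)[0]
```

Usage would be roughly: call resignation_allowed() once at the start of
each game, add_dirichlet_noise() on the root priors before each search,
and pick_move() with T=1 for the opening moves and T=0 afterwards (or
T=0 throughout, if the doubts in the last point turn out to be
justified).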

On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
>   The order of magnitude matches my parameter numbers.  (My attempt to
> reproduce a simplified version of this is currently evolving at
> https://github.com/pasky/michi/tree/nnet but the code is a mess right
> now.)

-- 
                                        Petr Baudis, Rossum
        Run before you walk! Fly before you crawl! Keep moving forward!
        If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
