I tried to reimplement the system in a simplified way, looking for
the minimum that learns to play 5x5 in a few thousand self-play
games. It turns out that several components are important for avoiding
some obvious attractors (like the network predicting that black loses
on every move from its second game on):
- disabling resignation in a portion of games is essential, not just
for tuning the resignation threshold (if you even want to do that),
but simply to correct the prediction signal by actual scoring, rather
than the network starting to always resign early in the game
- dirichlet (or other) noise is essential to keep the network from
getting looped into the same game, which is also self-reinforcing
- I have my doubts about the idea of high-temperature move choices
at the beginning, especially with T=1 ... maybe that's just bad
very early in the training
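
Roughly, the three knobs above could look like this in Python (stdlib
only). This is a sketch, not the code from the michi nnet branch: the
function names and the default values (alpha, eps, no_resign_fraction)
are made up for illustration and would need tuning for 5x5.

```python
import random


def add_dirichlet_noise(priors, alpha=0.3, eps=0.25, rng=random):
    """Mix root move priors with Dirichlet noise: (1 - eps) * p + eps * noise.

    alpha and eps are illustrative assumptions, not values from this post.
    """
    # A Dirichlet(alpha) sample is a set of independent Gamma(alpha, 1)
    # draws normalised to sum to 1.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:
        # Degenerate draw (all gammas underflowed): leave priors untouched.
        return list(priors)
    return [(1.0 - eps) * p + eps * g / total for p, g in zip(priors, gammas)]


def resignation_enabled(no_resign_fraction=0.1, rng=random):
    """Decide once per self-play game whether resignation is allowed.

    A fraction of games is always played out and scored, so the value
    target is corrected by actual results. The 0.1 is an assumed default.
    """
    return rng.random() >= no_resign_fraction


def pick_move(visit_counts, temperature, rng=random):
    """Pick a move index from search visit counts.

    temperature <= 0 plays the most visited move; otherwise the move is
    sampled with probability proportional to N ** (1 / temperature).
    """
    indices = range(len(visit_counts))
    if temperature <= 0:
        return max(indices, key=visit_counts.__getitem__)
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    return rng.choices(indices, weights=weights)[0]
```

With T=1 the sampling follows the raw visit counts, which is the case
I have doubts about early in training; lowering the temperature
sharpens the choice toward the most visited move.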
On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
> The order of magnitude matches my parameter numbers. (My attempt to
> reproduce a simplified version of this is currently evolving at
> https://github.com/pasky/michi/tree/nnet but the code is a mess right
> now.)
--
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely. -- Moist von Lipwig