You make me really curious: what is a Keras model?

On 10/11/2017 at 01:47, Petr Baudis wrote:
>   Hi,
>
>   I got my first *somewhat* positive results in my attempt to reproduce
> AlphaGo Zero - a 25% winrate against GNUGo on the easiest reasonable
> task, the 7x7 board. :)  a.k.a.
>
>       "Sometimes beating GNUGo on a tiny board" without human knowledge
>
> (much wow!)
>
>   Normally this would be a pretty weak result, but (A) I wanted to
> help calibrate other efforts on larger boards that are possibly still
> at the "random" stage, and (B) I'll probably move on to other projects
> again soon, so this might be as good as it gets for me.
>
>   I started the project by replacing MC simulations with a Keras model
> in my 550-line educational Go program Michi - it lived in its `nnet`
> branch until now, when I separated it into a project of its own:
>
>       https://github.com/rossumai/nochi
>
> Starting from a small base means that the codebase is tiny and should be
> easy to follow, though it's not nearly as tidy as Michi is.
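>
>   For a rough idea of what that replacement amounts to, here is a
> hedged sketch (the function names are illustrative, not Nochi's actual
> API): where Michi would run Monte Carlo playouts from a tree leaf,
> the Keras model is queried directly for move priors and a value
> estimate.
>
>     # Illustrative sketch only; position_to_planes() is a hypothetical
>     # encoder from a Go position to the network's input planes.
>     def evaluate_leaf(model, position):
>         x = position_to_planes(position)        # shape (1, 7, 7, C)
>         policy, value = model.predict(x, verbose=0)
>         return policy[0], value[0, 0]           # move priors, win estimate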
>
> You can grab the current training state (== a pickled, chronological
> archive of selfplay positions used for replay) and the neural network
> weights from the GitHub "Releases" page:
>
>       https://github.com/rossumai/nochi/releases/tag/G171107T013304_000000150
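>
>   A hedged loading sketch (file names below are placeholders - the
> actual archive layout is whatever Nochi's training code dumped):
>
>     import pickle
>
>     def load_checkpoint(model, positions_path, weights_path):
>         # chronological list of selfplay positions used for replay
>         with open(positions_path, 'rb') as f:
>             replay = pickle.load(f)
>         model.load_weights(weights_path)  # standard Keras weights restore
>         return replay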
>
>   This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> no supervision, and it contains no Monte Carlo simulations or other
> heuristics. But it's not entirely 1:1; I made some tweaks which I thought
> might help early convergence:
>
>   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7
>     (see the model sketch after this list).
>   * The neural network is updated after _every_ game, _twice_, on _all_
>     positions plus 64 randomly sampled positions from the entire history,
>     all of this done four times - on the original position and the three
>     symmetry flips (but I was too lazy to implement 90° rotation); see
>     the update-loop sketch after this list.
>   * Instead of supplying the last 8 positions as the network input, I feed
>     just the last position plus two indicator matrices showing
>     the locations of the last and second-to-last moves.
>   * No symmetry pruning during tree search.
>   * The value function is trained with cross-entropy rather than MSE,
>     no L2 regularization, and plain Adam rather than hand-tuned SGD (but
>     the annealing is reset from time to time due to manual restarts of the
>     script from a checkpoint).
>   * No resign auto-threshold, but it is important to play 25% of games
>     without resigning to escape local "optima".
>   * 1/Temperature is 2 for the first three moves.
>   * Initially I used 1000 "simulations" per move, but by mistake the
>     last 1500 games, when the network improved significantly (see
>     below), were run with 2000 simulations per move.  So that might
>     matter.
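>
>   To make the list above concrete, here is a minimal Keras sketch of
> such a network.  The filter counts, head sizes and exact plane layout
> are my assumptions for illustration, not Nochi's precise architecture:
>
>     from keras.layers import (Input, Conv2D, BatchNormalization,
>                               Activation, Add, Flatten, Dense)
>     from keras.models import Model
>
>     N = 7  # board size
>
>     def residual_block(x, filters=64):
>         y = Conv2D(filters, 3, padding='same')(x)
>         y = BatchNormalization()(y)
>         y = Activation('relu')(y)
>         y = Conv2D(filters, 3, padding='same')(y)
>         y = BatchNormalization()(y)
>         return Activation('relu')(Add()([x, y]))
>
>     # Input: the last position plus two indicator planes marking the
>     # last and second-to-last moves (the plane count is an assumption).
>     inp = Input(shape=(N, N, 3))
>     x = Conv2D(64, 3, padding='same')(inp)
>     x = Activation('relu')(BatchNormalization()(x))
>     for _ in range(7):      # 7 residual layers for 7x7, vs. 19 for 19x19
>         x = residual_block(x)
>
>     # Policy head: a distribution over N*N moves plus pass.
>     p = Flatten()(Conv2D(2, 1)(x))
>     policy = Dense(N * N + 1, activation='softmax', name='policy')(p)
>
>     # Value head: sigmoid win probability, so it can be trained with
>     # cross-entropy rather than MSE.
>     v = Flatten()(Conv2D(1, 1)(x))
>     v = Dense(64, activation='relu')(v)
>     value = Dense(1, activation='sigmoid', name='value')(v)
>
>     model = Model(inp, [policy, value])
>     model.compile(optimizer='adam',   # plain Adam, no hand-tuned SGD
>                   loss={'policy': 'categorical_crossentropy',
>                         'value': 'binary_crossentropy'})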
>
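>   And a sketch of the update loop described in the list (sampling
> details and names are illustrative):
>
>     import random
>     import numpy as np
>
>     def symmetries(planes, policy, n=7):
>         # The original position and the three flips: horizontal,
>         # vertical, and both (= 180° rotation); no 90° rotations.
>         board, pas = policy[:n * n].reshape(n, n), policy[n * n:]
>         for flip_v in (False, True):
>             for flip_h in (False, True):
>                 pl, po = planes, board
>                 if flip_v:
>                     pl, po = pl[::-1], po[::-1]
>                 if flip_h:
>                     pl, po = pl[:, ::-1], po[:, ::-1]
>                 yield pl, np.concatenate([po.reshape(-1), pas])
>
>     def update_after_game(model, game_positions, history, n_sampled=64):
>         # Twice on all positions of the finished game plus 64 positions
>         # sampled from the whole selfplay history, each in 4 symmetries.
>         for _ in range(2):
>             batch = game_positions + random.sample(history, n_sampled)
>             for planes, policy, result in batch:
>                 for pl, po in symmetries(planes, policy):
>                     model.train_on_batch(
>                         pl[np.newaxis],
>                         [po[np.newaxis], np.array([[result]])])
>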
>   This has been running for two weeks, self-playing 8500 games.  A week
> ago its moves already looked a bit natural, but it was stuck in various
> local optima.  Three days ago it beat GNUGo once across 20 games; now
> it wins five out of 20 - so I'll let it self-play a little longer, as
> it might surpass GNUGo quickly at this point.  This late improvement
> also coincides with the increased simulation count.
>
>   At the same time, Nochi supports supervised training (with the rest
> kept the same), which I'm now experimenting with on 19x19.
>
>   Happy training,
>
