It's a model written using the Keras neural network library: https://en.wikipedia.org/wiki/Keras
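To make that concrete: Keras lets you define a neural network as a graph of layer objects and then train it with a one-line compile/fit API. A minimal sketch of a policy+value network for a 7x7 board follows - this is NOT Nochi's actual architecture; the three input planes, filter counts, and layer choices here are illustrative assumptions only.

```python
# A toy two-headed (policy + value) convolutional network for a 7x7 board,
# written with the Keras functional API. Purely illustrative.
from tensorflow import keras
from tensorflow.keras import layers

board = keras.Input(shape=(7, 7, 3))   # 3 hypothetical input feature planes

x = layers.Conv2D(32, 3, padding="same", activation="relu")(board)
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)

# Policy head: one logit per board point, softmaxed over all 49 points.
policy = layers.Conv2D(1, 1)(x)
policy = layers.Softmax()(layers.Flatten()(policy))

# Value head: a single win-probability estimate in [0, 1].
value = layers.Dense(1, activation="sigmoid")(layers.GlobalAveragePooling2D()(x))

model = keras.Model(board, [policy, value])
model.compile(optimizer="adam",
              loss=["categorical_crossentropy", "binary_crossentropy"])
```

Training then reduces to calling `model.fit()` on batches of (position, move, outcome) tuples, which is what makes Keras attractive for a small educational codebase like Michi.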
On Fri, Nov 10, 2017 at 7:09 AM, Xavier Combelle <xavier.combe...@gmail.com> wrote:
> You make me really curious: what is a Keras model?
>
> On 10/11/2017 at 01:47, Petr Baudis wrote:
> >   Hi,
> >
> >   I got the first *somewhat* positive results in my attempt to
> > reproduce AlphaGo Zero - a 25% winrate against GNUGo on the easiest
> > reasonable task, a 7x7 board. :)  a.k.a.
> >
> >     "Sometimes beating GNUGo on a tiny board" without human knowledge
> >
> > (much wow!)
> >
> >   Normally this would be a pretty weak result, but (A) I wanted to
> > help calibrate other efforts on larger boards that are possibly still
> > at the "random" stage, and (B) I'll probably move on to other projects
> > again soon, so this might be as good as it gets for me.
> >
> >   I started the project by replacing the MC simulations with a Keras
> > model in my 550-line educational Go program Michi - it lived in its
> > `nnet` branch until now, when I separated it into a project of its own:
> >
> >     https://github.com/rossumai/nochi
> >
> >   Starting from a small base means that the codebase is tiny and
> > should be easy to follow, though it's not at all as tidy as Michi is.
> >
> >   You can grab the current training state (== a pickled, chronological
> > archive of selfplay positions used for replay) and the neural network
> > weights from GitHub's "Releases" page:
> >
> >     https://github.com/rossumai/nochi/releases/tag/G171107T013304_000000150
> >
> >   This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> > no supervision, and it contains no Monte Carlo simulations or other
> > heuristics. But it's not entirely 1:1; I made some tweaks which I
> > thought might help early convergence:
> >
> >   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
> >   * The neural network is updated after _every_ game, _twice_, on
> >     _all_ positions plus 64 randomly sampled positions from the
> >     entire history, all of this done four times - on the original
> >     position and the three symmetry flips (but I was too lazy to
> >     implement 90° rotation).
> >   * Instead of supplying the last 8 positions as the network input,
> >     I feed just the last position plus two indicator matrices showing
> >     the locations of the last and second-to-last moves.
> >   * No symmetry pruning during tree search.
> >   * The value function is trained with cross-entropy rather than MSE,
> >     there is no L2 regularization, and plain Adam is used rather than
> >     hand-tuned SGD (but the annealing is reset from time to time due
> >     to manual restarts of the script from a checkpoint).
> >   * There is no resign auto-threshold, but it is important to play
> >     25% of the games without resigning, to escape local "optima".
> >   * 1/Temperature is 2 for the first three moves.
> >   * Initially I used 1000 "simulations" per move but, by mistake, the
> >     last 1500 games, when the network improved significantly (see
> >     below), were run with 2000 simulations per move. So that might
> >     matter.
> >
> >   This has been running for two weeks, self-playing 8500 games.
> > A week ago its moves already looked a bit natural, but it was stuck
> > in various local optima. Three days ago it beat GNUGo once across
> > 20 games; now it wins five times across 20 games - so I'll let it
> > self-play a little longer, as it might surpass GNUGo quickly at this
> > point. Also, this late improvement coincides with the increased
> > simulation number.
> >
> >   At the same time, Nochi supports supervised training (with the rest
> > kept the same), which I'm now experimenting with on 19x19.
> >
> >   Happy training,
>
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
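Two of the tweaks described above - the reduced input representation (last position plus two one-hot move-indicator planes) and the three-flip symmetry augmentation - are easy to sketch in NumPy. Note this is an illustrative reconstruction, not Nochi's actual code; the function names, the {-1, 0, +1} board encoding, and the plane ordering are assumptions.

```python
import numpy as np

def encode_position(board, last_move, prev_move):
    """Build input planes as described: the current position plus two
    indicator matrices marking the last and second-to-last moves.
    `board` is a 7x7 array of {-1, 0, +1}; moves are (row, col) or None."""
    planes = np.zeros((7, 7, 3), dtype=np.float32)
    planes[:, :, 0] = board
    if last_move is not None:
        planes[last_move[0], last_move[1], 1] = 1.0
    if prev_move is not None:
        planes[prev_move[0], prev_move[1], 2] = 1.0
    return planes

def symmetry_variants(planes):
    """The original position plus its three flips (vertical, horizontal,
    and both, i.e. a 180° rotation) - no 90° rotations, as in the post."""
    return [planes,
            np.flip(planes, axis=0),
            np.flip(planes, axis=1),
            np.flip(planes, axis=(0, 1))]
```

Each self-play position would then contribute four training samples per replay pass, which matches the "four times" augmentation factor mentioned in the bullet list.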