It's a model written using the Keras neural network library: https://en.wikipedia.org/wiki/Keras
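To make that concrete: Keras lets you define a neural network as a graph of layer objects and then train it with a one-line compile/fit API. A minimal sketch of a policy+value network for a 7x7 board follows - this is NOT Nochi's actual architecture; the three input planes, filter counts, and layer choices here are illustrative assumptions only.

```python
# A toy two-headed (policy + value) convolutional network for a 7x7 board,
# written with the Keras functional API. Purely illustrative.
from tensorflow import keras
from tensorflow.keras import layers

board = keras.Input(shape=(7, 7, 3))   # 3 hypothetical input feature planes

x = layers.Conv2D(32, 3, padding="same", activation="relu")(board)
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)

# Policy head: one logit per board point, softmaxed over all 49 points.
policy = layers.Conv2D(1, 1)(x)
policy = layers.Softmax()(layers.Flatten()(policy))

# Value head: a single win-probability estimate in [0, 1].
value = layers.Dense(1, activation="sigmoid")(layers.GlobalAveragePooling2D()(x))

model = keras.Model(board, [policy, value])
model.compile(optimizer="adam",
              loss=["categorical_crossentropy", "binary_crossentropy"])
```

Training then reduces to calling `model.fit()` on batches of (position, move, outcome) tuples, which is what makes Keras attractive for a small educational codebase like Michi.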
On Fri, Nov 10, 2017 at 7:09 AM, Xavier Combelle <xavier.combe...@gmail.com> wrote:
> You make me really curious: what is a Keras model?
>
> On 10/11/2017 at 01:47, Petr Baudis wrote:
> >   Hi,
> >
> >   I got the first *somewhat* positive results in my attempt to
> > reproduce AlphaGo Zero - a 25% winrate against GNUGo on the easiest
> > reasonable task, a 7x7 board. :)  a.k.a.
> >
> >     "Sometimes beating GNUGo on a tiny board" without human knowledge
> >
> > (much wow!)
> >
> >   Normally this would be a pretty weak result, but (A) I wanted to
> > help calibrate other efforts on larger boards that are possibly still
> > at the "random" stage, and (B) I'll probably move on to other projects
> > again soon, so this might be as good as it gets for me.
> >
> >   I started the project by replacing the MC simulations with a Keras
> > model in my 550-line educational Go program Michi - it lived in its
> > `nnet` branch until now, when I separated it into a project of its own:
> >
> >     https://github.com/rossumai/nochi
> >
> >   Starting from a small base means that the codebase is tiny and
> > should be easy to follow, though it's not at all as tidy as Michi is.
> >
> >   You can grab the current training state (== a pickled, chronological
> > archive of selfplay positions used for replay) and the neural network
> > weights from GitHub's "Releases" page:
> >
> >     https://github.com/rossumai/nochi/releases/tag/G171107T013304_000000150
> >
> >   This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> > no supervision, and it contains no Monte Carlo simulations or other
> > heuristics. But it's not entirely 1:1; I made some tweaks which I
> > thought might help early convergence:
> >
> >   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
> >   * The neural network is updated after _every_ game, _twice_, on
> >     _all_ positions plus 64 randomly sampled positions from the
> >     entire history, all of this done four times - on the original
> >     position and the three symmetry flips (but I was too lazy to
> >     implement 90° rotation).
> >   * Instead of supplying the last 8 positions as the network input,
> >     I feed just the last position plus two indicator matrices showing
> >     the locations of the last and second-to-last moves.
> >   * No symmetry pruning during tree search.
> >   * The value function is trained with cross-entropy rather than MSE,
> >     there is no L2 regularization, and plain Adam is used rather than
> >     hand-tuned SGD (but the annealing is reset from time to time due
> >     to manual restarts of the script from a checkpoint).
> >   * There is no resign auto-threshold, but it is important to play
> >     25% of the games without resigning, to escape local "optima".
> >   * 1/Temperature is 2 for the first three moves.
> >   * Initially I used 1000 "simulations" per move but, by mistake, the
> >     last 1500 games, when the network improved significantly (see
> >     below), were run with 2000 simulations per move. So that might
> >     matter.
> >
> >   This has been running for two weeks, self-playing 8500 games.
> > A week ago its moves already looked a bit natural, but it was stuck
> > in various local optima. Three days ago it beat GNUGo once across
> > 20 games; now it wins five times across 20 games - so I'll let it
> > self-play a little longer, as it might surpass GNUGo quickly at this
> > point. Also, this late improvement coincides with the increased
> > simulation number.
> >
> >   At the same time, Nochi supports supervised training (with the rest
> > kept the same), which I'm now experimenting with on 19x19.
> >
> >   Happy training,
>
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
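Two of the tweaks described above - the reduced input representation (last position plus two one-hot move-indicator planes) and the three-flip symmetry augmentation - are easy to sketch in NumPy. Note this is an illustrative reconstruction, not Nochi's actual code; the function names, the {-1, 0, +1} board encoding, and the plane ordering are assumptions.

```python
import numpy as np

def encode_position(board, last_move, prev_move):
    """Build input planes as described: the current position plus two
    indicator matrices marking the last and second-to-last moves.
    `board` is a 7x7 array of {-1, 0, +1}; moves are (row, col) or None."""
    planes = np.zeros((7, 7, 3), dtype=np.float32)
    planes[:, :, 0] = board
    if last_move is not None:
        planes[last_move[0], last_move[1], 1] = 1.0
    if prev_move is not None:
        planes[prev_move[0], prev_move[1], 2] = 1.0
    return planes

def symmetry_variants(planes):
    """The original position plus its three flips (vertical, horizontal,
    and both, i.e. a 180° rotation) - no 90° rotations, as in the post."""
    return [planes,
            np.flip(planes, axis=0),
            np.flip(planes, axis=1),
            np.flip(planes, axis=(0, 1))]
```

Each self-play position would then contribute four training samples per replay pass, which matches the "four times" augmentation factor mentioned in the bullet list.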