You make me really curious, what is a Keras model?

On 10/11/2017 at 01:47, Petr Baudis wrote:
> Hi,
>
> I got the first *somewhat* positive results in my attempt to reproduce
> AlphaGo Zero - a 25% winrate against GNUGo on the easiest reasonable
> task - the 7x7 board. :) a.k.a.
>
>     "Sometimes beating GNUGo on a tiny board" without human knowledge
>
> (much wow!)
>
> Normally this would be a pretty weak result, but (A) I wanted to help
> calibrate other efforts on larger boards that are possibly still at
> the "random" stage, and (B) I'll probably move on to other projects
> again soon, so this might be as good as it gets for me.
>
> I started the project by replacing the MC simulations with a Keras
> model in my 550-line educational Go program Michi - it lived in its
> `nnet` branch until now, when I separated it into a project of its own:
>
>     https://github.com/rossumai/nochi
>
> Starting from a small base means that the codebase is tiny and should
> be easy to follow, though it's not nearly as tidy as Michi is.
>
> You can grab the current training state (== a pickled archive of
> self-play positions used for replay, in chronological order) and the
> neural network weights from the GitHub "Releases" page:
>
>     https://github.com/rossumai/nochi/releases/tag/G171107T013304_000000150
>
> This is a truly "zero-knowledge" system like AlphaGo Zero - it needs
> no supervision, and it contains no Monte Carlo simulations or other
> heuristics. But it's not entirely 1:1; I made some tweaks which I
> thought might help early convergence:
>
> * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
> * The neural network is updated after _every_ game, _twice_, on _all_
>   of the game's positions plus 64 positions randomly sampled from the
>   entire history, all of this done four times - on the original
>   position and the three symmetry flips (I was too lazy to implement
>   90° rotation).
> * Instead of supplying the last 8 positions as the network input, I
>   feed just the last position plus two indicator matrices showing the
>   locations of the last and second-to-last moves.
> * No symmetry pruning during tree search.
> * The value function is trained with cross-entropy rather than MSE,
>   no L2 regularization, and plain Adam rather than hand-tuned SGD
>   (but the annealing is reset from time to time due to manual
>   restarts of the script from a checkpoint).
> * No automatic resignation threshold, but it is important to play 25%
>   of games without resigning to escape local "optima".
> * 1/Temperature is 2 for the first three moves.
> * Initially I used 1000 "simulations" per move, but by mistake the
>   last 1500 games, when the network improved significantly (see
>   below), were run with 2000 simulations per move. So that might
>   matter.
>
> This has been running for two weeks, self-playing 8500 games. A week
> ago its moves already looked a bit natural, but it was stuck in
> various local optima. Three days ago it beat GNUGo once across 20
> games; now it wins five times across 20 games, so I'll let it
> self-play a little longer - it might surpass GNUGo quickly at this
> point. This late improvement also coincides with the increased number
> of simulations.
>
> At the same time, Nochi supports supervised training (with the rest
> kept the same), which I'm now experimenting with on 19x19.
>
> Happy training,
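For anyone wondering the same thing: in this context "a Keras model" is just a
small convolutional policy/value network defined with the Keras API. Below is a
minimal sketch of what such a net might look like for the setup described above;
the layer count follows the "7 layers for 7x7" remark, but the widths, the
plain (non-residual) convolutions and the three input planes are illustrative
assumptions, not Nochi's actual architecture:

    # Assumed imports: standalone Keras with a TensorFlow backend.
    from keras.models import Model
    from keras.layers import Input, Conv2D, Flatten, Dense

    BOARD = 7  # 7x7 board, as in the experiment above

    # Input planes (an assumption based on the description): the current
    # position plus two one-hot planes marking the last and
    # second-to-last moves.
    inp = Input(shape=(BOARD, BOARD, 3))

    x = inp
    for _ in range(7):  # "7 layers for 7x7" (plain convs, not resnet blocks)
        x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)

    # Policy head: a probability for every intersection plus pass.
    p = Conv2D(2, (1, 1), activation='relu')(x)
    p = Flatten()(p)
    policy = Dense(BOARD * BOARD + 1, activation='softmax', name='policy')(p)

    # Value head: win probability in [0, 1], trained with cross-entropy
    # rather than MSE, as described above.
    v = Conv2D(1, (1, 1), activation='relu')(x)
    v = Flatten()(v)
    v = Dense(64, activation='relu')(v)
    value = Dense(1, activation='sigmoid', name='value')(v)

    model = Model(inputs=inp, outputs=[policy, value])
    model.compile(optimizer='adam',
                  loss={'policy': 'categorical_crossentropy',
                        'value': 'binary_crossentropy'})
    model.summary()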
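And a rough sketch of the per-game update rule described above ("updated after
_every_ game, _twice_, on _all_ positions plus 64 randomly sampled positions
from the entire history"); the data layout and the function name are purely
assumptions, and the symmetry flips are omitted for brevity:

    import random
    import numpy as np

    def update_after_game(model, game_positions, history):
        """Train `model` twice on this game's positions plus 64 positions
        sampled from the whole self-play history. Each position is assumed
        to be a tuple (input planes, policy target, game outcome)."""
        history.extend(game_positions)
        replay = random.sample(history, min(64, len(history)))
        batch = game_positions + replay
        x = np.stack([b[0] for b in batch])
        policy = np.stack([b[1] for b in batch])
        value = np.array([[b[2]] for b in batch])
        for _ in range(2):  # "updated ... _twice_"
            model.train_on_batch(x, [policy, value])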
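Finally, a sketch of move selection with 1/Temperature = 2 for the first three
moves, assuming per-move visit counts from the root of the tree search; the
greedy selection after move three is an assumption, not something the post
states:

    import numpy as np

    def pick_move(visit_counts, move_number):
        """Sample a move in proportion to N(a)^(1/T) with 1/T = 2 for the
        first three moves; pick the most-visited move afterwards."""
        counts = np.asarray(visit_counts, dtype=float)
        if move_number < 3:
            probs = counts ** 2          # pi(a) ~ N(a)^(1/T), 1/T = 2
            probs /= probs.sum()
            return int(np.random.choice(len(counts), p=probs))
        return int(np.argmax(counts))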