On 10/11/2017 1:47, Petr Baudis wrote:

> * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
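(Assuming "7 layers" means 7 residual blocks of the usual two-convolution kind, the layer arithmetic works out as follows - a sketch, where the single stem and head convolutions are my assumption about the counting, not stated in the post:)

```python
def resnet_conv_layers(blocks, stem_convs=1, head_convs=1):
    # Each standard residual block contains two 3x3 convolutions;
    # add one convolution on the input planes (stem) and one in the
    # output head -- hence "14 + 2" for a 7-block network.
    return 2 * blocks + stem_convs + head_convs

print(resnet_conv_layers(7))   # 7 blocks -> 16 conv layers
```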
How many filters per layer? FWIW, a 7-layer resnet (14 + 2 layers) is still pretty huge - larger than the initial AlphaGo. Given the number of games you have and the size of the board, I would not be surprised if your neural net program is "outbooking" the opponent by remembering sequences rather than learning more generic things. (But hey, outbooking is learning too!)

> * The neural network is updated after _every_ game, _twice_, on _all_
>   positions plus 64 randomly sampled positions from the entire history,
>   this all done four times - on the original position and the three
>   symmetry flips (but I was too lazy to implement 90\deg rotation).

The reasoning being to give stronger and faster reinforcement with the latest data?

> * Value function is trained with cross-entropy rather than MSE,
>   no L2 regularization, and plain Adam rather than hand-tuned SGD (but
>   the annealing is reset from time to time due to manual restarts of
>   the script from a checkpoint).

I never really had good results with Adam and friends compared to SGD (even momentum does not always help - but of course it's much faster early on).

> * No resign auto-threshold, but it is important to play 25% of games
>   without resigning to escape local "optima".

This makes sense because both sides will miscount in exactly the same way.

> * 1/Temperature is 2 for the first three moves.
> * Initially I used 1000 "simulations" per move, but by mistake, the last
>   1500 games when the network improved significantly (see below) were
>   run with 2000 simulations per move. So that might matter.
>
> This has been running for two weeks, self-playing 8500 games. A week
> ago its moves already looked a bit natural, but it was stuck in various
> local optima. Three days ago it beat GNUGo once across 20 games.
> Now five times across 20 games - so I'll let it self-play a little
> longer, as it might surpass GNUGo quickly at this point? Also, this
> late improvement coincides with the increased simulation number.
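Incidentally, going from the three flips to the full eight board symmetries (the dihedral group of the square, i.e. 4 rotations times an optional reflection) is cheap - a sketch with numpy, assuming the position is a plain 2-D array; the policy/value targets would of course have to be transformed the same way:

```python
import numpy as np

def dihedral_symmetries(board):
    # All 8 symmetries of a square board: rotate by 0/90/180/270
    # degrees, and also reflect each rotation left-right.
    syms = []
    for k in range(4):
        r = np.rot90(board, k)
        syms.append(r)
        syms.append(np.fliplr(r))
    return syms

board = np.arange(49).reshape(7, 7)  # a generic (asymmetric) 7x7 position
assert len({s.tobytes() for s in dihedral_symmetries(board)}) == 8
```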
The simulation number is one of the big black boxes in this setup, I think. If the policy network does not have a strong opinion yet, it seems that one has to make it sufficiently larger than the number of legal moves. If not, first-play urgency will cause every successor position to be evaluated and there is no lookahead, which means MCTS can't discover anything. So a few times 361 makes sense for 19x19, but don't ask me why 1600 and not 1200, etc. With only 50-ish moves to consider on 7x7, it's interesting that you see a big improvement by making it (relatively) much larger than DeepMind did.

But uh, you're not simply matching it against GNUGo with more simulations, are you? I mean, it would be quite normal to win more when searching deeper.

--
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go