On Fri, Nov 10, 2017 at 03:40:27PM +0100, Gian-Carlo Pascutto wrote:
> On 10/11/2017 1:47, Petr Baudis wrote:
>
> > * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
>
> How many filters per layer?
256 like AlphaGo.
> FWIW 7 layer resnet (14 + 2 layers) is still pretty huge - larger than
> the initial AlphaGo. Given the amount of games you have, and the size of
> the board, I would not be surprised if your neural net program is
> "outbooking" the opponent by remembering the sequences rather than
> learning more generic things.
>
> (But hey, outbooking is learning too!)
I can't rule that out, yes. It would be interesting to apply the same
convolutions to a bigger board to see whether they play shapes and can do
basic tactics.
> > * The neural network is updated after _every_ game, _twice_, on _all_
> > positions plus 64 randomly sampled positions from the entire history,
> > all of this done four times - on the original position and the three
> > symmetry flips (but I was too lazy to implement 90\deg rotation).
>
> The reasoning being to give a stronger and faster reinforcement with the
> latest data?
Yes.
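A rough sketch of what I mean, with made-up names (the actual script differs, of course): after each game, build a batch from all of that game's positions plus 64 positions sampled from the whole history, each expanded into the identity and the three flips.

```python
import random

import numpy as np

def symmetry_flips(board):
    """Identity plus the three flips used here (no 90-degree rotations)."""
    return [board,
            np.flipud(board),   # vertical flip
            np.fliplr(board),   # horizontal flip
            board.T]            # diagonal flip (transpose)

def training_batch(game_positions, history, n_sampled=64):
    """All positions of the latest game, plus positions sampled from the
    entire self-play history, with symmetries applied to each."""
    sampled = random.sample(history, min(n_sampled, len(history)))
    batch = []
    for pos in list(game_positions) + sampled:
        batch.extend(symmetry_flips(pos))
    return batch

# E.g. one 10-move game on a 7x7 board and a small fake history:
game = [np.random.rand(7, 7) for _ in range(10)]
history = [np.random.rand(7, 7) for _ in range(200)]
batch = training_batch(game, history)  # (10 + 64) positions, 4 symmetries each
```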
> > * Value function is trained with cross-entropy rather than MSE,
> > no L2 regularization, and plain Adam rather than hand-tuned SGD (but
> > the annealing is reset from time to time due to manual restarts of the
> > script from a checkpoint).
>
> I never really had good results with Adam and friends compared to SGD
> (even momentum does not always help - but of course it's much faster
> early on).
It has worked great on all my neural models in other tasks - but this is
actually my first neural model for Go. :)
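To spell out the loss difference (my own minimal sketch, with the value head squashed to (0, 1) and the outcome z in {0, 1}):

```python
import numpy as np

def value_xent(v, z, eps=1e-7):
    """Cross-entropy between predicted win probability v and outcome z."""
    v = np.clip(v, eps, 1 - eps)
    return -(z * np.log(v) + (1 - z) * np.log(1 - v))

def value_mse(v, z):
    """Plain squared error on the same quantities."""
    return (v - z) ** 2

# Cross-entropy penalizes a confidently wrong prediction much harder than
# MSE (and its gradient -z/v does not flatten out there), which is one
# reason it can train faster:
v, z = 0.01, 1.0
assert value_xent(v, z) > value_mse(v, z)
```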
> > * No resign auto-threshold but it is important to play 25% games
> > without resigning to escape local "optima".
>
> This makes sense because both sides will miscount in exactly the same way.
Without this, producing value 1.0 for one color and 0.0 for the other
is a super-strong attractor.
> > * 1/Temperature is 2 for first three moves.
> > * Initially I used 1000 "simulations" per move, but by mistake the last
> > 1500 games, during which the network improved significantly (see
> > below), were run with 2000 simulations per move. So that might matter.
> >
> > This has been running for two weeks, self-playing 8500 games. A week
> > ago its moves already looked a bit natural but it was stuck in various
> > local optima. Three days ago it beat GNUGo once in 20 games.
> > Now it wins five times in 20 games - so I'll let it self-play a little
> > longer, as it might surpass GNUGo quickly at this point? Also this late
> > improvement coincides with the increased simulation number.
>
> The simulation number is one of the big black boxes in this setup, I
> think. If the policy network does not have a strong opinion yet, it
> seems that one has to make it sufficiently bigger than the amount of
> legal moves. If not, first-play-urgency will cause every successor
> position to be evaluated and there's no look ahead, which means MCTS
> can't discover anything.
I don't see how first-play-urgency comes into play. Initially the policy
will typically be noise, but that still means the tree grows pretty
asymmetrically. I saw uniform sampling only in some cases where the
number of simulations was << the number of children.
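To make this concrete, here is a bare-bones PUCT-style selection with first-play urgency (my own sketch, not code from either of our programs): with a high FPU value, every legal move tends to get one visit before any subtree deepens, which is the uniform-sampling regime when simulations << children.

```python
import math

def select_child(children, c_puct=1.5, fpu_value=1.0):
    """children: dicts with prior p, visit count n, total value w.

    Unvisited nodes take fpu_value as their Q ("first-play urgency"),
    so with a high FPU each legal move is tried once before the search
    starts going deeper anywhere.
    """
    total_n = sum(ch["n"] for ch in children)
    best, best_score = None, -float("inf")
    for ch in children:
        q = ch["w"] / ch["n"] if ch["n"] > 0 else fpu_value
        u = c_puct * ch["p"] * math.sqrt(total_n + 1) / (1 + ch["n"])
        if q + u > best_score:
            best, best_score = ch, q + u
    return best

# With uniform priors and high FPU, successive selections cycle through
# the unvisited moves first:
kids = [{"p": 0.25, "n": 0, "w": 0.0} for _ in range(4)]
first = select_child(kids)
first["n"] += 1
second = select_child(kids)
assert second is not first  # the next simulation goes to an unvisited child
```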
> So a few times 361 makes sense for 19x19, but don't ask me why 1600 and
> not 1200 etc.
My feeling now is that raising the count really helps, especially
slightly later in training. I think the moment is when you stop seeing
regular, large discrepancies between network predictions and scoring
output in the very late endgame. But it could be an illusion.
> With only 50-ish moves to consider on 7x7, it's interesting that you see
> a big improvement by making it (relatively) much larger than DeepMind did.
>
> But uh, you're not simply matching it against GNUGo with more
> simulations, are you? I mean it would be quite normal to win more when
> searching deeper.
All playtests should have been with 2000 simulations.
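For completeness, the "1/Temperature is 2 for first three moves" setting from my list above amounts to squaring the visit counts before sampling the move (a sketch with my own names; that later moves are played greedily is my assumption, not something stated above):

```python
import random

def sample_move(visit_counts, move_number, inv_temp_early=2.0, early_moves=3):
    """Pick a move from MCTS visit counts.

    For the first few moves 1/T = 2, i.e. counts are raised to the power 2
    before normalization - some opening diversity remains, but
    well-searched moves are favored. Afterwards (assumed here) the
    most-visited move is played greedily.
    """
    if move_number < early_moves:
        weights = [n ** inv_temp_early for n in visit_counts]
        return random.choices(range(len(visit_counts)), weights=weights)[0]
    return max(range(len(visit_counts)), key=lambda i: visit_counts[i])
```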
--
Petr Baudis, Rossum
Run before you walk! Fly before you crawl! Keep moving forward!
If we fail, I'd rather fail really hugely. -- Moist von Lipwig
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go