On 10/11/2017 1:47, Petr Baudis wrote:

> * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
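(Assuming "7 layers" means 7 residual blocks of the usual two-convolution kind, the layer arithmetic works out as follows - a sketch, where the single stem and head convolutions are my assumption about the counting, not stated in the post:)

```python
def resnet_conv_layers(blocks, stem_convs=1, head_convs=1):
    # Each standard residual block contains two 3x3 convolutions;
    # add one convolution on the input planes (stem) and one in the
    # output head -- hence "14 + 2" for a 7-block network.
    return 2 * blocks + stem_convs + head_convs

print(resnet_conv_layers(7))   # 7 blocks -> 16 conv layers
```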
How many filters per layer? FWIW, a 7-layer resnet (14 + 2 layers) is still pretty huge - larger than the initial AlphaGo. Given the number of games you have and the size of the board, I would not be surprised if your neural net program is "outbooking" the opponent by remembering sequences rather than learning more generic things. (But hey, outbooking is learning too!)

> * The neural network is updated after _every_ game, _twice_, on _all_
>   positions plus 64 randomly sampled positions from the entire history,
>   this all done four times - on the original position and the three
>   symmetry flips (but I was too lazy to implement 90\deg rotation).

The reasoning being to give stronger and faster reinforcement with the latest data?

> * Value function is trained with cross-entropy rather than MSE,
>   no L2 regularization, and plain Adam rather than hand-tuned SGD (but
>   the annealing is reset from time to time due to manual restarts of
>   the script from a checkpoint).

I never really had good results with Adam and friends compared to SGD (even momentum does not always help - but of course it's much faster early on).

> * No resign auto-threshold, but it is important to play 25% of games
>   without resigning to escape local "optima".

This makes sense because both sides will miscount in exactly the same way.

> * 1/Temperature is 2 for the first three moves.
> * Initially I used 1000 "simulations" per move, but by mistake, the last
>   1500 games when the network improved significantly (see below) were
>   run with 2000 simulations per move. So that might matter.
>
> This has been running for two weeks, self-playing 8500 games. A week
> ago its moves already looked a bit natural, but it was stuck in various
> local optima. Three days ago it beat GNUGo once across 20 games.
> Now five times across 20 games - so I'll let it self-play a little
> longer, as it might surpass GNUGo quickly at this point? Also, this
> late improvement coincides with the increased simulation number.
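Incidentally, going from the three flips to the full eight board symmetries (the dihedral group of the square, i.e. 4 rotations times an optional reflection) is cheap - a sketch with numpy, assuming the position is a plain 2-D array; the policy/value targets would of course have to be transformed the same way:

```python
import numpy as np

def dihedral_symmetries(board):
    # All 8 symmetries of a square board: rotate by 0/90/180/270
    # degrees, and also reflect each rotation left-right.
    syms = []
    for k in range(4):
        r = np.rot90(board, k)
        syms.append(r)
        syms.append(np.fliplr(r))
    return syms

board = np.arange(49).reshape(7, 7)  # a generic (asymmetric) 7x7 position
assert len({s.tobytes() for s in dihedral_symmetries(board)}) == 8
```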
The simulation number is one of the big black boxes in this setup, I think. If the policy network does not have a strong opinion yet, it seems that one has to make it sufficiently larger than the number of legal moves. If not, first-play urgency will cause every successor position to be evaluated and there is no lookahead, which means MCTS can't discover anything. So a few times 361 makes sense for 19x19, but don't ask me why 1600 and not 1200, etc. With only 50-ish moves to consider on 7x7, it's interesting that you see a big improvement by making it (relatively) much larger than DeepMind did.

But uh, you're not simply matching it against GNUGo with more simulations, are you? I mean, it would be quite normal to win more when searching deeper.

--
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go