On Sat, Jan 23, 2021 at 5:34 AM Darren Cook <dar...@dcook.org> wrote:

> Each convolutional layer should spread the information across the board.
> I think alpha zero used 20 layers? So even 3x3 filters would tell you
> about the whole board - though the signal from the opposite corner of
> the board might end up a bit weak.
>
> I think we can assume it is doing that successfully, because otherwise
> we'd hear about it losing lots of games in ladders.
>

Unfortunately, we can't assume that based on that observation.

If you observe what is going on with Leela Zero, ELF, MiniGo, and SAI -
all of which are reproductions of AlphaZero with different
hyperparameters and infrastructure, and none of which include a ladder
feature - I think you will find that *all* of them have at least some
trouble with ladders. So there is empirical evidence that the vanilla
AlphaZero algorithm, when applied to Go with a convolutional resnet,
often has ladder problems.

And by seeing how these reproductions behave, it also becomes clear how
your observation can still be true at the same time.

Which is: with enough playouts, MCTS for all these bots is able to
resolve ladders well enough at the root position and the upper levels of
the tree to avoid losing outright - usually a few tens of thousands of
playouts are plenty. So the ladder weakness mostly harms strength by
degrading the evaluation quality deeper in the tree, in ways that are
harder to see - the kind of thing that might cost more like 20-50 Elo
(a pure guess, just my intuition for the *very* rough order of magnitude
with this much search on top), rather than losing every game.
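
To give a feel for why search only patches this near the top of the
tree, here is a toy back-of-the-envelope sketch (my own rough model with
made-up numbers, not anything from the papers): if visits split among a
handful of plausible children at each ply, a node a few plies down a
side variation only ever sees a tiny fraction of the root's playouts -
exactly the low-playout regime where a ladder blind spot in the raw net
goes uncorrected.

# Toy model (my own illustration with made-up numbers): if a search has
# N playouts at the root and visits split among roughly k plausible
# children at each ply, a node d plies deep sees on the order of
# N / k**d visits. Deep in the tree every bot is effectively a
# low-playout bot, which is where a ladder blind spot does its damage.

def visits_at_depth(root_playouts: int, branching: int, depth: int) -> float:
    return root_playouts / branching ** depth

for depth in range(7):
    v = visits_at_depth(root_playouts=30_000, branching=4, depth=depth)
    print(f"depth {depth}: ~{v:,.0f} visits")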

The bigger problem happens when you run any of these bots with only a
few playouts - on low-end GPUs or mobile hardware, for example, *or at
the numbers of playouts that people often run CGOS bots with*, namely
200 playouts, 800 playouts, etc. You will find that they are still
clearly top-pro-level or superhuman at almost all aspects of the
game... except for ladders! And at these low numbers of playouts, that
does include outright losing games due to ladders, or badly misjudging
a sequence whose outcome will depend on a ladder 1-3 moves in the
future.

Sometimes this even happens in the low thousands of playouts. For
example, the attached SGF shows such a case, where Leela Zero, using
almost the latest 40-block network (LZ285) with 2k playouts per move
(plus tree reuse), attempted to break a ladder, failed, and then played
out the ladder anyway and lost on the spot.

It is also true that neural nets *are* capable of learning judgments
related to ladders, given the right data. Some time back, I found with
some visualizations of KataGo's net that it actually does trace a
width-6 diagonal band across the board for ladders! But the inductive
bias is weak enough, and the structure of the game tree for ladders is
hard enough (it's like the classic "cliff walking" problem in RL turned
up to the max), that it's a chicken-and-egg problem: starting from a net
that doesn't understand ladders yet, the "MCTS policy/value-improvement
operator" is empirically very poor at bootstrapping the net into
understanding them.
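
In case it's useful, here is a minimal sketch of the kind of
visualization I mean - gradient saliency on the input planes. This is
*not* KataGo's actual code or architecture; the tiny value net below is
a made-up stand-in (PyTorch) so the example is self-contained.

import torch
import torch.nn as nn

BOARD = 19
IN_PLANES = 8  # made-up number of input feature planes

class TinyValueNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(IN_PLANES, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.value_head = nn.Linear(32 * BOARD * BOARD, 1)

    def forward(self, x):
        h = self.trunk(x)
        return torch.tanh(self.value_head(h.flatten(1)))

net = TinyValueNet()
x = torch.randn(1, IN_PLANES, BOARD, BOARD, requires_grad=True)
value = net(x)    # predicted win/loss value for this (random) input
value.backward()  # gradient of the value w.r.t. the input planes

# Collapse the gradient over feature planes into one 19x19 saliency map:
# large entries mark the board points whose contents most affect the value.
saliency = x.grad.abs().sum(dim=1)[0]
print(saliency.shape)  # torch.Size([19, 19])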


> > something the first version of AlphaGo did (before they tried to make it
> > "zero") and something that many other bots do as well. But Leela Zero and
> > ELF do not do this, because of attempting to remain "zero", ...
>
> I know that zero-ness was very important to DeepMind, but I thought the
> open source dedicated go bots that have copied it did so because AlphaGo
> Zero was stronger than AlphaGo Master after 21-40 days of training.
> I.e. in the rarefied atmosphere of super-human play that starter package
> of human expert knowledge was considered a weight around its neck.
>

The PR and public press around AlphaZero may give this general
impression - it certainly sounds like a more impressive discovery if not
only can you learn from Zero, but doing so is actually better! But I'm
confident that this is not true in general, and that it also depends on
what "expert knowledge" you add, and how you add it.

You may note that the AlphaGo Zero paper makes no mention of how long or
with how many TPUs AlphaGo Master was trained (or if it does, I can't
find it) - so it's hard to say what the Master vs Zero comparison shows.
Also, it says that AlphaGo Master still made use of handcrafted
Monte-Carlo rollouts, and I can easily believe that jettisoning those
could lead to a big improvement. And it's at least plausible to me that
not pretraining on human pro games might give better final results
(*but* this is unclear - at least I don't know of any paper that
actually runs this as a controlled test).

But there are other bits of "expert knowledge" that do provide an
improvement over being pure-zero if done correctly, including:
* Predicting the final ownership of the board, not just the win/loss.
* Adding a small/mild term for caring about score, rather than just
win/loss.
* Seeding a percentage of the self-play training games to start from
positions based on external or expert-supplied games or board positions
(this is the main way KataGo went from being highly vulnerable to
Mi Yuting's flying dagger, like other zero bots, to playing it decently
well and now often winning games with it, depending on whether the other
side happens to shoot themselves in the foot with one of the trap
variations) - see the sketch after this list.

And yes, for now it also includes:
* Adding ladder status as an input to the neural net.
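
For the seeded-games point above, here is a rough sketch of the general
idea (my own simplified illustration, not KataGo's actual
implementation; the names and the 5% figure are hypothetical):

import random

SEED_FRACTION = 0.05  # made-up fraction of self-play games to seed

def choose_start_position(expert_positions, empty_board):
    # Most self-play games still start from the empty board; a small
    # fraction start from externally supplied positions (e.g. opening
    # trap lines the net rarely reaches on its own), so the training
    # data covers them and the net learns to handle them.
    if expert_positions and random.random() < SEED_FRACTION:
        return random.choice(expert_positions)
    return empty_board

# Hypothetical usage inside a self-play loop:
#   start = choose_start_position(expert_positions, empty_board)
#   game_record = play_selfplay_game(net, start)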

Attachment: 0_94.sgf
Description: Binary data
