On why ladder understanding appeared so late:

- The learning was based on self-play. Understanding ladders is perhaps not
so important if your opponent doesn't understand them either... Every time
a decisive ladder appears on the board, the result is practically a coin
toss.

- And as others have pointed out, ladders, unlike almost all other Go
features, are not local at all. The signal has to build up through a
large number of convolution layers before a single output unit can even
see both ends of a ladder (a rough sketch of the arithmetic follows
below). It is also difficult to build this understanding incrementally
(unlike, e.g., life & death, where you can start with simple cases and
move on to harder ones), so there is no bias to steer the learning in
the right direction.
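
A rough sketch of that receptive-field arithmetic (assuming a stack of
stride-1 3x3 convolutions, as in AlphaGo Zero's residual tower):

# Each 3x3 layer widens one unit's view by a single intersection in
# every direction, so depth is what buys board-spanning features.
BOARD = 19

def receptive_field(layers, kernel=3):
    """Side length of the input square seen by one output unit."""
    return 1 + layers * (kernel - 1)

for layers in (1, 4, 9, 20):
    rf = receptive_field(layers)
    print(f"{layers:2d} layers -> {rf}x{rf} field; covers 19x19: {rf >= BOARD}")

# 1 layer -> 3x3, 4 -> 9x9, 9 -> 19x19: only at depth 9 does a single
# unit see the whole board, and a corner-to-corner ladder needs that.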

On Wed, Oct 18, 2017 at 3:04 PM, Brian Sheppard via Computer-go <
computer-go@computer-go.org> wrote:

> Some thoughts toward the idea of general game-playing...
>
> One aspect of Go is ideally suited to a visual NN: strong locality of
> reference. That is, stones affect stones that are nearby.
>
> I wonder whether the late emergence of ladder understanding within AlphaGo
> Zero is an artifact of the board representation? The authors speculate that
> ladders are not as important as humans surmise.
>
> Another aspect of Go is ideally suited to a visual NN: translation
> invariance. The convolutions transfer knowledge around the board, on the
> presumption that good moves will travel well.
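>
> A tiny illustrative sketch of that property (numpy/scipy used purely
> for demonstration): one filter, applied everywhere, gives a response
> that shifts exactly as the stones do.
>
> import numpy as np
> from scipy.signal import correlate2d
>
> rng = np.random.default_rng(0)
> kernel = rng.standard_normal((3, 3))        # one arbitrary learned filter
>
> board = np.zeros((19, 19))
> board[3, 4] = 1.0                           # a lone stone
> moved = np.roll(board, (5, 7), axis=(0, 1)) # the same stone elsewhere
>
> # Shifting the input shifts the output identically: the filter's
> # "knowledge" is applied at every intersection.
> a = correlate2d(board, kernel, mode="same")
> b = correlate2d(moved, kernel, mode="same")
> assert np.allclose(np.roll(a, (5, 7), axis=(0, 1)), b)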
>
> I wonder whether we can data-mine positional evaluations to discover
> features? E.g., start with a standard visual NN and then make a database of
> positions where the delta between actual and expected evaluation is large
> enough to cause a different move to be selected. Data mine features from
> this set, and extend the NN with new inputs. (This would not discover a
> notion like "liberty count", but would discover notions like "ladder
> breaker".)
>
>
> -----Original Message-----
> From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf
> Of Gian-Carlo Pascutto
> Sent: Wednesday, October 18, 2017 4:33 PM
> To: computer-go@computer-go.org
> Subject: Re: [Computer-go] AlphaGo Zero
>
> On 18/10/2017 19:50, cazen...@ai.univ-paris8.fr wrote:
> >
> > https://deepmind.com/blog/
> >
> > http://www.nature.com/nature/index.html
>
> Select quotes that I find interesting from a brief skim:
>
> 1) Using a residual network was more accurate, achieved lower error, and
> improved performance in AlphaGo by over 600 Elo.
>
> 2) Combining policy and value together into a single network slightly
> reduced the move prediction accuracy, but reduced the value error and
> boosted playing performance in AlphaGo by around another 600 Elo.
>
> These gains sound very high (much higher than in previous experiments
> reported here), but they are likely due to the joint training.
>
> 3) The raw neural network, without using any lookahead, achieved an Elo
> rating of 3,055. ... AlphaGo Zero achieved a rating of 5,185.
>
> The increase of 2000 Elo from tree search sounds very high, but it may
> just mean the value network is simply very good - and perhaps relatively
> better than the policy one. (They previously had the problem that the SL
> policy network was stronger than the RL one for guiding the tree search -
> but I'm not sure there's any relation.)
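>
> For scale, the standard Elo logistic model puts numbers on these gaps:
>
> # Expected score for an Elo difference d: E = 1 / (1 + 10**(-d/400)).
> def expected_score(d):
>     return 1.0 / (1.0 + 10.0 ** (-d / 400.0))
>
> print(expected_score(600))    # ~0.97    - the per-change gains above
> print(expected_score(2000))   # ~0.99999 - raw net vs. full AlphaGo Zero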
>
> 4) History features X_t, Y_t are necessary because Go is not fully
> observable solely from the current stones, as repetitions are forbidden.
>
> This is a weird statement. Did they need 17 planes just to check for ko?
> It seems more likely that the history features simply help the network's
> internal understanding, as an optimization. That sucks, though - it's
> annoying for analysis and position setup.
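>
> (For reference, the 17 planes in the paper are the X_t/Y_t stone planes
> for the current and seven previous positions, plus one colour-to-move
> plane. A sketch of such an encoding - the plane ordering and board
> representation here are illustrative, not the paper's exact layout:)
>
> import numpy as np
>
> def encode(history, to_play):
>     """history: newest-first list of 19x19 arrays (0 empty, 1 black,
>     2 white); to_play: 1 or 2. Returns the 17x19x19 input planes."""
>     planes = np.zeros((17, 19, 19), dtype=np.float32)
>     opponent = 3 - to_play
>     for t, board in enumerate(history[:8]):
>         planes[t] = (board == to_play)         # X_t: own stones
>         planes[8 + t] = (board == opponent)    # Y_t: opponent stones
>     planes[16] = 1.0 if to_play == 1 else 0.0  # C: colour to move
>     return planes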
>
> Lastly, the entire training procedure is actually not very complicated at
> all, and hopefully the training really is "faster" than previous
> approaches - but many things look fast if you can throw 64 GPU workers at
> a problem.
>
> In this context, the graphs showing huge strength discrepancies between
> the different network architectures are both good and bad news. Make the
> right pick and you get massively better results; make a bad pick and you
> won't come close.
>
> --
> GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
