So I am reading that residual networks are simply better than normal 
convolutional networks. There is a detailed write-up here: 
https://blog.waya.ai/deep-residual-learning-9610bb62c355

Summary: the residual network has a fixed connection that adds (with no 
scaling) the output of the previous block to the output of the current block. 
The point is that once some layer learns a concept, that concept is immediately 
available to all downstream layers, without needing to learn how to propagate 
the value through a complicated network design. These connections also give 
gradients a short path back to earlier layers, which makes deep networks much 
easier to train.
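
A minimal sketch of such a block, assuming PyTorch (my own illustration, not 
code from the write-up; the channel count is arbitrary):

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # One residual block: output = relu(F(x) + x), with the skip added unscaled.
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Identity skip: whatever earlier blocks computed passes straight
        # through, so downstream layers see it without relearning it.
        return F.relu(out + x)

Stacking many of these blocks gives the kind of residual tower the paper uses.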

-----Original Message-----
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 4:33 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero

On 18/10/2017 19:50, cazen...@ai.univ-paris8.fr wrote:
> 
> https://deepmind.com/blog/
> 
> http://www.nature.com/nature/index.html

Select quotes that I find interesting from a brief skim:

1) Using a residual network was more accurate, achieved lower error, and 
improved performance in AlphaGo by over 600 Elo.

2) Combining policy and value together into a single network slightly reduced 
the move prediction accuracy, but reduced the value error and boosted playing 
performance in AlphaGo by around another 600 Elo.

These gains sound very high (much higher than what previous experiments with 
these techniques, reported here, found), but are likely due to the joint training.
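
For concreteness, "combining policy and value" just means one shared tower 
with two heads and a single summed loss. A rough sketch (assuming PyTorch, 
with illustrative layer sizes rather than the paper's exact ones):

import torch.nn as nn

class DualHeadNet(nn.Module):
    # Shared tower (residual blocks in the real network, a short conv stack
    # here to keep the sketch small) feeding a policy head and a value head.
    def __init__(self, channels=64, board=19):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(17, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * board * board, board * board + 1))    # every move + pass
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(board * board, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh())                       # scalar in [-1, 1]

    def forward(self, x):
        h = self.tower(x)
        return self.policy_head(h), self.value_head(h)

Training minimises cross-entropy on the policy head plus mean-squared error on 
the value head in one loss, so both targets shape the shared tower - presumably 
where the joint-training benefit comes from.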

3) The raw neural network, without using any lookahead, achieved an Elo rating 
of 3,055. ... AlphaGo Zero achieved a rating of 5,185.

The increase of 2000 Elo from tree search sounds very high, but this may just 
mean the value network is simply very good - and perhaps relatively better than 
the policy one. (They previously had the problem that SL > RL for the policy 
network guiding the tree search - but I'm not sure there's any relation.)
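
For scale, plugging these numbers into the standard Elo expected-score formula 
E = 1 / (1 + 10^(-diff/400)) gives (a back-of-the-envelope check, nothing from 
the paper):

def elo_expected_score(diff):
    # Expected score of the stronger side at an Elo difference of `diff`.
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

print(elo_expected_score(600))    # ~0.97    - each ~600 Elo gain above
print(elo_expected_score(2000))   # ~0.99999 - search vs. the raw network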

4) History features Xt, Yt are necessary because Go is not fully observable 
solely from the current stones, as repetitions are forbidden.

This is a weird statement. Did they need 17 planes just to check for ko?
It seems more likely that the history features mainly help the network's 
internal understanding - an optimization rather than a necessity. That sucks 
though - it's annoying for analysis and position setup.
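
For reference, the 17 planes in the paper are 8 history planes of the current 
player's stones, 8 of the opponent's stones, and one colour-to-move plane. 
Building them is trivial (a sketch in numpy with my own board encoding, 
+1 for black and -1 for white):

import numpy as np

def encode_position(history, to_move, board=19):
    # history: the last 8 board arrays (board x board), most recent first,
    # padded with empty boards early in the game; to_move: +1 or -1.
    planes = np.zeros((17, board, board), dtype=np.float32)
    for t, pos in enumerate(history[:8]):
        planes[t] = (pos == to_move)            # own stones, t moves ago
        planes[8 + t] = (pos == -to_move)       # opponent stones, t moves ago
    planes[16] = 1.0 if to_move == 1 else 0.0   # colour to move (1 = black)
    return planes

Only a fraction of that history is strictly needed to detect repetition; the 
rest presumably just makes the network's job easier.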

Lastly, the entire training procedure is actually not very complicated at all, 
and hopefully the training is "faster" than previous approaches - but many 
things look fast if you can throw 64 GPU workers at a problem.

In this context, the graphs showing huge strength discrepancies between the 
different network architectures are both good and bad. Pick the better 
architecture and you get massively better results; pick a bad one and you 
won't come close.

--
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
