Also (if I'm understanding the paper correctly) 20 blocks ~= 40 layers,
because each "block" contains two convolution layers (see the sketch
after the quoted list):

Each residual block applies the following modules sequentially to its input:
> (1) A convolution of 256 filters of kernel size 3×3 with stride 1
> (2) Batch normalization
> (3) A rectifier nonlinearity
> (4) A convolution of 256 filters of kernel size 3×3 with stride 1
> (5) Batch normalization
> (6) A skip connection that adds the input to the block
> (7) A rectifier nonlinearity
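
A minimal sketch of one such block in PyTorch (my choice of framework,
not the paper's; the class and variable names are mine) shows where the
two convolution layers per block come from:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # (1) 256 filters of kernel size 3x3 with stride 1;
        # padding 1 preserves the 19x19 board shape
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)  # (2) batch normalization
        # (4) second convolution with the same shape
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)  # (5) batch normalization

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # (1)-(3)
        out = self.bn2(self.conv2(out))        # (4)-(5)
        out = out + x                          # (6) skip connection
        return F.relu(out)                     # (7) final rectifier

Stacking 20 of these gives the 40 convolution layers, on top of the
network's initial convolutional block and the policy/value heads.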


On Tue, Oct 24, 2017 at 5:10 PM, Xavier Combelle <xavier.combe...@gmail.com>
wrote:

> How is it a fair comparison if there were only 3 days of training for
> Zero? Master had longer training, no? Moreover, Zero has a bootstrapping
> problem because, unlike Master, it doesn't learn from expert games,
> which means it is likely to be weaker with little training.
>
>
> Le 24/10/2017 à 20:20, Hideki Kato a écrit :
> > David Silver said in May that Master used a 40-layer network.
> > According to the new paper, Master used the same architecture
> > as Zero, so Master used a 20-block ResNet.
> >
> > The first instance of Zero, the 20-block ResNet version, is
> > weaker than Master (after 3 days of training). So, with the
> > same number of layers (a fair comparison), Zero is weaker
> > than Master.
> >
> > Hideki
>
>
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
