Also (if I'm understanding the paper correctly), 20 blocks ~= 40 layers, because each "block" has two convolution layers:
Each residual block applies the following modules sequentially to its input:

> (1) A convolution of 256 filters of kernel size 3×3 with stride 1
> (2) Batch normalization
> (3) A rectifier nonlinearity
> (4) A convolution of 256 filters of kernel size 3×3 with stride 1
> (5) Batch normalization
> (6) A skip connection that adds the input to the block
> (7) A rectifier nonlinearity

(A rough code sketch of one such block follows at the end of this message.)

On Tue, Oct 24, 2017 at 5:10 PM, Xavier Combelle <xavier.combe...@gmail.com> wrote:

> How is it a fair comparison if there are only 3 days of training for Zero?
> Master had longer training, no? Moreover, Zero has a bootstrap problem
> because, unlike Master, it doesn't learn from expert games, which means
> it is likely to be weaker with little training.
>
> On 24/10/2017 at 20:20, Hideki Kato wrote:
> > David Silver said in May that Master used a 40-layer network.
> > According to the new paper, Master used the same architecture
> > as Zero. So Master used a 20-block ResNet.
> >
> > The first instance of Zero, the 20-block ResNet version, is
> > weaker than Master (after 3 days of training). So, with the
> > same number of layers (a fair comparison), Zero is weaker than
> > Master.
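For what it's worth, here is a minimal sketch of one such block as I read the paper's description. PyTorch is just my choice of framework here, not necessarily what DeepMind used, and padding=1 is my assumption so the 19×19 board size is preserved through the tower:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """One residual block per the paper's description: two 3x3
    convolutions of 256 filters each, so 20 blocks ~= 40 conv layers."""
    def __init__(self, filters=256):
        super().__init__()
        # (1) convolution of 256 filters, kernel 3x3, stride 1
        # (padding=1 is an assumption, to keep the spatial size constant)
        self.conv1 = nn.Conv2d(filters, filters, 3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(filters)  # (2) batch normalization
        # (4) convolution of 256 filters, kernel 3x3, stride 1
        self.conv2 = nn.Conv2d(filters, filters, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(filters)  # (5) batch normalization

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # (1)-(3)
        out = self.bn2(self.conv2(out))        # (4)-(5)
        return F.relu(out + x)                 # (6) skip connection, (7) rectifier

# Stacking 20 of these gives the residual tower: 20 blocks -> 40 conv layers,
# which is where the "40 layers" figure comes from.
tower = nn.Sequential(*[ResidualBlock() for _ in range(20)])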