I have wondered about this too. A purely convolutional policy head would save a lot of weights. The pass output could be a single bias parameter, connected to nothing. Fixing pass to a constant might work as well. I don't see the reason for the extra complication.
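To put rough numbers on the saving, here is a small sketch of the weight counts. The 2*361*362 and 1*361*256 figures are the ones from Andy's message; the "purely convolutional" count is only my assumption of what such a head would look like (one extra 1x1 convolution collapsing the filter planes into a single 19x19 plane of move logits, plus one free bias for pass). The function names are made up for illustration, and biases of the fully connected layers are left out, as in the original calculation.

```python
BOARD_POINTS = 19 * 19        # 361 intersections
MOVES = BOARD_POINTS + 1      # 362 outputs: every intersection plus pass


def fc_policy_head_weights(filters=2):
    """Weights of the fully connected layer after `filters` 1x1 conv planes."""
    return filters * BOARD_POINTS * MOVES


def fc_value_hidden_weights(filters=1, hidden=256):
    """Weights of the value head's first fully connected layer."""
    return filters * BOARD_POINTS * hidden


def conv_policy_head_params(in_planes=2):
    """Hypothetical purely convolutional head (an assumption, not AGZ):
    a 1x1 conv from `in_planes` planes to one plane of move logits
    (in_planes weights, shared across the board) plus a single bias
    parameter used directly as the pass logit."""
    return in_planes + 1


if __name__ == "__main__":
    print("FC policy head weights:  ", fc_policy_head_weights())    # 261364
    print("FC value hidden weights: ", fc_value_hidden_weights())   # 92416
    print("Conv policy head params: ", conv_policy_head_params())
```

The convolutional variant is small because its weights are shared across all 361 points, whereas the fully connected layer learns a separate weight for every (input point, output move) pair.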
----- Original Message -----
From: "Andy" <andy.olsen...@gmail.com>
To: "computer-go" <computer-go@computer-go.org>
Sent: Friday, December 29, 2017 06:47:06
Subject: [Computer-go] AGZ Policy Head

Is there some particular reason AGZ uses two 1x1 filters for the policy head instead of one? They could also have allowed more, but I guess that would be expensive? I calculate that the fully connected layer has 2*361*362 weights, where 2 is the number of filters.

By comparison, the value head has only a single 1x1 filter, but it goes to a hidden layer of 256. That gives 1*361*256 weights. Why not use two 1x1 filters here? Maybe since the final output is only a single scalar it's not needed?

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go