Is there some particular reason AGZ uses two 1x1 filters for the policy head instead of one?
They could also have used more, but I guess that would be expensive? By my calculation, the policy head's fully connected layer has 2*361*362 weights, where 2 is the number of filters, 361 is the number of board points, and 362 is the number of possible moves (361 points plus pass). By comparison, the value head has only a single 1x1 filter, but it feeds a hidden layer of 256 units, which gives 1*361*256 weights. Why not use two 1x1 filters there as well? Maybe they aren't needed because the final output is only a single scalar?
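To make the comparison concrete, here is a rough sketch of the arithmetic above. It assumes a 19x19 board, ignores biases and batch-norm parameters, and assumes the residual tower feeds 256 channels into each head's 1x1 convolution. The function names are just illustrative.

```python
# Rough weight counts for the AGZ policy and value heads on a 19x19 board.
# Assumptions: biases and batch-norm parameters are ignored, and the
# residual tower is assumed to output 256 channels into each head.
BOARD_POINTS = 19 * 19              # 361 board points
POLICY_OUTPUTS = BOARD_POINTS + 1   # 362 moves: every point plus pass
TOWER_CHANNELS = 256                # assumed tower width
VALUE_HIDDEN = 256                  # hidden layer size in the value head


def policy_head_weights(filters: int) -> dict:
    """Weights in the policy head for a given number of 1x1 filters."""
    return {
        "conv_1x1": TOWER_CHANNELS * filters,
        "fc": filters * BOARD_POINTS * POLICY_OUTPUTS,
    }


def value_head_weights(filters: int) -> dict:
    """Weights in the value head for a given number of 1x1 filters."""
    return {
        "conv_1x1": TOWER_CHANNELS * filters,
        "fc_hidden": filters * BOARD_POINTS * VALUE_HIDDEN,
        "fc_out": VALUE_HIDDEN * 1,  # hidden layer -> single scalar
    }


if __name__ == "__main__":
    for f in (1, 2):
        print(f"policy, {f} filter(s):", policy_head_weights(f))
        print(f"value,  {f} filter(s):", value_head_weights(f))
```

In both heads the fully connected layer right after the 1x1 convolution dominates, and its size scales linearly with the number of filters: the policy head's layer is 2*361*362 = 261364 weights, while a second filter in the value head would take its hidden layer from 92416 to 184832 weights.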
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go