Is there some particular reason AlphaGo Zero (AGZ) uses two 1x1 filters
for the policy head instead of one?

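They could also have used more, but I guess that would be expensive? By my
count the fully connected layer after it has 2*361*362 = 261,364 weights,
where 2 is the number of filters, 361 is the number of board points, and
362 is the number of moves (361 points plus pass).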
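
To make the cost of adding filters concrete, here is a rough sketch of that
count as a function of the number of 1x1 filters. The function name is just
my own, and biases and batch-norm parameters are ignored.

```python
BOARD_POINTS = 19 * 19          # 361 intersections on a 19x19 board
NUM_MOVES = BOARD_POINTS + 1    # 362 outputs: every point plus pass


def policy_fc_weights(num_filters):
    """Weights in the policy head's fully connected layer (biases excluded).

    Each 1x1 filter yields one 19x19 plane, so the layer's input size is
    num_filters * 361 and its output size is 362.
    """
    return num_filters * BOARD_POINTS * NUM_MOVES


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(f"{k} filter(s): {policy_fc_weights(k):,} weights")
```

The count grows linearly with the number of filters, so each extra filter
adds another 361*362 weights to this one layer.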

By comparison the value head has only a single 1x1 filter, but it feeds a
hidden layer of 256 units. That gives 1*361*256 = 92,416 weights. Why not
use two 1x1 filters here? Maybe since the final output is only a single
scalar it's not needed?
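
For what it's worth, here is the same kind of back-of-the-envelope count
for the value head, as a sketch only. I'm assuming the 256-unit hidden
layer is followed by one more fully connected layer down to the scalar
output; the function name is my own and biases are again ignored.

```python
BOARD_POINTS = 19 * 19   # 361 intersections on a 19x19 board
HIDDEN_UNITS = 256       # size of the value head's hidden layer


def value_fc_weights(num_filters):
    """Return (hidden_layer_weights, output_layer_weights), biases excluded.

    Assumption: the hidden layer connects to a single scalar output
    through one further fully connected layer.
    """
    hidden = num_filters * BOARD_POINTS * HIDDEN_UNITS
    output = HIDDEN_UNITS * 1
    return hidden, output


if __name__ == "__main__":
    for k in (1, 2):
        hidden, output = value_fc_weights(k)
        print(f"{k} filter(s): {hidden:,} + {output} weights")
```

So going to two filters in the value head would double the hidden layer's
weights while leaving the tiny output layer unchanged.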
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go