Yes, I think the important job of the value function is to detect moves that are very bad, so that MC-eval does not have to sample more than once for many variations.

If the evaluation function were trained on pro moves only, it would not know what a bad move looks like. At the very least, it would not be able to see the difference between "very bad", "never good" and "sometimes possible".

Magnus

On 2016-11-21 15:22, Gian-Carlo Pascutto wrote:
For the Value Network indeed the procedure is as described, with one
move at time U being uniformly sampled from {1,361} until it is legal. I
think it's because we're not interested (only) in playing good moves,
but also analyzing as diverse as possible positions to learn whether
they're won or lost. Throwing in one totally random move vastly
increases the diversity and the number of odd positions the network
sees, while still not leading to totally nonsensical positions.
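
As a rough illustration of the sampling step described above, here is a minimal Python sketch of drawing one point uniformly from 1..361 and retrying until it is legal. The function name, the `is_legal` predicate and the `max_tries` cap are assumptions made up for this example; they are not taken from any actual engine.

```python
import random

BOARD_POINTS = 361  # 19 x 19 board, points numbered 1..361


def sample_random_legal_move(is_legal, rng=random, max_tries=10_000):
    """Draw points uniformly from 1..361 until one is legal.

    `is_legal` is a hypothetical callback (point -> bool) supplied by the
    caller; a real engine would check occupancy, suicide and ko here.
    """
    for _ in range(max_tries):
        point = rng.randint(1, BOARD_POINTS)  # both ends inclusive
        if is_legal(point):
            return point
    return None  # gave up; the caller could treat this as a pass


# Toy usage: pretend every unoccupied point is legal.
occupied = {1, 2, 3, 181}
move = sample_random_legal_move(lambda p: p not in occupied, random.Random(0))
```

Retrying until the point is legal gives a uniform choice over the legal points without having to enumerate them first.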
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
