Re: [Computer-go] Training the value network (a possibly more efficient approach)

John Tromp Tue, 10 Jan 2017 15:20:44 -0800

hi Bo,

> Let me know if there is any silly mistakes :)


You say "the perfect policy network can be derived from the perfect
value network (the best next move is the move that maximises the value
for the player, if the value function is perfect), but not vice versa.",
but a perfect policy for both players can be used to generate a perfect
playout which yields the perfect value...
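
To make both directions concrete, here is a minimal sketch on a toy game
rather than Go. Everything in it is an assumption for illustration: the
game (remove 1-3 stones, whoever takes the last stone wins) and the names
perfect_value, policy_from_value and value_from_policy are hypothetical.
The policy is built from the value only for convenience; any perfect
policy would do for the playout.

```python
from functools import lru_cache

# Hypothetical toy game: remove 1-3 stones; taking the last stone wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def perfect_value(stones):
    """Game-theoretic value for the player to move: +1 win, -1 loss."""
    if stones == 0:
        return -1  # the previous player took the last stone, so the mover has lost
    return max(-perfect_value(stones - m) for m in MOVES if m <= stones)


def policy_from_value(stones):
    """Value -> policy: pick the move that maximises the mover's value."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -perfect_value(stones - m))


def value_from_policy(stones, policy):
    """Policy -> value: both players follow the policy; score the playout."""
    sign = 1  # +1 while the original player is to move, -1 otherwise
    while stones > 0:
        stones -= policy(stones)
        sign = -sign
    # With no stones left, the player to move has lost.
    return -sign


for n in range(1, 10):
    print(n, perfect_value(n), value_from_policy(n, policy_from_value))
```

In this toy the playout value agrees with the minimax value at every
position, but it costs a whole playout per query, and it relies on the
game being deterministic and the playout terminating.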

regards,
-John
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
