Re: [Computer-go] Training the value network (a possibly more efficient approach)

Bo Peng Wed, 11 Jan 2017 07:23:08 -0800

Hi,

>How do you get the V(s) for those datasets? You play out the endgame
>with the Monte Carlo playouts?
>
>I think one problem with this approach is that errors in the data for
>V(s) directly correlate to errors in MC playouts. So a large benefit of
>"mixing" the two (otherwise independent) evaluations is lost.


Yes, that is a problem for the human-games dataset.

On the other hand, the SL part is currently relatively easy (it seems
everyone arrives at 50-60% accuracy), and the main challenge of the RL
part is generating the huge number of self-play games.

In self-play games we have an accurate end-game v(s) / V(s), and v(s) /
V(s) can use the information in self-play games more efficiently. I
think this can be helpful.
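
As an illustration of why self-play games give clean value targets, here is a
minimal sketch, assuming the common setup in which every position of a
finished game is labelled with the final result from the point of view of the
side to move. The function name, the string placeholders for positions, and
the +1 / -1 encoding are all assumptions made for this example; the exact
v(s) / V(s) targets discussed in this thread may differ.

```python
from typing import List, Tuple


def label_positions(
    positions: List[str], first_to_move: int, final_result: int
) -> List[Tuple[str, int]]:
    """Pair each position with the game result seen by the side to move.

    Hypothetical encoding (not taken from the thread):
      first_to_move: +1 if Black moves in positions[0], -1 if White does.
      final_result:  +1 if Black won the game, -1 if White won.
    Sides are assumed to alternate on every move (a pass counts as a move).
    """
    labeled = []
    to_move = first_to_move
    for pos in positions:
        # The target is +1 when the side to move eventually wins, else -1.
        labeled.append((pos, final_result * to_move))
        to_move = -to_move
    return labeled


# Three consecutive positions from a game that Black went on to win.
examples = label_positions(["s0", "s1", "s2"], first_to_move=1, final_result=1)
```

Because the result comes from a game that was actually played to the end, the
label does not depend on Monte Carlo playouts from the middle of the game.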


_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
