On 11-01-17 14:33, Kensuke Matsuzaki wrote:
> Hi,
> 
> I couldn't get positive experiment results on Ray.
>  
> Rn's network structures for V and W are similar and share parameters;
> only the final convolutional layers are different.
> I trained Rn's network to minimize the MSE of V(s) + W(s).
> It uses only KGS and GoGoD data sets, no self play with RL policy.
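
For concreteness, here is a minimal sketch of how I read that description:
a shared trunk with two separate final convolutional heads, trained with a
summed MSE loss. This is only my guess at the layout, not Rn's actual code;
the PyTorch framing, input planes, layer sizes, tanh output ranges and the
reading of "MSE of V(s) + W(s)" as a sum of two MSEs are all assumptions:

    import torch
    import torch.nn as nn

    class TwoHeadValueNet(nn.Module):
        # Shared trunk; only the final 1x1 conv heads differ:
        # one produces W(s), the other V(s).
        def __init__(self, in_planes=18, channels=128, blocks=6):
            super().__init__()
            layers = [nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU()]
            for _ in range(blocks):
                layers += [nn.Conv2d(channels, channels, 3, padding=1),
                           nn.ReLU()]
            self.trunk = nn.Sequential(*layers)
            self.w_head = nn.Conv2d(channels, 1, 1)  # win-rate head W(s)
            self.v_head = nn.Conv2d(channels, 1, 1)  # value head V(s)

        def forward(self, x):
            h = self.trunk(x)
            # Average each head's plane to one scalar per position; tanh
            # keeps both outputs in [-1, 1] (an assumed target scale).
            w = torch.tanh(self.w_head(h).mean(dim=(1, 2, 3)))
            v = torch.tanh(self.v_head(h).mean(dim=(1, 2, 3)))
            return w, v

    def loss_fn(w_pred, v_pred, w_target, v_target):
        # Joint training target: sum of the two per-head MSEs.
        return (nn.functional.mse_loss(w_pred, w_target)
                + nn.functional.mse_loss(v_pred, v_target))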

How do you get the V(s) for those datasets? Do you play out the endgame
with the Monte Carlo playouts?

I think one problem with this approach is that errors in the training data
for V(s) directly correlate with errors in the MC playouts. So a large part
of the benefit of "mixing" the two (otherwise independent) evaluations is
lost.

This problem doesn't exist when using raw W/L data from those datasets,
or when using SL/RL playouts. (But note that using the full engine to
produce games *would* suffer from the same correlation. That might be
entirely offset by the higher quality of the data, though.)

> But I have no idea how to use V(s) or v(s) in MCTS.

V(s) seems potentially useful for handicap games where W(s) is no longer
accurate. I don't see any benefit over W(s) for even games.

-- 
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
