Re: [Computer-go] Training the value network (a possibly more efficient approach)

Bo Peng Wed, 11 Jan 2017 06:46:04 -0800

Hi zakki,

> I couldn't get positive experiment results on Ray.
> In Rn, the network structures of V and W are similar and share parameters;
> only the final convolutional layers differ.
> I trained Rn's network to minimize the MSE of V(s) + W(s).
> It uses only KGS and GoGoD data sets, no self play with RL policy.



Thanks for sharing your results.

Have you tried more stages of training V, including the second method in my
PDF (i.e. training the value network to fit the "observed move")? I feel it
could improve the "awareness / sharpness" of V.

Bo


_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
