Hi zakki,

> I couldn't get positive experiment results on Ray.
> Rn's network structure of V and W are similar and share parameters,
> but only final convolutional layer are different.
> I trained Rn's network to minimize MSE of V(s) + W(s).
> It uses only KGS and GoGoD data sets, no self play with RL policy.


Thanks for sharing your results.

Have you tried more stages of training V that also use the second method
in my PDF (i.e. training the value network to fit the "observed move")?
I feel it could improve the "awareness / sharpness" of V.

Bo


_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
