Hi,

Thanks for sharing your idea.

In my experience it is rarely efficient to train value functions from very 
short-term data (i.e., the next move). TD(lambda), or training from the final 
outcome of the game, is often better because it uses a longer horizon. But of 
course, it is difficult to tell without experiments whether your idea would 
work or not. The advantage of your idea is that you can collect a lot of 
training data more easily.
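
To make the comparison concrete, here is a rough sketch of the three kinds of 
training targets (my own illustration, not from your note), assuming 
per-position value estimates V[0..T] from the network and a final game outcome 
z, with no intermediate rewards or discounting as in Go, and ignoring the 
alternation of perspective between the two players:

def one_step_td_target(V, t):
    # Bootstrap from the very next position only (the short-term signal).
    return V[t + 1]

def final_outcome_target(z):
    # Train directly on the final result of the game (the longest horizon).
    return z

def td_lambda_target(V, t, z, lam=0.9):
    # Forward-view TD(lambda): an exponentially weighted mix of n-step
    # returns, interpolating between the two extremes above
    # (lam = 0 gives the one-step target, lam = 1 gives the final outcome).
    T = len(V) - 1                       # index of the terminal position
    target, weight = 0.0, 1.0 - lam
    for n in range(1, T - t):
        target += weight * V[t + n]      # n-step return, bootstrapped from V
        weight *= lam
    target += lam ** (T - t - 1) * z     # remaining weight goes to the outcome
    return target

With lam close to 1 the target is dominated by the final outcome; with lam 
close to 0 it collapses to the next-position estimate, which is the 
short-horizon signal mentioned above.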

Rémi

----- Original Message -----
From: "Bo Peng" <b...@withablink.com>
To: computer-go@computer-go.org
Sent: Tuesday, January 10, 2017 23:25:19
Subject: [Computer-go] Training the value network (a possibly more efficient 
approach)


Hi everyone. It occurs to me that there might be a more efficient method to 
train the value network directly (without using the policy network).


You are welcome to check my method: http://withablink.com/GoValueFunction.pdf 


Let me know if there are any silly mistakes :)

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go