Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-12 Thread Xavier Combelle
So I will start to create the software, and if someone wants to use it, it will be free as in free software. I have already found someone who is ready to host the server side. From a practical point of view, I will use public-key signing to distribute go software (binary or source), so I will ask the

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-12 Thread Gian-Carlo Pascutto
On 11-01-17 18:09, Xavier Combelle wrote: > Of course it means distributing at least the binary, or the source, > so proprietary software authors could be reluctant to share it. But for free > software there should not be any problem. If someone is interested in my > proposition, I would be pleased to

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Xavier Combelle
On 11/01/2017 at 16:14, Bo Peng wrote: > Hi, > >> How do you get the V(s) for those datasets? You play out the endgame >> with the Monte Carlo playouts? >> >> I think one problem with this approach is that errors in the data for >> V(s) directly correlate to errors in MC playouts. So a large

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
Hi, >How do you get the V(s) for those datasets? You play out the endgame >with the Monte Carlo playouts? > >I think one problem with this approach is that errors in the data for >V(s) directly correlate to errors in MC playouts. So a large benefit of >"mixing" the two (otherwise independent)

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Kensuke Matsuzaki
Hi, > How do you get the V(s) for those datasets? You play out the endgame > with the Monte Carlo playouts? Yes, I use the result of 100 playouts from the endgame. Sometimes the result stored in the sgf differs from the result of the playouts. zakki
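A minimal sketch of the labelling step zakki describes, under stated assumptions: Position.random_playout() and the endgame/sgf inputs are hypothetical stand-ins, not functions from Ray or any other engine named in the thread.

# Sketch: label endgame positions with the mean outcome of N random playouts.
# `Position.random_playout()` is a hypothetical helper, assumed to return 1 if
# the player to move wins the playout and 0 otherwise.

def playout_value(position, n_playouts=100):
    """Estimate V(s) as the fraction of playouts won by the player to move."""
    wins = 0
    for _ in range(n_playouts):
        wins += position.random_playout()
    return wins / n_playouts

def label_endgames(endgame_positions, sgf_results):
    """Build (position, value) training pairs from endgame positions."""
    labels = []
    for pos, sgf_result in zip(endgame_positions, sgf_results):
        v = playout_value(pos)
        # As zakki notes, the playout result can disagree with the sgf result.
        disagrees = (v > 0.5) != (sgf_result > 0.5)
        labels.append((pos, v, disagrees))
    return labels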

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
Hi zakki, > I couldn't get positive experimental results on Ray. > Rn's network structures for V and W are similar and share parameters, > but only the final convolutional layers are different. > I trained Rn's network to minimize the MSE of V(s) + W(s). > It uses only the KGS and GoGoD data sets, no self play
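A rough PyTorch sketch of the architecture described in the quoted text, as one plausible reading: a shared convolutional trunk with two separate final convolutional layers producing V(s) and W(s), trained jointly on the sum of the two MSE terms. Layer counts, channel sizes, and input planes are placeholders, not Ray's actual configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueWinNet(nn.Module):
    """Shared trunk; only the final convolutional layers differ for V and W."""
    def __init__(self, in_planes=3, channels=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.v_head = nn.Conv2d(channels, 1, 1)  # final layer for V(s)
        self.w_head = nn.Conv2d(channels, 1, 1)  # final layer for W(s)

    def forward(self, x):
        h = self.trunk(x)
        # Average each head over the board and squash to [-1, 1].
        v = torch.tanh(self.v_head(h).mean(dim=(2, 3))).squeeze(1)
        w = torch.tanh(self.w_head(h).mean(dim=(2, 3))).squeeze(1)
        return v, w

def joint_loss(v_pred, w_pred, v_target, w_target):
    # "MSE of V(s) + W(s)" read here as the sum of the two MSE terms.
    return F.mse_loss(v_pred, v_target) + F.mse_loss(w_pred, w_target)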

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
It's nice to see so many discussions. Another reason could be that training a good-quality v(s) (or V(s)) may require a network structure different from that of W(s). Usually it is helpful to have an ensemble of different networks, each constructed from different principles. On 1/11/17,
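As a small illustration of the ensemble point (not of any specific network in this thread), averaging the estimates of several independently constructed value networks could look like this; each element of value_nets is assumed to be a callable returning a scalar V(s).

import torch

def ensemble_value(position, value_nets):
    """Average V(s) across an ensemble of differently structured value networks."""
    with torch.no_grad():
        estimates = [net(position) for net in value_nets]
    return sum(estimates) / len(estimates)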

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Gian-Carlo Pascutto
On 10-01-17 23:25, Bo Peng wrote: > Hi everyone. It occurs to me there might be a more efficient method to > train the value network directly (without using the policy network). > > You are welcome to check my > method: http://withablink.com/GoValueFunction.pdf > For Method 1 you state:

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Gian-Carlo Pascutto
On 11-01-17 14:33, Kensuke Matsuzaki wrote: > Hi, > > I couldn't get positive experimental results on Ray. > > Rn's network structures for V and W are similar and share parameters, > but only the final convolutional layers are different. > I trained Rn's network to minimize the MSE of V(s) + W(s). > It uses

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Kensuke Matsuzaki
émi > > > >----- Original Mail ----- > >From: "Bo Peng" <b...@withablink.com> > >To: computer-go@computer-go.org > >Sent: Tuesday, 10 January 2017 23:25:19 > >Subject: [Computer-go] Training the value network (a possibly more > >efficient approach)

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
t; <b...@withablink.com> >To: computer-go@computer-go.org >Sent: Tuesday, 10 January 2017 23:25:19 >Subject: [Computer-go] Training the value network (a possibly more >efficient approach) > > >Hi everyone. It occurs to me there might be a more efficient method to >train t

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Rémi Coulom
Subject: [Computer-go] Training the value network (a possibly more efficient approach) Hi everyone. It occurs to me there might be a more efficient method to train the value network directly (without using the policy network). You are welcome to check my method: http://withablink.com/GoValueFun

[Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
Hi John, >You say "the perfect policy network can be >derived from the perfect value network (the best next move is the move >that maximises the value for the player, if the value function is >perfect), but not vice versa.", but a perfect policy for both players >can be used to generate a perfect
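A one-ply sketch of the quoted claim, with hypothetical legal_moves, play, and value helpers: a policy is derived from a value function by choosing the move whose resulting position is best for the player to move.

def greedy_policy_from_value(position, value):
    """Derive a policy from a value function: pick the move that maximises
    the value for the player to move. position.legal_moves() and
    position.play(move) are hypothetical helpers; value(p) is assumed to
    return the value of p from the perspective of the player to move there."""
    best_move, best_v = None, float("-inf")
    for move in position.legal_moves():
        child = position.play(move)
        v = -value(child)  # the child is evaluated from the opponent's side
        if v > best_v:
            best_move, best_v = move, v
    return best_move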

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-10 Thread Brian Sheppard
[mailto:computer-go-boun...@computer-go.org] On Behalf Of Bo Peng Sent: Tuesday, January 10, 2017 5:25 PM To: computer-go@computer-go.org Subject: [Computer-go] Training the value network (a possibly more efficient approach) Hi everyone. It occurs to me there might be a more efficient method

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-10 Thread John Tromp
Hi Bo, > Let me know if there are any silly mistakes :) You say "the perfect policy network can be derived from the perfect value network (the best next move is the move that maximises the value for the player, if the value function is perfect), but not vice versa.", but a perfect policy for both

[Computer-go] Training the value network (a possibly more efficient approach)

2017-01-10 Thread Bo Peng
Hi everyone. It occurs to me there might be a more efficient method to train the value network directly (without using the policy network). You are welcome to check my method: http://withablink.com/GoValueFunction.pdf Let me know if there are any silly mistakes :)