Re: [Computer-go] Training a "Score Network" in Monte-Carlo Tree Search

2017-03-19 Thread Bo Peng
Folding@home / mining bitcoins? Otherwise individuals / small groups won't have any chance against large companies. On 3/20/17, 03:48, "Computer-go on behalf of Bo Peng" <computer-go-boun...@computer-go.org on behalf of b...@withablink.com> wrote: >Training a policy network is simpl

[Computer-go] Training a "Score Network" in Monte-Carlo Tree Search

2017-03-19 Thread Bo Peng
Training a policy network is simple, and I have found that a Residual Network with Batch Normalization works very well. However, training a value network is far more challenging, as I have found it indeed very easy to overfit unless one uses the final territory as another prediction target. Even
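For illustration, a minimal sketch of the kind of architecture described above: a residual tower with batch normalization, a scalar value head, and an auxiliary per-point territory head used as a second prediction target. PyTorch is assumed, and all sizes (17 input planes, 128 channels, 6 blocks) and the head designs are illustrative guesses, not the author's actual network.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block with batch normalization."""
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)

class ValueNet(nn.Module):
    """Value network with an auxiliary territory head as a second target
    (sizes and input planes are assumptions, not the author's network)."""
    def __init__(self, in_planes=17, channels=128, blocks=6, board=19):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.trunk = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(board * board, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh())                # game outcome in [-1, 1]
        self.territory_head = nn.Conv2d(channels, 1, 1)  # per-point final ownership

    def forward(self, x):
        h = self.trunk(self.stem(x))
        return self.value_head(h), self.territory_head(h)
```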

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
Hi, >How do you get the V(s) for those datasets? Do you play out the endgame >with Monte Carlo playouts? > >I think one problem with this approach is that errors in the data for >V(s) directly correlate with errors in the MC playouts. So a large benefit of >"mixing" the two (otherwise independent)
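For context, a sketch of how such V(s) targets could be produced by Monte Carlo playouts. The Go engine interface used here (copy, legal_moves, play, is_terminal, score) is hypothetical; the point is that any systematic bias in the playouts flows directly into the V(s) labels, which is the correlation concern raised in the quoted message.

```python
import random

def playout_value(position, n_playouts=32):
    """Estimate a V(s) training target by averaging the outcomes of uniformly
    random playouts from position s (hypothetical engine interface)."""
    total = 0.0
    for _ in range(n_playouts):
        p = position.copy()
        while not p.is_terminal():
            p.play(random.choice(p.legal_moves()))
        total += 1.0 if p.score() > 0 else -1.0   # +1 win, -1 loss for Black
    return total / n_playouts
```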

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
Hi zakki, > I couldn't get positive experimental results on Ray. > Rn's network structures for V and W are similar and share parameters, > but only the final convolutional layers are different. > I trained Rn's network to minimize the MSE of V(s) + W(s). > It uses only the KGS and GoGoD data sets, no self play
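A rough sketch of the structure described in the quoted message: a shared trunk with V- and W-specific final layers and a joint MSE objective. PyTorch is assumed, and the trunk, plane count and layer sizes are placeholders, not Ray's actual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedVW(nn.Module):
    """V(s) and W(s) share the trunk parameters; only the final layers differ."""
    def __init__(self, in_planes=17, channels=128, board=19):
        super().__init__()
        self.trunk = nn.Sequential(                   # shared feature tower (placeholder)
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.final_v = nn.Conv2d(channels, 1, 1)      # V-specific final conv
        self.final_w = nn.Conv2d(channels, 1, 1)      # W-specific final conv
        self.fc_v = nn.Linear(board * board, 1)
        self.fc_w = nn.Linear(board * board, 1)

    def forward(self, x):
        h = self.trunk(x)
        v = torch.tanh(self.fc_v(self.final_v(h).flatten(1)))
        w = torch.tanh(self.fc_w(self.final_w(h).flatten(1)))
        return v, w

def joint_loss(v_pred, w_pred, v_target, w_target):
    """Joint objective: MSE of V(s) plus MSE of W(s), as in the quoted experiment."""
    return F.mse_loss(v_pred, v_target) + F.mse_loss(w_pred, w_target)
```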

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
It's nice to see so many discussions. Another reason could be that training a good-quality v(s) (or V(s)) may require a network structure different from that of W(s). It is usually helpful to have an ensemble of different networks, each constructed from different principles. On 1/11/17,
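As a small illustration of the ensembling point, a sketch that averages v(s) predictions from several independently constructed networks (PyTorch assumed; each net is assumed to map a batch of board tensors to a scalar value):

```python
import torch

def ensemble_value(nets, position_tensor):
    """Average v(s) predictions from several differently-constructed value networks.
    A plain mean is shown; weighted combinations are equally possible."""
    with torch.no_grad():
        preds = [net(position_tensor) for net in nets]
    return torch.stack(preds).mean(dim=0)
```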

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
er, because it uses a longer >horizon. But of course, it is difficult to tell without experiments >whether your idea would work or not. The advantage of your ideas is that >you can collect a lot of training data more easily. > >Rémi > >- Mail original - >De: "Bo Peng"

[Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
Hi John, >You say "the perfect policy network can be >derived from the perfect value network (the best next move is the move >that maximises the value for the player, if the value function is >perfect), but not vice versa.", but a perfect policy for both players >can be used to generate a perfect
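The quoted claim can be made concrete with a short sketch: given a perfect value function V*(s), defined from the viewpoint of the player to move in s, the best move is the one that minimises V* of the resulting position (the opponent's value). The Go engine interface here (legal_moves, play_copy) is hypothetical.

```python
def perfect_policy_from_value(position, V_star):
    """Derive the perfect move from a perfect value function V*(s).
    After our move the opponent is to move, so we pick the move that
    leaves them with the worst (lowest) value."""
    return min(position.legal_moves(),
               key=lambda move: V_star(position.play_copy(move)))
```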

[Computer-go] Training the value network (a possibly more efficient approach)

2017-01-10 Thread Bo Peng
Hi everyone. It occurs to me there might be a more efficient method to train the value network directly (without using the policy network). You are welcome to check my method: http://withablink.com/GoValueFunction.pdf Let me know if there are any silly mistakes :)