Re: [Computer-go] Training a "Score Network" in Monte-Carlo Tree Search

2017-03-19 Thread Bo Peng
Folding@home / mining bitcoins? Otherwise individuals / small groups won't have any chance against large companies. On 3/20/17, 03:48, "Computer-go on behalf of Bo Peng" <computer-go-boun...@computer-go.org on behalf of b...@withablink.com> wrote: >Training a policy network is simpl

[Computer-go] Training a "Score Network" in Monte-Carlo Tree Search

2017-03-19 Thread Bo Peng
Training a policy network is simple, and I have found that a Residual Network with Batch Normalization works very well. However, training a value network is far more challenging, as I have found it indeed very easy to overfit unless one uses the final territory as another prediction target. Even
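For illustration, a minimal sketch of the kind of architecture described above: a residual tower with batch normalization, a scalar value head, and an auxiliary per-point territory head used as a second prediction target. PyTorch is assumed, and all sizes (17 input planes, 128 channels, 6 blocks) and the head designs are illustrative guesses, not the author's actual network.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block with batch normalization."""
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)

class ValueNet(nn.Module):
    """Value network with an auxiliary territory head as a second target
    (sizes and input planes are assumptions, not the author's network)."""
    def __init__(self, in_planes=17, channels=128, blocks=6, board=19):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.trunk = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(board * board, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh())                # game outcome in [-1, 1]
        self.territory_head = nn.Conv2d(channels, 1, 1)  # per-point final ownership

    def forward(self, x):
        h = self.trunk(self.stem(x))
        return self.value_head(h), self.territory_head(h)
```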

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
Hi, >How do you get the V(s) for those datasets? Do you play out the endgame >with Monte Carlo playouts? > >I think one problem with this approach is that errors in the data for >V(s) directly correlate with errors in the MC playouts. So a large benefit of >"mixing" the two (otherwise independent)
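For context, a sketch of how such V(s) targets could be produced by Monte Carlo playouts. The Go engine interface used here (copy, legal_moves, play, is_terminal, score) is hypothetical; the point is that any systematic bias in the playouts flows directly into the V(s) labels, which is the correlation concern raised in the quoted message.

```python
import random

def playout_value(position, n_playouts=32):
    """Estimate a V(s) training target by averaging the outcomes of uniformly
    random playouts from position s (hypothetical engine interface)."""
    total = 0.0
    for _ in range(n_playouts):
        p = position.copy()
        while not p.is_terminal():
            p.play(random.choice(p.legal_moves()))
        total += 1.0 if p.score() > 0 else -1.0   # +1 win, -1 loss for Black
    return total / n_playouts
```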

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
Hi zakki, > I couldn't get positive experimental results on Ray. > Rn's network structures for V and W are similar and share parameters, > but only the final convolutional layers are different. > I trained Rn's network to minimize the MSE of V(s) + W(s). > It uses only the KGS and GoGoD data sets, no self play
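A rough sketch of the structure described in the quoted message: a shared trunk with V- and W-specific final layers and a joint MSE objective. PyTorch is assumed, and the trunk, plane count and layer sizes are placeholders, not Ray's actual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedVW(nn.Module):
    """V(s) and W(s) share the trunk parameters; only the final layers differ."""
    def __init__(self, in_planes=17, channels=128, board=19):
        super().__init__()
        self.trunk = nn.Sequential(                   # shared feature tower (placeholder)
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.final_v = nn.Conv2d(channels, 1, 1)      # V-specific final conv
        self.final_w = nn.Conv2d(channels, 1, 1)      # W-specific final conv
        self.fc_v = nn.Linear(board * board, 1)
        self.fc_w = nn.Linear(board * board, 1)

    def forward(self, x):
        h = self.trunk(x)
        v = torch.tanh(self.fc_v(self.final_v(h).flatten(1)))
        w = torch.tanh(self.fc_w(self.final_w(h).flatten(1)))
        return v, w

def joint_loss(v_pred, w_pred, v_target, w_target):
    """Joint objective: MSE of V(s) plus MSE of W(s), as in the quoted experiment."""
    return F.mse_loss(v_pred, v_target) + F.mse_loss(w_pred, w_target)
```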

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
It's nice to see so many discussions. Another reason could be that training a good-quality v(s) (or V(s)) may require a network structure different from that of W(s). It is usually helpful to have an ensemble of different networks, each constructed from different principles. On 1/11/17,
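As a small illustration of the ensembling point, a sketch that averages v(s) predictions from several independently constructed networks (PyTorch assumed; each net is assumed to map a batch of board tensors to a scalar value):

```python
import torch

def ensemble_value(nets, position_tensor):
    """Average v(s) predictions from several differently-constructed value networks.
    A plain mean is shown; weighted combinations are equally possible."""
    with torch.no_grad():
        preds = [net(position_tensor) for net in nets]
    return torch.stack(preds).mean(dim=0)
```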

Re: [Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
er, because it uses a longer >horizon. But of course, it is difficult to tell without experiments >whether your idea would work or not. The advantage of your ideas is that >you can collect a lot of training data more easily. > >Rémi > >- Mail original - >De: "Bo Peng"

[Computer-go] Training the value network (a possibly more efficient approach)

2017-01-11 Thread Bo Peng
Hi John, >You say "the perfect policy network can be >derived from the perfect value network (the best next move is the move >that maximises the value for the player, if the value function is >perfect), but not vice versa.", but a perfect policy for both players >can be used to generate a perfect
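The quoted claim can be made concrete with a short sketch: given a perfect value function V*(s), defined from the viewpoint of the player to move in s, the best move is the one that minimises V* of the resulting position (the opponent's value). The Go engine interface here (legal_moves, play_copy) is hypothetical.

```python
def perfect_policy_from_value(position, V_star):
    """Derive the perfect move from a perfect value function V*(s).
    After our move the opponent is to move, so we pick the move that
    leaves them with the worst (lowest) value."""
    return min(position.legal_moves(),
               key=lambda move: V_star(position.play_copy(move)))
```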

[Computer-go] Training the value network (a possibly more efficient approach)

2017-01-10 Thread Bo Peng
Hi everyone. It occurs to me there might be a more efficient method to train the value network directly (without using the policy network). You are welcome to check my method: http://withablink.com/GoValueFunction.pdf Let me know if there are any silly mistakes :)