Folding@home /
mining bitcoins? Otherwise individuals / small groups won't have any
chance against large companies.
On 3/20/17, 03:48, "Computer-go on behalf of Bo Peng"
<computer-go-boun...@computer-go.org on behalf of b...@withablink.com> wrote:
Training a policy network is simple, and I have found that a Residual Network
with Batch Normalization works very well. However, training a value network
is far more challenging: it is very easy to overfit unless one uses the
final territory as an additional prediction target. Even
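The auxiliary-target idea above can be sketched as a shared network with two heads, one predicting the game outcome and one predicting per-point territory, trained on a combined loss. This is a minimal NumPy illustration under assumed dimensions and a toy linear-plus-tanh "network" (the real model would be a residual conv net; `aux_weight` and all sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a shared feature extractor: board features -> hidden vector.
N_FEATURES, N_HIDDEN, N_POINTS = 32, 16, 361  # 361 = 19x19 territory targets

W_shared = rng.normal(scale=0.1, size=(N_FEATURES, N_HIDDEN))
w_value = rng.normal(scale=0.1, size=(N_HIDDEN, 1))        # game-outcome head
w_terr = rng.normal(scale=0.1, size=(N_HIDDEN, N_POINTS))  # territory head

def forward(x):
    h = np.tanh(x @ W_shared)        # shared representation
    value = np.tanh(h @ w_value)     # predicted outcome in [-1, 1]
    territory = np.tanh(h @ w_terr)  # per-point ownership in [-1, 1]
    return value, territory

def combined_loss(x, outcome, terr_target, aux_weight=0.5):
    """MSE on the outcome plus a weighted MSE on the territory targets."""
    value, territory = forward(x)
    value_loss = np.mean((value - outcome) ** 2)
    terr_loss = np.mean((territory - terr_target) ** 2)
    return value_loss + aux_weight * terr_loss
```

The territory head acts as a regularizer: it forces the shared features to explain the whole board rather than memorizing which games were wins.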
Hi,
>How do you get the V(s) for those datasets? Do you play out the endgame
>with Monte Carlo playouts?
>
>I think one problem with this approach is that errors in the data for
>V(s) directly correlate with errors in the MC playouts. So a large benefit of
>"mixing" the two (otherwise independent)
Hi zakki,
> I couldn't get positive experimental results on Ray.
> Rn's network structures for V and W are similar and share parameters;
> only the final convolutional layers are different.
> I trained Rn's network to minimize the MSE of V(s) + W(s).
> It uses only the KGS and GoGoD data sets, no self play
It's nice to see so many discussions.
Another reason could be that training a good-quality v(s) (or V(s)) may
require a network structure different from that of W(s).
It is usually helpful to have an ensemble of different networks, each
constructed from different principles.
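The ensemble point above can be sketched very simply: independently constructed value models tend to make different errors, so averaging their estimates reduces variance. A toy illustration (the "models" here are hypothetical stand-ins, just functions from a position to a value in [-1, 1]):

```python
def ensemble_value(position, models):
    """Average v(s) over independently trained value models."""
    estimates = [m(position) for m in models]
    return sum(estimates) / len(estimates)

# Toy stand-ins for three differently constructed networks.
models = [lambda s: 0.2, lambda s: 0.6, lambda s: 0.1]
avg = ensemble_value(None, models)
```

Simple averaging is the most common choice; weighted averaging by each model's validation accuracy is a natural refinement.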
On 1/11/17,
er, because it uses a longer
>horizon. But of course, it is difficult to tell without experiments
>whether your idea would work or not. The advantage of your idea is that
>you can collect a lot of training data more easily.
>
>Rémi
>
>----- Original Message -----
>From: "Bo Peng"
Hi John,
>You say "the perfect policy network can be
>derived from the perfect value network (the best next move is the move
>that maximises the value for the player, if the value function is
>perfect), but not vice versa.", but a perfect policy for both players
>can be used to generate a perfect
Hi everyone. It occurs to me there might be a more efficient method to train
the value network directly (without using the policy network).
You are welcome to check my method: http://withablink.com/GoValueFunction.pdf
Let me know if there are any silly mistakes :)