So I will start to create the software, and anyone who wants to use it
will be free to do so, since it will be free software; I have already found
someone who is ready to host the server side.
From a practical point of view, I will use public-key signing to
distribute the Go software (binary or source), so I will ask the
On 11-01-17 18:09, Xavier Combelle wrote:
> Of course it means distributing at least the binary, or the source,
> so proprietary software authors could be reluctant to share it. But for free
> software there should not be any problem. If someone is interested in my
> proposition, I would be pleased to
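A minimal sketch of the public-key signing idea mentioned above, assuming
Python's "cryptography" package and an Ed25519 keypair; the artifact bytes
here are a stand-in for a real binary or source tarball, not anyone's actual
release process:

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Publisher side: generate a keypair and sign the release artifact.
signing_key = Ed25519PrivateKey.generate()   # kept secret by the publisher
verify_key = signing_key.public_key()        # distributed to users
artifact = b"contents of the released binary or source tarball"
signature = signing_key.sign(artifact)       # shipped alongside the artifact

# User side: verify the downloaded artifact against the published key.
try:
    verify_key.verify(signature, artifact)   # raises InvalidSignature if tampered
    print("signature OK")
except InvalidSignature:
    print("artifact does not match signature")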
On 11/01/2017 at 16:14, Bo Peng wrote:
> Hi,
>
>> How do you get the V(s) for those datasets? You play out the endgame
>> with the Monte Carlo playouts?
>>
>> I think one problem with this approach is that errors in the data for
>> V(s) directly correlate to errors in the MC playouts. So a large benefit of
>> "mixing" the two (otherwise independent)
Hi,

> How do you get the V(s) for those datasets? You play out the endgame
> with the Monte Carlo playouts?

Yes, I use the result of 100 playouts from the endgame.
Sometimes the result stored in the sgf differs from the result of the playouts.
zakki
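A rough sketch of the labelling zakki describes, with V(s) taken as the mean
result of 100 playouts from the endgame; random_playout below is a toy
stand-in for a real engine's playout code:

import random

def random_playout(position):
    # Toy stand-in: a real playout plays (near-)random legal moves to the
    # end of the game and scores the final board. Here `position` is
    # abstracted to its true Black win probability.
    return 1 if random.random() < position else 0

def estimate_value(position, n_playouts=100):
    # V(s) label: empirical Black win rate over n playouts, in [0, 1].
    wins = sum(random_playout(position) for _ in range(n_playouts))
    return wins / n_playouts

print(estimate_value(0.7))  # noisy estimate near 0.7; the sgf result can disagree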
Hi zakki,
> I couldn't get positive experimental results on Ray.
> Rn's network structures for V and W are similar and share parameters;
> only the final convolutional layers are different.
> I trained Rn's network to minimize the MSE of V(s) + W(s).
> It uses only the KGS and GoGoD data sets, no self-play.
It's nice to see so many discussions.
Another reason could be that training a good-quality v(s) (or V(s)) may
require a network structure different from that of W(s).
Usually it is helpful to have an ensemble of different networks, each
constructed from different principles.
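A minimal PyTorch-style sketch of the setup zakki describes (shared trunk,
with the V and W heads differing only in the final convolutional layer,
trained on a summed MSE); the layer sizes, input planes, and targets below
are all made up:

import torch
import torch.nn as nn

class TwoHeadValueNet(nn.Module):
    def __init__(self, planes=32):
        super().__init__()
        # Shared trunk; made-up depth and width.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, planes, 3, padding=1), nn.ReLU(),
            nn.Conv2d(planes, planes, 3, padding=1), nn.ReLU(),
        )
        # The two heads differ only in their final convolutional layer.
        self.v_head = nn.Conv2d(planes, 1, 1)
        self.w_head = nn.Conv2d(planes, 1, 1)

    def forward(self, x):
        h = self.trunk(x)
        v = torch.tanh(self.v_head(h).mean(dim=(1, 2, 3)))  # one scalar per board
        w = torch.tanh(self.w_head(h).mean(dim=(1, 2, 3)))
        return v, w

net = TwoHeadValueNet()
x = torch.randn(8, 3, 19, 19)        # toy batch of 19x19 board tensors
v_target = torch.rand(8) * 2 - 1     # hypothetical labels in [-1, 1]
w_target = torch.rand(8) * 2 - 1
v, w = net(x)
# Reading "MSE of V(s) + W(s)" as a summed per-head MSE.
loss = nn.functional.mse_loss(v, v_target) + nn.functional.mse_loss(w, w_target)
loss.backward()

Bo's ensemble point, schematically: average the V estimates of several such
nets built on different principles, e.g. v = sum(n(x)[0] for n in nets) / len(nets).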
On 10-01-17 23:25, Bo Peng wrote:
> Hi everyone. It occurs to me there might be a more efficient method to
> train the value network directly (without using the policy network).
>
> You are welcome to check my
> method: http://withablink.com/GoValueFunction.pdf
>
For Method 1 you state:
Hi John,

> You say "the perfect policy network can be
> derived from the perfect value network (the best next move is the move
> that maximises the value for the player, if the value function is
> perfect), but not vice versa.", but a perfect policy for both players
> can be used to generate a perfect
Hi Bo,

> Let me know if there are any silly mistakes :)

You say "the perfect policy network can be
derived from the perfect value network (the best next move is the move
that maximises the value for the player, if the value function is
perfect), but not vice versa.", but a perfect policy for both
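A toy illustration of the claim under discussion here, that a perfect policy
falls out of a perfect value function by taking the argmax over legal moves;
the game below is a small Nim variant (take 1-3 stones, taking the last stone
wins), a stand-in for Go:

from functools import lru_cache

@lru_cache(maxsize=None)
def value(stones):
    # Perfect value: +1 if the player to move wins with best play, else -1.
    if stones == 0:
        return -1  # the previous player took the last stone; the mover has lost
    return max(-value(stones - take) for take in (1, 2, 3) if take <= stones)

def policy(stones):
    # Perfect policy derived from the perfect value function: pick the move
    # that maximises the value for the player to move.
    return max((take for take in (1, 2, 3) if take <= stones),
               key=lambda take: -value(stones - take))

print(value(10), policy(10))  # 1 2: with 10 stones the mover wins by taking 2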
Hi everyone. It occurs to me there might be a more efficient method to train
the value network directly (without using the policy network).
You are welcome to check my method: http://withablink.com/GoValueFunction.pdf
Let me know if there are any silly mistakes :)