Re: [Computer-go] NiceGoZero games during learning

2017-11-06 Thread uurtamo .
I think it'd be quite interesting to at least keep track of the win rate
against the 4d version until then (although I recognize it will be zero
for some time). Maybe when it wins one?

s.

On Nov 6, 2017 6:00 PM, "Detlef Schmicker" wrote:

> Not in this weak state of the learned net. Right now I am measuring on
> CGOS with a net trained from 4d+ KGS games (NG-learn-ref).
>
> This should be the baseline that Zero could beat after enough learning.
> If I manage to beat this version (every learning cycle I check 10 games
> against it), then I will probably also measure its strength, but I
> think this will take some weeks :)

Re: [Computer-go] NiceGoZero games during learning

2017-11-06 Thread Detlef Schmicker
Not in this weak state of the learned net. Right now I am measuring on
CGOS with a net trained from 4d+ KGS games (NG-learn-ref).

This should be the baseline that Zero could beat after enough learning.
If I manage to beat this version (every learning cycle I check 10 games
against it), then I will probably also measure its strength, but I think
this will take some weeks :)
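
(To make the check concrete: a minimal Python sketch of such a per-cycle
reference match, with a hypothetical play_game(a, b) callback that
returns 1 when the first net wins; apart from the 10-game sample size,
every name here is made up.)

    REF_GAMES_PER_CYCLE = 10  # 10 games per learning cycle, as above

    def reference_winrate(learning_net, reference_net, play_game):
        # play_game(a, b) is assumed to return 1 if net a wins, else 0.
        wins = sum(play_game(learning_net, reference_net)
                   for _ in range(REF_GAMES_PER_CYCLE))
        return wins / REF_GAMES_PER_CYCLE

Once this win rate stays above 0.5, the Zero-trained net has caught up
with the 4d+-pretrained baseline.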


On 06.11.2017 at 17:05, uurtamo . wrote:
> Detlef,
> 
> I may be misunderstanding your last sentence. Do you mean that eventually you'll put
> a subset of functioning nets on CGOS to measure how quickly their strength
> is improving?
> 
> s.

[Computer-go] NiceGoZero games during learning

2017-11-06 Thread Detlef Schmicker
I thought it might be fun to share some games from an early stage of
learning from nearly Zero knowledge.

I did not turn off the (relatively weak) playouts; I mix their result at
30% into the result from the value network. I started from a randomly
initialized neural net (a small one, about 4 ms per evaluation on a
GTX 970) and use a relatively wide MC search (much, much wider than I use
for good playing strength, unpruning about 5-6 moves), with 100 playouts
and a node expansion every 3 playouts, thus about 33 network evaluations
per move.
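
(As a concrete illustration -- not Detlef's actual code -- the leaf
evaluation with a 30% playout mix could look like the Python sketch
below; all names are hypothetical.)

    PLAYOUT_WEIGHT = 0.3  # 30% playouts, 70% value network, as above

    def leaf_value(playout_result, value_net_output):
        # Both inputs are win expectations in [0, 1]; the search backs
        # up the blended value instead of either source alone.
        return (PLAYOUT_WEIGHT * playout_result
                + (1.0 - PLAYOUT_WEIGHT) * value_net_output)

    # 100 playouts with one node expansion (= one network evaluation)
    # every 3 playouts gives roughly 100 / 3 = ~33 evaluations per move.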

Additionally, I add Gaussian random numbers with a standard deviation of
0.02 to the policy network output.
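
(Again only a sketch, assuming the noise perturbs the net's per-move
priors directly; whether it is added to the raw output or to the logits
is not stated. Presumably this diversifies the self-play games.)

    import numpy as np

    NOISE_STDDEV = 0.02  # standard deviation from the text

    def noisy_policy(priors, rng=None):
        # priors: 1-D array of per-move probabilities from the policy net.
        rng = rng or np.random.default_rng()
        noisy = priors + rng.normal(0.0, NOISE_STDDEV, size=priors.shape)
        return np.clip(noisy, 0.0, None)  # keep priors non-negative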

With this setup I play 1000 games and do a reinforcement learning cycle
with them. One cycle takes me about 5 hours.
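
(The outer loop is then simply self-play followed by a training step; a
sketch with hypothetical self_play_game and reinforcement_update
placeholders:)

    GAMES_PER_CYCLE = 1000  # about 5 hours per cycle here

    def learning_cycle(net, self_play_game, reinforcement_update):
        # 1. Generate self-play games with the current net
        #    (wide search + policy noise, as described above).
        games = [self_play_game(net) for _ in range(GAMES_PER_CYCLE)]
        # 2. Run one reinforcement learning update on those games.
        return reinforcement_update(net, games)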

For the first 2 days I did not archive games; then I noticed it might be
fun to have games from the training history, so now I always archive one
game per cycle.


Here are some games ...


http://physik.de/games_during_learning/


I will probably add some more games as I get them, and I will try to
measure on CGOS how strong the bot is with exactly this (weak, broad
search) configuration but with a net pretrained from 4d+ KGS games ...


Detlef
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go