I have been interested in a different approach, and it has some elements in 
common with AGZ, so AGZ gave me the confidence to try it.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Chaz G.
Sent: Sunday, December 3, 2017 4:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Hi Brian,

 

Thanks for sharing your genuinely interesting result. One question, though: why 
would you train a non-"zero" program? Do you think your program, as a result 
of your rules, would perform better than zero, or is imitating the best known 
algorithm inconvenient for your purposes?

 

Best,

-Chaz

 

On Sat, Dec 2, 2017 at 7:31 PM, Brian Sheppard via Computer-go 
<computer-go@computer-go.org> wrote:

I implemented the ad hoc rule of not training on positions after the first 
pass, and my program is basically playing moves until the first pass is forced. 
(It is not a “zero” program, so I don’t mind ad hoc rules like this.)
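
Concretely, the rule amounts to truncating each self-play record at the first 
pass before it reaches the trainer. A minimal sketch in Python; the record 
format and names here are illustrative, not my actual code:

# Sketch: drop training positions from the first pass onward.
# Assumes each record is a list of (position, move, outcome) tuples,
# with a pass encoded as the literal move "pass".

def truncate_at_first_pass(record):
    """Keep only the positions played before the first pass."""
    for i, (position, move, outcome) in enumerate(record):
        if move == "pass":
            return record[:i]
    return record  # no pass in the record; keep everything

def training_examples(games):
    """Yield training examples, skipping the fill-in phase."""
    for record in games:
        yield from truncate_at_first_pass(record)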

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Xavier Combelle
Sent: Saturday, December 2, 2017 12:36 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

It might make sense to enable a resignation threshold even at a very weak 
level. That way, the first thing the network would learn is not to resign too 
early (even before it learns not to pass).
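
A minimal sketch of the check I mean, inside the self-play loop; the threshold 
value and interface are illustrative (AGZ tuned theirs to keep false 
resignations rare):

RESIGN_THRESHOLD = -0.90  # illustrative value, not AGZ's actual setting

def should_resign(value_estimate):
    """Resign when the value head is confident the game is lost.

    value_estimate is in [-1, 1] from the side to move's point of view.
    """
    return value_estimate < RESIGN_THRESHOLD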

 

On 02/12/2017 at 18:17, Brian Sheppard via Computer-go wrote:

I have some hard data now. My network’s initial training reached the same 
performance in half the iterations. That is, the steepness of skill gain in the 
first day of training was twice as great when I avoided training on fill-ins.

 

That has all the usual caveats: only one run before/after, YMMV, etc.

 

From: Brian Sheppard [mailto:sheppar...@aol.com] 
Sent: Friday, December 1, 2017 5:39 PM
To: 'computer-go' <computer-go@computer-go.org>
Subject: RE: [Computer-go] Significance of resignation in AGZ

 

I didn’t measure precisely because as soon as I saw the training artifacts I 
changed the code. And I am not doing an AGZ-style experiment, so there are 
differences for sure. So I will give you a swag…

 

Speed difference is maybe 20%-ish for 9x9 games.

 

A frequentist approach will overstate the frequency of fill-in plays by a 
pretty large factor, because fill-in plays are guaranteed to occur in every 
game but are not best in the competitive part of the game. This will affect the 
speed of learning in the early going.
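
To put illustrative numbers on it: a 9x9 game played out to two passes might 
contain on the order of 10-20 fill-in moves out of perhaps 60-80 total, so a 
frequency-based prior trained on whole games could reserve something like a 
fifth of its probability mass for moves that are almost never best while the 
game is still contested.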

 

The network will use some fraction (almost certainly <= 20%) of its capacity to 
improve accuracy on positions that will not contribute to its ultimate 
strength. This applies to both ordering and evaluation aspects.

 


From: Andy [mailto:andy.olsen...@gmail.com] 
Sent: Friday, December 1, 2017 4:55 PM
To: Brian Sheppard <sheppar...@aol.com>; computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Significance of resignation in AGZ

 

Brian, do you have any experiments showing what kind of impact it has? It 
sounds like you have tried both with and without your ad hoc first-pass 
approach?

 


2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go 
<computer-go@computer-go.org>:

I have concluded that AGZ's policy of resigning "lost" games early is somewhat 
significant. Not as significant as using residual networks, for sure, but you 
wouldn't want to go without these advantages.

The benefit cited in the paper is speed. Certainly a factor. I see two other 
advantages.

First is that training does not include the "fill in" portion of the game, 
where every move is low value. I see a specific effect on the move ordering 
system, since it is based on frequency. By eliminating training on fill-ins, 
the prioritization function will not be biased toward moves that are not 
relevant to strong play. (That is, there are a lot of fill-in moves, which are 
usually not best in the interesting portion of the game, but occur a lot if the 
game is played out to the end, and therefore the move prioritization system 
would predict them more often.) My ad hoc alternative is to not train on 
positions after the first pass in a game. (Note that this does not qualify as 
"zero knowledge", but that is OK with me since I am not trying to reproduce 
AGZ.)
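
To illustrate the move-ordering point, here is a sketch of a frequency-based 
prior with and without that cutoff. The record format is made up, and a real 
prioritization function would condition on the position rather than the bare 
move; this only shows how full-game counts inflate fill-in moves:

from collections import Counter

def frequency_prior(games, cutoff_at_first_pass=True):
    """Estimate a move prior from self-play records by counting.

    Each record is a list of (position, move) pairs, with a pass encoded
    as the literal move "pass". With the cutoff, fill-in moves played
    after the first pass never inflate the counts.
    """
    counts = Counter()
    for record in games:
        for position, move in record:
            if cutoff_at_first_pass and move == "pass":
                break
            counts[move] += 1
    total = sum(counts.values())
    return {move: n / total for move, n in counts.items()}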

Second is that positional evaluation is not trained on situations where 
everything is already decided, so less of the NN capacity is devoted to 
situations in which nothing can be gained.

As always, YMMV.

Best,
Brian


_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
