Hi Brian,

Thanks for sharing your genuinely interesting result. One question, though: why would you train a non-"zero" program? Do you think your program, as a result of your rules, would perform better than a zero one, or is it that imitating the best known algorithm is inconvenient for your purposes?
Best,
-Chaz

On Sat, Dec 2, 2017 at 7:31 PM, Brian Sheppard via Computer-go <computer-go@computer-go.org> wrote:

> I implemented the ad hoc rule of not training on positions after the first
> pass, and my program is basically playing moves until the first pass is
> forced. (It is not a "zero" program, so I don't mind ad hoc rules like this.)
>
> From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Xavier Combelle
> Sent: Saturday, December 2, 2017 12:36 PM
> To: computer-go@computer-go.org
> Subject: Re: [Computer-go] Significance of resignation in AGZ
>
> It might make sense to enable the resignation threshold even at a stupid
> level. That way, the first thing the network would learn is not to resign
> too early (even before it learns not to pass).
>
> On 02/12/2017 at 18:17, Brian Sheppard via Computer-go wrote:
>
> I have some hard data now. My network's initial training reached the same
> performance in half the iterations. That is, the steepness of skill gain in
> the first day of training was twice as great when I avoided training on
> fill-ins.
>
> This has all the usual caveats: only one run before/after, YMMV, etc.
>
> From: Brian Sheppard [mailto:sheppar...@aol.com]
> Sent: Friday, December 1, 2017 5:39 PM
> To: 'computer-go' <computer-go@computer-go.org>
> Subject: RE: [Computer-go] Significance of resignation in AGZ
>
> I didn't measure precisely, because as soon as I saw the training artifacts
> I changed the code. And I am not doing an AGZ-style experiment, so there
> are differences for sure. So I will give you a swag...
>
> The speed difference is maybe 20%-ish for 9x9 games.
>
> A frequentist approach will overstate the frequency of fill-in plays by a
> pretty large factor, because fill-in plays are guaranteed to occur in every
> game but are not best in the competitive part of the game. This will affect
> the speed of learning in the early going.
>
> The network will use some fraction (almost certainly <= 20%) of its
> capacity to improve accuracy on positions that will not contribute to its
> ultimate strength. This applies to both the ordering and evaluation aspects.
>
> From: Andy [mailto:andy.olsen...@gmail.com]
> Sent: Friday, December 1, 2017 4:55 PM
> To: Brian Sheppard <sheppar...@aol.com>; computer-go <computer-go@computer-go.org>
> Subject: Re: [Computer-go] Significance of resignation in AGZ
>
> Brian, do you have any experiments showing what kind of impact it has? It
> sounds like you have tried both with and without your ad hoc first-pass
> approach?
>
> 2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go <computer-go@computer-go.org>:
>
> I have concluded that AGZ's policy of resigning "lost" games early is
> somewhat significant. Not as significant as using residual networks, for
> sure, but you wouldn't want to go without these advantages.
>
> The benefit cited in the paper is speed. That is certainly a factor, but I
> see two other advantages.
>
> First, training does not include the "fill-in" portion of the game, where
> every move is low value. I see a specific effect on the move-ordering
> system, since it is based on frequency. By eliminating training on
> fill-ins, the prioritization function will not be biased toward moves that
> are not relevant to strong play. (That is, there are a lot of fill-in
> moves, which are usually not best in the interesting portion of the game
> but occur frequently if the game is played out to the end, so the move
> prioritization system would otherwise predict them more often.)
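[Editorial note: the thread does not include code, but a minimal Python sketch may make the bias concrete. It assumes a hypothetical self-play record format of (position, move) pairs with "pass" marking a pass; it is an illustration of a frequency-based prior, not Brian's actual move-prioritization system.]

```python
from collections import Counter

def frequency_prior(games, truncate_at_first_pass=False):
    """Build a frequency-based move prior from self-play records.

    `games` is a list of games; each game is a sequence of (position, move)
    tuples, with move == "pass" marking a pass. This data layout is
    hypothetical, purely to illustrate the effect described above.
    """
    counts = Counter()
    for game in games:
        for position, move in game:
            if move == "pass":
                if truncate_at_first_pass:
                    break      # ad hoc rule: ignore everything after the first pass
                continue       # otherwise just skip the pass move itself
            counts[move] += 1
    # Fill-in moves occur in nearly every played-out game, so without
    # truncation they dominate the counts even though they are rarely the
    # best move in the competitive part of the game.
    total = sum(counts.values()) or 1
    return {move: n / total for move, n in counts.items()}
```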
> My ad hoc alternative is to not train on positions after the first pass in
> a game. (Note that this does not qualify as "zero knowledge", but that is
> OK with me since I am not trying to reproduce AGZ.)
>
> Second, positional evaluation is not trained on situations where
> everything is already decided, so less of the network's capacity is
> devoted to situations in which nothing can be gained.
>
> As always, YMMV.
>
> Best,
> Brian
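[Editorial note: for readers unfamiliar with the resignation mechanism under discussion, here is a minimal sketch of a self-play resignation check. The threshold value and the `value_estimate` interface are assumptions for illustration only; the thread does not specify either.]

```python
RESIGN_THRESHOLD = -0.90   # hypothetical value, purely for illustration

def should_resign(value_estimate, resignation_enabled=True,
                  threshold=RESIGN_THRESHOLD):
    """Return True if the side to move should resign.

    `value_estimate` is the network's win expectation for the side to move,
    scaled to [-1, 1]. Enabling this check even for a weak network (Xavier's
    suggestion) means the first thing it must learn is not to resign too
    early; disabling it means games are played out through the fill-in phase.
    """
    return resignation_enabled and value_estimate < threshold
```

Games that end by resignation never reach the fill-in phase, which is why resignation and Brian's first-pass truncation address the same training-data problem from different directions.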
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go