Hi again Philippe,

Did you find a way to show that the new net is indeed more balanced than the old one with regard to the odd-even ply syndrome?
-Joseph

On 25 June 2012 07:37, Philippe Michel <[email protected]> wrote:
> On Sun, 24 Jun 2012, Joseph Heled wrote:
>
>> I am very interested to know how those nets were generated?
>
> They were trained with your gnubg-nn tools, but from improved training
> data. This is basically how it went:
>
> I first tried to train the crashed net. Since it seemed that one of its
> problems was dubious absolute equities in many positions, with large
> discrepancies between even- and odd-ply evaluations, I used the original
> set of positions with the average of the 3-ply and 4-ply evaluations.
>
> Early results looked promising, but it didn't go very far, the 0-ply
> errors going from:
>   checkers 771, cube 1088 (total errors for the 0.90.0 net)
> to
>   checkers 747, cube 753
> to
>   checkers 741, cube 776 (with the training set evaluated with the above net)
> to
>   checkers 753, cube 787
>
> Checking the worst positions (worst in the sense of 3-ply differing most
> from 4-ply), it was clear that although the largest differences went down
> from more than 4.0 with the old net to about 1.0, the equity given by a
> rollout was often close to either the 3-ply or the 4-ply value, and often
> outside the interval between them, so taking the average wasn't
> converging.
>
> At this point I started to roll out the whole crashed training database
> (1296 trials, 0-ply) using the 741/776 net. I used a slightly modified
> gnubg, since gnubg-nn, which does not use SSE, would have been much
> slower.
>
> Training from that led to a benchmark of checkers 766, cube 514.
>
> Then I looked at what I could change in the training set to improve
> checker play. Since it had been reported that the crashed net was bad at
> containment play and at rolling outside primes, making bizarre stacks in
> the outfield, I started there.
>
> Looking at the training positions, I found quite a few such positions:
> stacks of 7 or 8 checkers on the 12 point, things like that. I tried to
> remove them, but since you added them in pairs, I tried hard to remove
> groups of related positions, not single ones.
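[As an aside, the stack-pruning step described above can be sketched as a simple filter. The board encoding below (a flat list of per-point checker counts) and the threshold are illustrative assumptions, not gnubg-nn's actual training-set format.]

```python
# Sketch: flag training positions containing an implausibly tall stack,
# such as the 7- or 8-checker stacks on the 12 point mentioned above.
# The encoding is hypothetical: a list of 24 checker counts, one per point
# from the player's viewpoint (bar and borne-off checkers omitted).

def is_overstacked(points, max_stack=6):
    """Return True if any single point holds more than max_stack checkers."""
    return any(n > max_stack for n in points)

board = [0] * 24
board[11] = 8  # 8 checkers piled on the 12 point (0-based index 11)
print(is_overstacked(board))  # True
```

Positions flagged this way would still have to be removed in related groups, as the email notes, rather than one by one.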
> It was tedious and led only to minimal improvement. I gave up and left
> all the original positions in.
>
> I then tried to add positions from rolling a prime home from far away
> (playing out from something like Advanced Backgammon's position 127,
> with a varying number of men already off) against one or two checkers.
> I asked gnubg for its 0-ply hint, and if its 2 or 3 favorites looked
> wrong I added them, as well as my own choice, to the training set. All
> in all, I added 700-800 positions. This worked quite well, decreasing
> the checker error to the 730s.
>
> While investigating these 1-checker containment positions, I had
> noticed that the original training set was very unbalanced, with
> something like 1500 positions seen from the container's side and 30
> from the runner's. This is quite logical if most positions were added
> when the 0-ply and 2-ply plays differ, but I had the idea that the
> even/odd effect might somehow be related to this, especially for
> crashed positions, where limited checker mobility makes a more or less
> automatically generated training set likely to be very asymmetrical.
>
> To test this, I did the same full rollouts on the race training
> database, as well as on its positions with the other player on roll.
> It worked well, improving the 0-ply benchmark a little and the 1-ply
> one a lot. The swapped positions are not related pairs like most of
> the original ones, but they still help.
>
> Redoing full rollouts on the crashed training set, or even only on its
> swapped positions, was going to take more time than I wished, so I
> settled on doing a truncated rollout (324 trials, truncated at 8 plies
> followed by a 2-ply evaluation) of the whole new set (original +
> inverted positions).
>
> Xavier Dufaure from XG had claimed in the BGonline forum that its
> Roller++ evaluations (similar to the above truncated rollout) were
> generally about as good as a full rollout. Later experience makes me
> think this is not quite the case for gnubg (for starters, its equity
> estimates at the truncation point are not good enough). But it seemed
> like a decent compromise, which I used to reevaluate the contact
> training set (original + inverted positions) as well. My thinking was
> that it would somehow diffuse the improved estimates from the crashed
> and race nets and from the easy late positions deeper than a simple
> 2-ply evaluation would.
>
> The weights attached to the earlier message are those resulting from
> the above process. In summary:
> - full rollout of the original race database + inverted positions,
>   using the 0.90.0 net, and train a race net from that
> - truncated rollout of the original crashed database + inverted
>   positions + new containment positions, using an intermediate crashed
>   net and the above race net, and train from that
> - truncated rollout of the contact database + inverted positions,
>   using the 0.90.0 contact net and the above crashed and race nets,
>   and train from that
>
> After that I tried another pass of truncated rollouts on the contact
> and crashed training sets, but it didn't improve the benchmarks (or
> maybe it did: this is when I realized the crashed benchmark was
> flawed, but the improvement, if any, looked like it would be minimal).
>
> Since I didn't see any obvious prospects for further quick
> improvements, but the nets seemed worthwhile, I trained the
> corresponding pruning nets and posted the weights files as they were.
>
> At this point, I'm looking at redoing full rollouts of the contact and
> crashed databases (1.8M positions!). I've done some tests on a few
> thousand positions at the smaller and larger pip counts. No surprise
> here: the former are fast but the current data is already accurate;
> the latter take a lot of time but are quite often much more plausible
> than the current estimates.
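[The "inverted positions" augmentation described above can be sketched as follows. The position encoding (a pair of per-player point tuples) is an assumption for illustration, not gnubg-nn's actual format; note that the swapped positions carry no equity here because, as the email explains, they are new positions that still need their own rollout before training.]

```python
# Sketch: double a training set by adding each position a second time
# with the two sides swapped, so the other player is on roll.
# Encoding assumed for illustration: position = (on_roll_points, opponent_points).

def swap_on_roll(position):
    """Return the same checker configuration with the opponent on roll."""
    on_roll, opponent = position
    return (opponent, on_roll)

def augment_with_inversions(training_set):
    """Original positions followed by their side-swapped counterparts.
    The swapped positions must be re-evaluated (rolled out) afterwards."""
    return training_set + [swap_on_roll(p) for p in training_set]

positions = [((2, 0, 0), (0, 5, 0)), ((1, 1, 3), (4, 0, 0))]
augmented = augment_with_inversions(positions)
print(len(augmented))  # 4
```

This mirrors the imbalance fix in the email: a set with 1500 container-side positions and 30 runner-side ones becomes symmetrical by construction.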
_______________________________________________
Bug-gnubg mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-gnubg
