Hi again Philippe,

Did you find a way to show that the new net is indeed more balanced
than the old one with regard to the odd-even ply syndrome?

-Joseph


On 25 June 2012 07:37, Philippe Michel <[email protected]> wrote:

> On Sun, 24 Jun 2012, Joseph Heled wrote:
>
>  I am very interested to know how those nets were generated?
>>
>
> They were trained with your gnubg-nn tools, but from improved training
> data. This is basically how it went:
>
> I first tried to train the crashed net. Since it seemed one of its
> problems was dubious absolute equities in many positions and large
> discrepancies between even- and odd-ply evaluations, I used the original
> set of positions evaluated with the average of the 3ply and 4ply
> evaluations.
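[The averaging step described above can be sketched as follows. This is a minimal illustration, not gnubg-nn's actual code: `evaluate_at_ply` is a hypothetical stand-in for the engine's evaluation call, and the five-component probability vector follows gnubg's usual output convention.]

```python
def blended_target(position, evaluate_at_ply):
    """Training target as the average of the 3ply and 4ply evaluations.

    Each evaluation is assumed to be a vector of outcome probabilities
    (win, win gammon, win backgammon, lose gammon, lose backgammon).
    `evaluate_at_ply` is a hypothetical engine hook, not a real gnubg call.
    """
    e3 = evaluate_at_ply(position, 3)
    e4 = evaluate_at_ply(position, 4)
    # Component-wise average of the two evaluation depths.
    return [(a + b) / 2.0 for a, b in zip(e3, e4)]
```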
>
> Early results looked promising, but it didn't go very far, the 0ply
> errors going from:
> checkers 771 cube 1088 (total errors in the 0.90.0 net)
> to
> checkers 747 cube 753 to
> checkers 741 cube 776 (with the training set evaluated with the above net)
> to
> checkers 753 cube 787
>
> Checking the worst positions (worst in the sense of 3ply differing most
> from 4ply), it was clear that while the largest differences went down
> from more than 4.0 in the old net to about 1.0, the equity given by a
> rollout was often close to either the 3ply or the 4ply value, often
> outside the interval between them, so taking the average wasn't
> converging.
>
> At this point I started to roll out the whole crashed training database
> (1296 trials, 0ply) using the 741/776 net. I used a slightly modified gnubg
> since gnubg-nn, not using SSE, would be much slower.
>
> Training from that led to a benchmark of checkers 766 cube 514.
>
> Then I looked at what I could change in the training set to improve
> checker play. Since it had been reported that the crashed net was bad at
> containment play and rolling outside primes, making bizarre stacks in
> the outfield, I started there.
>
> Looking at the training positions, I found quite a few such positions:
> stacks of 7 or 8 checkers on the 12 point, things like that. I tried to
> remove them, but since you had added them in pairs, I tried hard to
> remove groups of related positions, not single ones. It was tedious and
> led only to minimal improvement, so I gave up and left all the original
> positions in.
>
> I then tried to add positions from rolling a prime from far away
> (playing out from something like Advanced Backgammon's position 127 with
> a varying number of men already off) against one or two checkers. I
> asked gnubg for its 0ply hint, and if its top 2 or 3 choices looked
> wrong I added them, as well as my own choice, to the training set. All
> in all, I added 700-800 positions. This worked quite well, decreasing
> the checker error to the 730s.
>
> While investigating these 1-checker containment positions, I had
> noticed that the original training set was very unbalanced, with
> something like 1500 positions seen from the container and 30 from the
> runner. This is quite logical if most positions were added when the
> 0ply and 2ply plays differ, but I had the idea that the even/odd effect
> might be somehow related to this, especially for crashed positions,
> where checker mobility, and hence a more or less automatically
> generated training set, would be likely to be very asymmetrical.
>
> To test this, I did the same full rollouts on the race training
> database, as well as on the same positions with the other player on
> roll. It worked well, improving the 0ply benchmark a little and the
> 1ply one a lot. The swapped positions are not related pairs like most
> of the original ones, but they still help.
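[The swapping idea above might be sketched like this, assuming a position is stored as a pair of boards with the player on roll listed first; the representation and names are illustrative, not gnubg-nn's actual format. The swapped positions would then be rolled out separately, as described in the message.]

```python
def swap_on_roll(position):
    """Return the same checker configuration with the other player on roll.

    A position is assumed to be a pair (board_on_roll, board_opponent),
    each a tuple of checker counts; swapping the pair changes whose turn
    it is. This representation is hypothetical, for illustration only.
    """
    board_on_roll, board_opponent = position
    return (board_opponent, board_on_roll)


def balanced_set(positions):
    """Original positions plus each one seen from the other side."""
    return list(positions) + [swap_on_roll(p) for p in positions]
```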
>
> Redoing full rollouts on the crashed training set, or even only on its
> swapped positions, was going to take more time than I wished, so I
> settled on doing a truncated rollout (324 trials, truncated at 8 plies,
> then a 2ply evaluation) of the whole new set (original + inverted
> positions).
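[The truncated-rollout scheme — play a fixed number of plies forward, then fall back to a static evaluation instead of playing to the end — might be sketched like this. `play_one_ply` and `evaluate_2ply` are hypothetical stand-ins for engine calls, not real gnubg functions.]

```python
def truncated_rollout(position, play_one_ply, evaluate_2ply,
                      trials=324, truncate_at=8):
    """Average equity over `trials` games, each played `truncate_at`
    plies forward and then scored by a static 2ply evaluation.

    Both callables are hypothetical engine hooks supplied by the caller.
    """
    total = 0.0
    for _ in range(trials):
        pos = position
        for _ in range(truncate_at):
            pos = play_one_ply(pos)  # roll the dice, play the best move
        total += evaluate_2ply(pos)  # static equity at the cutoff
    return total / trials
```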
>
> Xavier Dufaure from XG had claimed in the Bgonline forum that its
> roller++ evaluations (similar to the above truncated rollout) were
> generally about as good as a full rollout. Later experience makes me
> think this is not quite the case for gnubg (for starters, its equity
> estimates at the truncation point are not good enough), but it seemed
> like a decent compromise, so I used it to reevaluate the contact
> training set (original + inverted positions) as well. My thought was
> that it would somehow diffuse the improved estimations from the crashed
> and race nets, and from the easy late positions, deeper than a simple
> 2ply evaluation would.
>
> The weights attached to the earlier message are those resulting from
> the above process. In summary:
> - full rollout of the original race database + inverted positions using
> the 0.90.0 net and train a race net from that
> - truncated rollout of the original crashed database + inverted positions
> + new containment positions using an intermediate crashed net and the above
> race net, and train from that
> - truncated rollout of the contact database + inverted positions using the
> 0.90.0 contact net and above crashed and race nets, and train from that
>
> After that I tried another pass of truncated rollouts on the contact
> and crashed training sets, but it didn't improve the benchmarks (or
> maybe it did: this is when I realized the crashed benchmark was flawed,
> but the improvement, if any, looked like it would be minimal).
>
> Since I didn't see any obvious prospects for further quick improvements
> but the nets seemed to be worthwhile, I trained corresponding pruning nets
> and posted the weights files as they were.
>
> At this point, I'm looking at redoing full rollouts of the contact and
> crashed databases (1.8M positions!). I've done some tests on a few
> thousand positions with the smallest and largest pip counts. No
> surprise here: the former are fast but the current data is already
> accurate; the latter take a lot of time but are quite often much more
> plausible than the current estimates.
>
_______________________________________________
Bug-gnubg mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-gnubg
