Hi Ian,

You obviously put a lot of thought and effort into this project. Here are some initial reactions (more might come later).

1. RE rollouts, you probably want to use the Python interface. Unfortunately I have been out of the loop for a long time, and can't remember the details.

2. I suggest you contact a statistician. I suspect the number of trials you need is quite large. The most important thing will be to establish a confidence interval for your equities.

3. I generated the early MET files for GNUBG, and dabbled a little in testing the effect of different tables. At the time I could not see a real difference in chequer play. So, while MET tables are theoretically interesting, personally I find it hard to believe it will make GNUBG stronger in practice. I would be delighted to be proven wrong.

4. So, if you are (say) confident with the results up to 5 (or 7), I would start by establishing the difference between the Variable MET and mec26 and/or XG for 5 or 7 point matches. This might be a good way to explore the issues.

5. I would like to see a short description of the Variable MET method, if you are willing to share it at this stage.

Cheers,
Joseph

On Mon, 11 Nov 2019 at 01:20, Ian Dunstan <[email protected]> wrote:
> Hi team,
>
> I am just finishing a project that has taken me many months (years), the
> creation of a new backgammon MET. My nearly finished MET is called 'PR2'
> and it is a combination of both rollouts and a theoretical MET. It uses
> rollout trials (+500k) of all match scores less than 9a9a and then a
> specially developed theoretical MET I call the 'Variable MET' to extend the
> rollout results to 31a31a. The Variable MET could generate all of the 9a9a
> MET probabilities in its own right, however, I use it in my MET as an
> extrapolation tool. A lot of time and effort has gone into the accuracy of
> both the 9a9a rollouts and the development of the Variable MET, more so the
> latter.
>
> I could go into a lot more detail if you wish, however, what I would like
> to do now is test my new PR2 MET and your help with 1) below is what I care
> most about.
> Tony Lezard (of Dueller renown) suggested I contact the Gnubg
> team after I asked him for help on testing.
>
> 1) What I would like to do is test my PR2 MET by doing a series of 5pt
> matches where one Gnubg player uses the PR2 MET and the other the now
> standard Kazaross-XG2 MET (in particular). My 'PR2' player faces himself in
> 500 by 5pt matches at a time and the results are recorded. The moves I
> don't care about, who won that 500 match series is all that is important.
> I know that Gnubg can play itself now, however, not with different METs
> loaded and not without a lot of human input (every game end requires a
> manual prompt from the user for the next game to begin). That way of doing
> things is unworkable for me. What I need is a set-and-forget solution,
> something I can start to do overnight and in the morning the match wins are
> reported as something like (say) 257-243.
>
> I am only guessing how long 500 by 5pt matches would take me, even if
> fully automated. Additionally, I do not know how many sets of 500 by 5pt
> matches I would need to do to see a significant difference in METs. Maybe
> 5000, 50000 or 500000. After seeing the difference in equity the PR2 MET
> can sometimes produce I am hoping for the first of these.
>
> I have a friend who is a lot more computer savvy than I am and he has
> started playing around with different sockets/ports and instances of Gnubg.
> He tells me "You actually need 3 instances of gnubg running - I run all
> three without the graphical interface, only pure terminal versions".
> However, before he goes to too much trouble I thought it best to contact
> the Gnubg team and see if you can help.
>
> Maybe you only have to change "a couple of lines of programming" as
> someone on my forum suggested (lol). It won't be that easy, I know!
>
> 2) Jim Segrave thought this issue might be of interest to the team.
>
> https://i.postimg.cc/brQf7sVw/4a1a-C-seed-6987657-1036800.png
>
> There should not be any difference in the cubeless and cubeful results,
> however, there is. I think the cubeless results are right and the cubeful
> result discrepancy is due to some cubeful calculation drift. This
> particular rollout shows the discrepancy near the 5th dp. In other rollouts
> I did I believe the discrepancy crept into the 4th dp.
>
> My PR2 MET tries for accuracy to the second dp(%) in all of the 9a9a
> entries I rolled. E.g. I have 1a2aC as 68.36% after compiling over a
> million trials and that should be accurate to 2dp(%).
>
> Here is a further example. When I first rolled out 8a1a over 1 million
> times in a single rollout I got a final cubeful result of ~0.10705.
> However, I happened to be around my computer to watch the result at ~93%
> completion and see the equity climb steadily from 0.10688 for over an hour
> to reach 0.10705. So what, you may ask? Well, I have watched enough
> rollouts to suppose that the 3rd decimal place if not the fourth should be
> set in stone at nearly a million trials. Additionally, rollouts will have
> the equity jumping up and down a bit due to variance; this rollout was not
> doing that, equity just went up and up in this case.
>
> I was very suspicious so I then checked my 8a1a result by choosing 5 new
> seeds and doing 5x12960 trial rollouts using the same Gnubg settings. I got:
>
> 0.1067726
>
> 0.1068255
>
> 0.1066774
>
> 0.1066116
>
> 0.1067552
>
> The mean of these means is ~0.10672.
>
> In terms of a MET entry that would be 10.67% vs 10.71% for the million+
> rollout. 5x12960=64800 trials is not really a lot, however, I have done
> enough rollouts to know something is probably wrong here. I repeated this
> exercise with another million+ trial rollout vs 5x12960 trials. In this
> second case, the 5x12960 results were all close to the mean 89.70% while
> the million+ rollout was 89.45%.
> Again, very different and the million+
> trials are inaccurate in my opinion.
>
> I am guessing that there is some problem with the cubeful algorithm that
> first creeps in at the 7th significant figure (sf), then migrates to the
> 6th, 5th, 4th sf etc... all governed by the number of trials. An
> average user won't ever see a problem at 5184 trials or even 51840
> trials. However, I saw a problem with 518400 trials and above. At the time
> of first seeing this issue, I abandoned the 25 x 1M+ trials I had done for
> my MET project and started again. The way around this problem for me was to
> do sets of 46656 trials and tabulate them carefully.
>
> An esoteric problem for sure and one that might be nearly irrelevant to
> everyone except me. However, there might be an easy remedy that has to do
> with increasing the number of sf used in Gnubg's cubeful algorithm(s).
>
> 3) Lastly, this is a small display problem to consider.
>
> While building a 31a31a MET I would check its extremities quite
> regularly to see if I had the right PR2 MET version loaded, and I noticed a
> problem. There is a display problem at 23a31a where the equity for 25a31a
> is shown instead. Incidentally, the 31a23a equity is correct in the Gnubg
> table. You will not see a problem in the display of most of the METs you
> have loaded (probably all the default ones you have) since a calculation
> will internally extrapolate results from ~15a15a (mec.c perhaps). My PR2
> MET is different, the extrapolation calculations Gnubg does for other METs
> do not start until after 31a31a. I think you have a small address problem
> to fix.
>
> Kind regards,
>
> Ian Dunstan
> (Australian Backgammon Federation Director)
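
On the question of how many 5pt matches are needed: a rough sketch only, and no substitute for asking a statistician. It assumes every match is independent and that the PR2 player has one fixed chance of winning each match, and it uses the plain normal approximation to the binomial. The function names and the candidate win rates (51%, 50.5%, 50.1%) are made up for illustration; nothing here comes from Gnubg.

```python
import math
from statistics import NormalDist


def win_rate_interval(wins, matches, confidence=0.95):
    """Normal-approximation confidence interval for the match win rate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = wins / matches
    half_width = z * math.sqrt(p * (1 - p) / matches)
    return p - half_width, p + half_width


def matches_needed(true_win_rate, confidence=0.95):
    """Matches needed before an interval centred on true_win_rate excludes 0.5.

    This ignores statistical power, so treat the answer as a lower bound.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    edge = abs(true_win_rate - 0.5)
    variance = true_win_rate * (1 - true_win_rate)
    return math.ceil((z / edge) ** 2 * variance)


# The example score from the original message: 257-243 over 500 matches.
low, high = win_rate_interval(257, 500)
print(f"257-243 gives a 95% interval of {low:.3f} to {high:.3f}")

# Hypothetical true win rates for the PR2 player.
for rate in (0.51, 0.505, 0.501):
    print(f"true win rate {rate}: at least {matches_needed(rate)} matches")
```

Under these assumptions a single 257-243 result has an interval that still contains 50%, so it could not separate the two METs on its own. The required number of matches grows with the inverse square of the edge, so halving the edge roughly quadruples the matches needed.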

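
The 8a1a check quoted above (five rollouts of 12960 trials against one rollout of over a million) can be put on a rough numerical footing. This is a sketch under two assumptions: the five results are treated as independent draws, and the long rollout's own sampling error is ignored. With only five samples the standard error is itself uncertain, so the ratio is indicative at best. The variable names are illustrative.

```python
import math
from statistics import mean, stdev

# The five 12960-trial results and the million+ trial result from the message.
small_rollouts = [0.1067726, 0.1068255, 0.1066774, 0.1066116, 0.1067552]
long_rollout = 0.10705

average = mean(small_rollouts)
# Standard error of the mean, from the sample standard deviation.
std_error = stdev(small_rollouts) / math.sqrt(len(small_rollouts))
# Distance between the long rollout and the average, in standard errors.
gap = (long_rollout - average) / std_error

print(f"mean of the five rollouts: {average:.7f}")
print(f"standard error of that mean: {std_error:.7f}")
print(f"long rollout differs by {gap:.1f} standard errors")
```

A gap of several standard errors would be hard to put down to ordinary rollout variance, which fits the suspicion that something other than noise is affecting the long rollout.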