Hi all, there has been a looong thread on bgonline (which seems to be down right now) about the right way to present rollout results.
Gnubg shows the so-called "JSDs": the idea behind them is that for a JSD of 1.96 between two plays there's a 95% chance that the play with the higher rollout equity really is the one with the higher equity (there is some inaccuracy in the above sentence, but the concept is there).

Remark #1: Why don't we show the % instead of the JSD? It's much more reasonable. This will mean that at the end of a rollout, we'll have something like

    Play 1   +0.650   95.5%
    Play 2   +0.630    4.5%
    Play 3   +0.600    2.5%

Notice that the percentage shown beside the top play is the "confidence" we have in it being better than the 2nd best play. For the other plays, the percentage shown is the confidence we have in that play being better than the top play. Percentages for play 1 and 2 must add to 100%. Showing the result this way looks nicer and is more easily understandable. Also, we should allow entering percentages instead of JSDs as stopping criteria.

Remark #2: Confidence intervals of play 1 vs. play N do not give you the right picture: most of the time what you want to know is the chance of any play being better than *all* the others. One easy approach is this: for each play you have an equity and a stdev. You assume the true equity of each play to have a normal distribution and you do Nmc Monte Carlo trials. For each trial, you sample the N normal distributions and you find the highest value. You add 1 to the total of the play with the highest value. At the end, you divide each play's total by Nmc, and this gives you the chance of each play being best. The above is just Monte Carlo integration of multiple normal distributions. The only things you need are erf (the error function, part of glibc) and its inverse (not part of glibc but part, for example, of the GNU Scientific Library, or you can find many decent C/C++ implementations around). 50K Monte Carlo trials are enough; depending on how fast it is, you can do more.
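To make the Monte Carlo step concrete, here is a minimal Python sketch of the procedure described above. It is only an illustration, not gnubg code: the function name `prob_best`, the fixed seed, and the standard deviations of 0.008 are assumptions made up for the example (the equities are the ones from Remark #1). It draws the normal samples with `random.gauss` from the standard library, so erf and its inverse are not needed explicitly here; a C version would need them or some other normal sampler.

```python
import random


def prob_best(equities, stdevs, n_trials=50_000, seed=0):
    """Estimate, for each play, the chance that its true equity is the highest.

    Assumption: the true equities are independent normal variables centred
    on the rollout equities, with the rollout standard deviations.
    """
    rng = random.Random(seed)
    wins = [0] * len(equities)
    for _ in range(n_trials):
        # Sample one possible "true" equity per play.
        samples = [rng.gauss(mu, sd) for mu, sd in zip(equities, stdevs)]
        # Credit the play with the highest sampled value.
        best = max(range(len(samples)), key=samples.__getitem__)
        wins[best] += 1
    # Divide each play's total by the number of trials.
    return [w / n_trials for w in wins]


# Hypothetical inputs: equities from the example, stdevs invented.
equities = [0.650, 0.630, 0.600]
stdevs = [0.008, 0.008, 0.008]
probs = prob_best(equities, stdevs)
for i, p in enumerate(probs, start=1):
    print(f"Play {i}: {100 * p:.1f}%")
```

The estimates carry their own sampling noise, roughly sqrt(p * (1 - p) / Nmc), so with 50K trials a probability near 95% is good to about 0.1%.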
It would be very easy to integrate this into gnubg and to compute the values at the end of the rollout (and show them beside the confidence-interval ones). More ambitious (and probably unnecessary) would be to do the MC computation at the end of each trial and use its results as the stopping criterion for the rollout. But this may slow down rollouts (TBC, see remark #4 below).

Remark #3: Another criterion that could be used to stop the rollout is the product of the confidences of play 1 vs. each play N. In my example above, play 1 is better than play 2 with confidence 95.5%, and play 1 vs. play 3 has confidence (100 - 2.5) = 97.5%, hence the overall confidence of play 1 being best = 95.5% * 97.5% = 93.1%. This value is in fact a lower bound (it assumes that the events "p1 better than p2" and "p1 better than p3" are uncorrelated, which is wrong, since both depend on the equity of play 1). Combined with the confidence of play 1 vs. play 2 (which is an upper bound), it gives a reasonable interval for the true probability at almost no effort.

Remark #4: Tim Chow has proposed a Bayesian approach, which seems by far the right conceptual way to answer the question "how sure are we that play 1 is best". However, this method too slows down rollouts (you have to execute steps at each trial) and does not provide any major advantage over the Monte Carlo method (which can be executed only once at the end of the rollout, if not used as a stopping criterion). In my *extremely* naive Python implementation, however, the Bayes method was much faster than the MC method (when the latter is executed at each trial).

So, I would say that we can easily implement points #1 and #2: let's show % instead of JSDs and let's add a final MC computation of "true" probabilities.

MaX.

P.S.
I have some Python code (ugly, but working) that:
- simulates an Nt-trial rollout for an arbitrary number of plays (whose equity you specify)
- runs the computations in all 3 flavors: the lower/upper bound based on JSDs/confidence intervals, the Monte Carlo computation (at the end only or during the rollout trials, 2 flavors), and the Bayesian computation
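The lower/upper bound flavor from Remark #3 is simple enough to sketch in a few lines. This is not the code mentioned above, just an illustration: the helper names `confidence_from_jsd` and `best_play_bounds` are hypothetical, and the JSD conversion assumes a one-sided reading of a normal distribution (under that reading a JSD of 1.96 maps to about 97.5%, which may be the inaccuracy noted at the top of this message). The upper bound is taken as the smallest pairwise confidence, which in the example is the play 1 vs. play 2 figure.

```python
import math


def confidence_from_jsd(jsd):
    """One-sided confidence that the higher-equity play is really better.

    Assumption: the equity difference is normally distributed and the JSD
    is that difference measured in joint standard deviations.
    """
    return 0.5 * (1.0 + math.erf(jsd / math.sqrt(2.0)))


def best_play_bounds(pairwise_conf):
    """Bounds on P(play 1 is best) from its confidence against each other play.

    lower: product of the pairwise confidences (treats them as uncorrelated)
    upper: the smallest pairwise confidence
    """
    lower = math.prod(pairwise_conf)
    upper = min(pairwise_conf)
    return lower, upper


# Pairwise confidences from the example: play 1 vs. play 2, play 1 vs. play 3.
lower, upper = best_play_bounds([0.955, 0.975])
print(f"{100 * lower:.1f}% <= P(play 1 best) <= {100 * upper:.1f}%")
# prints: 93.1% <= P(play 1 best) <= 95.5%
```

Both numbers cost nothing beyond the pairwise confidences the rollout already has, so they could be checked after every trial without the slowdown of a full MC pass.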
