[Bug-gnubg] Rollout jsd, statsig etc. [LONG]

Massimiliano Maini Mon, 16 Nov 2009 07:23:19 -0800

Hi all,

there has been a looong thread on bgonline (seems to be down right now)
about the right way to present rollout results.


Gnubg shows the so-called "JSDs": the idea behind them is that for a JSD
of 1.96 between two plays there is a 95% chance that the play with the
higher rollout equity really is the one with the higher equity (there is
some inaccuracy in the above sentence, but the concept is there).

Remark #1:
Why don't we show the % instead of the JSD? It's much more reasonable.
This would mean that at the end of a rollout, we'd have something like:

Play 1    +0.650    95.5%
Play 2    +0.630     4.5%
Play 3    +0.600     2.5%

Notice that the percentage shown beside the top play is the "confidence"
we have in it being better than the 2nd best play. For the other plays,
the percentage shown is the confidence we have in that play being better
than the top play. The percentages for plays 1 and 2 must add up to 100%.

Showing the result this way looks nicer and is easier to understand.
Also, we should allow entering percentages instead of JSDs as stopping
criteria.
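
To make the conversion concrete, here is a minimal Python sketch of the
percentage column. It rests on assumptions that are mine, not gnubg's
code: the equity difference is taken as normally distributed, the JSD is
the equity difference divided by sqrt(sd1^2 + sd2^2), and the percentage
is the one-sided normal CDF of the JSD (which is what makes the figures
for plays 1 and 2 add up to 100%). The function names and the sample
numbers are hypothetical.

```python
import math

def jsd_to_confidence(jsd):
    # One-sided normal CDF: chance that the play ahead by `jsd` joint
    # standard deviations really is the better one.
    return 0.5 * (1.0 + math.erf(jsd / math.sqrt(2.0)))

def confidence_column(plays):
    """plays: list of (equity, stdev), sorted best first (at least two).
    Returns the percentage column described above, as fractions."""
    top_eq, top_sd = plays[0]
    column = []
    for i, (eq, sd) in enumerate(plays):
        # The top play is compared with the 2nd best, every other play with the top.
        other_eq, other_sd = plays[1] if i == 0 else (top_eq, top_sd)
        jsd = (eq - other_eq) / math.hypot(sd, other_sd)
        column.append(jsd_to_confidence(jsd))
    return column

# Hypothetical rollout results (equity, stdev), not taken from a real rollout.
plays = [(0.650, 0.008), (0.630, 0.009), (0.600, 0.020)]
for n, (play, conf) in enumerate(zip(plays, confidence_column(plays)), start=1):
    print(f"Play {n}    {play[0]:+.3f}    {conf:6.1%}")
```

Under this one-sided reading a JSD of 1.96 maps to about 97.5%; the 95%
figure is the two-sided one, which is probably the inaccuracy mentioned
earlier.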


Remark #2:
Confidence intervals of play1 vs playN do not give you the right picture:
most of the time what you want to know is the chance of each play being
better than *all* the others.

One easy approach is this: for each play you have an equity and a stdev.
You assume the true equity of each play to have a normal distribution and
you do Nmc Monte Carlo trials. For each trial, you sample the N normal
distributions and you find the highest value. You add 1 to the total of
the play with the highest value.

At the end, you divide each play's total by Nmc and this gives you the
chance of each play being best.

The above is just Monte Carlo integration of multiple normal distributions.
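
The steps above can be sketched in a few lines of Python. This is only an
illustration under the stated assumption of independent normal
distributions, not the code I mention in the P.S.; the name prob_best and
the sample numbers are hypothetical, and it samples with the standard
library's random.gauss rather than through an inverse error function.

```python
import random

def prob_best(equities, stdevs, n_trials=50_000, seed=None):
    """Estimate, for each play, the chance that it is the best one."""
    rng = random.Random(seed)
    wins = [0] * len(equities)
    for _ in range(n_trials):
        # Draw one plausible "true" equity per play.
        samples = [rng.gauss(mu, sd) for mu, sd in zip(equities, stdevs)]
        # Add 1 to the total of the play with the highest sampled value.
        wins[samples.index(max(samples))] += 1
    # Divide each total by the number of trials.
    return [w / n_trials for w in wins]

# Hypothetical rollout results, not taken from a real rollout.
probs = prob_best([0.650, 0.630, 0.600], [0.008, 0.009, 0.020], seed=1)
for n, p in enumerate(probs, start=1):
    print(f"Play {n}: {p:.1%} chance of being best")
```

The fractions add up to 1 by construction, since every trial has exactly
one winner.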

The only things you need are erf (the error function, part of glibc) and
its inverse (not part of glibc, but part of, for example, the GNU
Scientific Library; you can also find many decent C/C++ implementations
around).

50K Monte Carlo trials are enough; depending on how fast it is, you can
do more.

It would be very easy to integrate this in gnubg and to compute the
values at the end of the rollout (and show them alongside the confidence
interval ones).

More ambitious (and probably unnecessary) would be to do the MC stuff at
the end of each trial and use its results as the stopping criterion for
the rollout. But this may slow down rollouts (TBC, see remark #4 below).


Remark #3:
Another criterion that could be used to stop the rollout is the product
of all the confidences of play1 vs playN. In my example above, play1 is
better than play2 with confidence 95.5%, play1 vs play3 has confidence
(100 - 2.5) = 97.5%, hence the overall confidence of play1 being best
= 95.5% * 97.5% = 93.1%.

This value is in fact a lower bound (it assumes that the events "p1
better than p2" and "p1 better than p3" are independent, which is wrong).
Combined with the confidence of play1 vs play2 (which is an upper bound),
it gives a reasonable interval for the true probability at almost no
effort.
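
As a quick illustration, the two bounds take only a few lines of Python.
The function name is hypothetical; the input is the list of confidences
of play1 against every other play, as fractions. One small liberty: the
upper bound is taken as the smallest pairwise confidence, which for a
sorted list of plays will normally be the one against play2.

```python
import math

def best_play_bounds(conf_vs_others):
    """Return (lower, upper) bounds on the chance that play1 is best."""
    # Lower bound: product of the pairwise confidences, i.e. treating the
    # events "play1 better than playN" as if they were independent.
    lower = math.prod(conf_vs_others)
    # Upper bound: play1 cannot be best more often than it beats its
    # closest rival.
    upper = min(conf_vs_others)
    return lower, upper

# Figures from the example above: 95.5% vs play2, 97.5% vs play3.
lower, upper = best_play_bounds([0.955, 0.975])
print(f"play1 is best with probability between {lower:.1%} and {upper:.1%}")
```

With the example figures this gives the interval 93.1% to 95.5%.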


Remark #4:
Tim Chow has proposed a Bayesian approach which seems by far the right
conceptual way to answer the question "how sure are we that play1 is
best". However, this method too slows down rollouts (you have to execute
steps at each trial) and does not provide any major advantage over the
Monte Carlo method (which can be executed only once at the end of the
rollout, if not used as a stopping criterion).
In my *extremely* naive python implementation, however, the Bayes method
was much faster than the MC method (executed at each trial).


So, I would say that we can easily implement points #1 and #2: let's
show % instead of JSDs and let's add a final MC computation of "true"
probabilities.


MaX.

P.S.
I have some python code (ugly, but working) that:

- simulates an Nt-trial rollout for an arbitrary number of plays (whose
  equity you specify)
- runs the computations in all 3 flavors: the lower/upper bounds based on
  JSDs/confidence intervals, the Monte Carlo stuff (at the end only or
  during the rollout trials, 2 flavors), and the Bayesian stuff


