Having spent a lot of time trying to work out what could be done with
quasi-Monte-Carlo or other standard Monte-Carlo improvements in computer
Go, I write below my (humble and pessimistic :-) ) opinion on the subject.
Let's formalize Monte-Carlo. Let P be a probability distribution, and
suppose you want to estimate Pf, the expectation of f under P. You can
evaluate f at some points; write hP ($\hat P$ is the usual notation) for
the empirical distribution, i.e.

Pf  = the unknown expectation of f under distribution P
hPf = (\sum_{i=1}^n f(x_i))/n

In standard naive Monte-Carlo, the x_i are independent and identically
distributed according to P. In the discrete case one can also sample
without replacement (the points are then no longer independent, but
almost :-) ).
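As a minimal sketch of the naive estimator hPf (the distribution and
test function below are my own hypothetical example, not from the
discussion above):

```python
import random

def naive_mc(f, sample_p, n):
    """Naive Monte-Carlo: average f over n i.i.d. draws from P."""
    return sum(f(sample_p()) for _ in range(n)) / n

# Hypothetical example: P = standard normal, f(x) = x^2, so Pf = 1.
random.seed(0)
estimate = naive_mc(lambda x: x * x, lambda: random.gauss(0.0, 1.0), 100_000)
```

The error of this estimator decreases as O(1/\sqrt{n}), which is exactly
what the improvements discussed below try to beat.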
Biasing Monte-Carlo (importance sampling) consists in replacing P by
some P' which samples points more often in the more important areas. A
standard choice is a P' whose density is proportional to the product of
(i) the density of P, and
(ii) the local standard deviation of f (which is unknown and must be
approximated).
This introduces no bias (on average, hPf remains close to Pf), because
we change the weighting of the points; hPf becomes

hPf = (\sum_{i=1}^n P(x_i)f(x_i)/P'(x_i))/n

and in the simplest case the x_i are i.i.d. according to P' (though
other, better schemes are possible, e.g. sampling within strata).
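A minimal sketch of the reweighted estimator, under an assumed toy
problem of my own choosing (P uniform on (0,1), f(x) = x^10, and a P'
with density 2x that favours the region near 1 where f is large):

```python
import math
import random

def importance_mc(f, p_density, sample_q, q_density, n):
    """Importance sampling: draw from P', reweight each point by P/P'."""
    total = 0.0
    for _ in range(n):
        x = sample_q()
        total += p_density(x) * f(x) / q_density(x)
    return total / n

# Hypothetical example: P = Uniform(0,1), f(x) = x^10, so Pf = 1/11.
# P' has density 2x on (0,1); it is sampled as sqrt(U) by inversion.
random.seed(0)
est = importance_mc(
    f=lambda x: x ** 10,
    p_density=lambda x: 1.0,
    sample_q=lambda: math.sqrt(random.random()),
    q_density=lambda x: 2.0 * x,
    n=100_000,
)
```

The weights P(x_i)/P'(x_i) are what keep the estimator unbiased despite
the biased sampling; dropping them would change what is estimated, which
is precisely the distinction made below for random players.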
Let's consider how this can be applied to Monte-Carlo Go.
Designing a good random player is not importance sampling as above:
there is no reweighting. We change what is being estimated; we do not
want to know the probability of winning for a uniform random player, we
just want a better player, so that the probabilities estimated by this
player lead to a better MC-based player. Improving the random player is
very important, but it is not importance sampling.
Why not add importance sampling on top? Because it requires a good
estimate of the important areas, in the sense of e.g. "areas with high
standard deviation". This is, in my humble opinion, nearly as hard as
designing a good Go evaluation function. Of course, I might be missing
something; but I think that if you can quickly find areas with
significantly higher or lower probabilities of winning, then you have
improved your random player, and you don't want to reweight anything;
you are no longer interested in the probability of winning for the basic
random player - you want to move to the new random player.
This is just improving the random player: you change the distribution
because you guess it makes a better random player (and you hope that
your MC-based player will be better).
Quasi-Monte-Carlo methods consist in replacing hP by a sum over a
deterministic or moderately randomized sample, so that hPf converges to
Pf faster than the O(1/\sqrt{n}) usual in MC methods. This is somewhat
orthogonal to importance sampling (though the two techniques are, by the
way, difficult to merge into one).
In standard QMC the distribution is not modified, but the points are not
chosen by independent identically distributed sampling (by the way, this
is also not the case with most importance-sampling schemes - but in QMC
the focus is on reducing the random part).
(It is now often considered that QMC should only be partially
derandomized, as "randomized QMC" (as opposed to strictly deterministic
QMC) is better, especially in high dimension; but never mind, let's
consider a QMC method independently of how it is designed; we can just
pick a QMC sequence from the state of the art without (at first) trying
to understand how it works.)
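For intuition, here is the simplest off-the-shelf example, the
one-dimensional van der Corput sequence (state-of-the-art sequences such
as Sobol are more elaborate, but the spirit is the same): the points are
deterministic, and the integration error shrinks much faster than for
i.i.d. sampling.

```python
def van_der_corput(i, base=2):
    """i-th point of the van der Corput low-discrepancy sequence:
    reverse the base-b digits of i behind the radix point."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

# QMC estimate of the integral of f over [0,1]: the i.i.d. uniform
# draws of naive MC are replaced by deterministic low-discrepancy points.
f = lambda x: x * x          # true integral = 1/3
n = 4096
qmc_estimate = sum(f(van_der_corput(i)) for i in range(n)) / n
```

With n = 4096 points the error here is about 1e-4, versus roughly 5e-3
for naive MC at the same n.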
But QMC is very efficient in continuous domains, and sometimes in
specific discrete domains as well; I don't see how to plug QMC into tree
search, or even just into sampling the goban in order to introduce QMC
at a particular node. It looks appealing, but you need something like a
"distance" between moves that is related to the probability of winning,
and even given such a distance, designing the QMC sequence is far from
straightforward.
However, a simple form of derandomization, so-called "jittered
sampling", can perhaps be applied: instead of sampling the goban
uniformly at random, partition the goban into k zones, and pick the
i-th point in zone 1+(i % k) (% means modulo).
(This is for a partition into zones of equal area, but it can be
adapted to unequal areas.)
Perhaps it is a good idea, but I don't have enough Go knowledge to
design such zones.
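The jittered scheme above can be sketched as follows; the quadrant
partition of a toy 4x4 board is purely illustrative (designing zones
that make sense for Go is exactly the open problem just mentioned):

```python
import random

def jittered_sample(zones, n):
    """Jittered sampling: the i-th point is drawn uniformly from
    zone 1 + (i % k) instead of uniformly from the whole board.
    `zones` is a partition of the board points into k lists."""
    k = len(zones)
    return [random.choice(zones[i % k]) for i in range(n)]

# Toy partition of a 4x4 board into its four quadrants.
board = [(x, y) for x in range(4) for y in range(4)]
zones = [
    [p for p in board if p[0] < 2 and p[1] < 2],
    [p for p in board if p[0] >= 2 and p[1] < 2],
    [p for p in board if p[0] < 2 and p[1] >= 2],
    [p for p in board if p[0] >= 2 and p[1] >= 2],
]
random.seed(0)
pts = jittered_sample(zones, 8)
```

Every block of k consecutive points then hits every zone exactly once,
which removes the clustering that pure i.i.d. sampling allows.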
Another possible tool for improving Monte-Carlo comparisons is
introducing correlations between the explorations of different nodes.
Correlating the choices made in nodes B and C should improve the
stability of the comparison between B and C, and therefore the choice
performed at node A if A has sons B and C.
However, this introduces a strong variance increase at node A if A is
the father of B and C (much less exploration, as all sons have
correlated sons!).
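This is the classical "common random numbers" trick from simulation. A
minimal sketch, under an assumed toy playout model of my own (each son
wins a playout with a fixed probability, and the shared randomness is
just a common seed):

```python
import random

def winrate(bias, rng, n):
    """Hypothetical playout model: the node wins a playout with
    probability `bias`; `rng` supplies the randomness."""
    return sum(rng.random() < bias for _ in range(n)) / n

n = 1000
# Independent exploration of sons B and C: two unrelated streams.
wb_indep = winrate(0.52, random.Random(1), n)
wc_indep = winrate(0.50, random.Random(2), n)

# Correlated exploration: reuse the same random draws for both sons,
# so the playout noise largely cancels in the comparison B vs C.
wb_corr = winrate(0.52, random.Random(3), n)
wc_corr = winrate(0.50, random.Random(3), n)
```

With shared draws, any playout won by C is also won by B, so the
estimated difference wb_corr - wc_corr is always nonnegative and has
much lower variance than the independent version - which is exactly why
the comparison at A stabilizes, even though the samples themselves are
no longer independent.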
Well, for me the most appealing form of MC improvement is something like
the simple "jittered" sampling above. But I don't have enough Go
knowledge to experiment with it, and I am absolutely not sure it would
work :-)
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/