As I have spent a lot of time wondering what could be done with Quasi-Monte-Carlo or other standard Monte-Carlo improvements in computer-go, I write below
my (humble and pessimistic :-) ) opinion about it.

Let's formalize Monte-Carlo.
Consider a probability distribution P, and suppose that you want to estimate Pf,
the expectation of f under P. You can evaluate f at some points; write hP
($\hat P$ is the usual notation) for the empirical distribution, i.e.

     Pf  = unknown expectation of f under distribution P
     hPf = (\sum_{i=1}^n f(x_i)) / n

In standard naive Monte-Carlo, the x_i are independent and identically distributed
according to P. One can sample without replacement in the discrete case (no longer
independent then, but almost :-) ).
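
(Just to fix ideas, a tiny Python sketch of this naive estimator; the toy setup -
P a standard normal and f(x) = x^2, so that Pf = 1 - is my own choice, nothing
Go-specific.)

import random

def naive_mc(f, sample_p, n):
    """hPf = (1/n) * sum_{i=1}^n f(x_i), with the x_i i.i.d. from P."""
    return sum(f(sample_p()) for _ in range(n)) / n

f = lambda x: x * x                           # the function whose expectation we want
sample_p = lambda: random.gauss(0.0, 1.0)     # one draw from P (here N(0,1))
print(naive_mc(f, sample_p, 100000))          # should be close to Pf = 1.0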

Biasing Monte-Carlo (importance sampling) consists in replacing P by some P'
which samples more often the points in the more important areas. A standard
choice is a P' with density proportional to the product of
  (i)  the density of P
  (ii) the local standard deviation of f (which is unknown and must be approximated).
This does not introduce a bias (in expectation, hPf remains equal to Pf), because we
change the weighting of the points; hPf is now:

     hPf = (\sum_{i=1}^n P(x_i) f(x_i) / P'(x_i)) / n

and in the simplest case the x_i are i.i.d. according to P' (but other, better
schemes are possible, e.g. sampling in strata).
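
(Again a small Python sketch, with the same caveat: the toy setup - P = N(0,1),
f(x) = x^2, and P' = N(0,2) so that the tails, where f is large, are sampled more
often - is my own, purely illustrative choice.)

import random, math

def normal_pdf(x, sigma):
    """Density of a centred normal with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def importance_mc(f, n, sigma_prime=2.0):
    total = 0.0
    for _ in range(n):
        x = random.gauss(0.0, sigma_prime)                    # x_i ~ P'
        w = normal_pdf(x, 1.0) / normal_pdf(x, sigma_prime)   # weight P(x_i)/P'(x_i)
        total += w * f(x)
    return total / n                                          # hPf, still unbiased for Pf

print(importance_mc(lambda x: x * x, 100000))                 # close to 1.0, often with lower variance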

Let's consider how this can be applied to Monte-Carlo-Go.

Designing a good random player is not importance sampling in the sense above: there
is no reweighting. We change what is estimated; we no longer want to know the
probability of winning for a uniform random player, we just want a better player,
so that the probabilities estimated by this player lead to a better MC-based player.
Improving the random player is very important, but it is not importance sampling.

Why not add importance sampling?
Because it requires a good estimate of the important areas, in the sense e.g. of
"areas with high standard deviation". This is, in my humble opinion, nearly as hard
as designing a good Go evaluation function. Of course, I might be missing something;
but I think that if you can quickly find areas with significantly higher or lower
probabilities of winning, then you have improved your random player, you don't want
to reweight anything, and you are no longer interested in the probability of winning
for the basic random player - you want to move to the new random player.
This is only improving the random player; you change the distribution because you
guess that it gives a better random player (and you hope that your MC-based player
will be better).

Quasi-Monte-Carlo methods consist in replacing hP by a sum over a deterministic
or moderately randomized sample, so that the convergence of hPf to Pf is faster
than the O(1/\sqrt{n}) usual in MC methods. This is somewhat orthogonal to
importance sampling (although, by the way, the two techniques are difficult to
merge into one).
In standard QMC the distribution is not modified, but the points are not chosen by
independent identically distributed sampling (by the way, this is also not the case
with most importance-sampling schemes - but in QMC the focus is on reducing the
random part).

(It is now often considered that QMC should only be partially derandomized, as
"randomized QMC" (as opposed to strictly deterministic QMC) is better, especially
in large dimension; but never mind, let's consider a QMC method independently of
how it is designed; we can just pick a QMC sequence from the state of the art,
without (at first) trying to understand how it works.)
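
(For concreteness, here is a hand-rolled 2-d Halton sequence - one of the standard
QMC sequences - on a toy continuous problem; the example is my own and has nothing
to do with Go, it only shows what "picking up a QMC sequence" looks like in practice.)

import random

def van_der_corput(i, base):
    """i-th element of the van der Corput sequence in the given base."""
    q, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        i, r = divmod(i, base)
        q += r / denom
    return q

def halton_2d(n):
    """First n points of the 2-d Halton sequence (bases 2 and 3)."""
    return [(van_der_corput(i, 2), van_der_corput(i, 3)) for i in range(1, n + 1)]

f = lambda x, y: x * x + y * y            # true mean over [0,1]^2 is 2/3

n = 4096
qmc_est = sum(f(x, y) for x, y in halton_2d(n)) / n
mc_est  = sum(f(random.random(), random.random()) for _ in range(n)) / n
print(qmc_est, mc_est)                    # the Halton estimate is typically closer to 2/3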

But QMC is very efficient in continuous domains, and sometimes in specific discrete
domains as well; I don't see how to plug QMC into tree-search, or even just into the
sampling of a go-ban for introducing QMC at a particular node. It looks appealing,
but you need something like a "distance" between moves that is related to the
probability of winning, and even with such a distance, designing the QMC sequence
is far from straightforward.

However, a simple form of derandomization, so-called "jittered sampling", can perhaps be applied:

  instead of randomly sampling the go-ban, split (partition) the go-ban into
  k zones, and pick the i^{th} point in zone 1+(i%k) (% means modulo).

(This is for a partition into zones of equal area, but it can be adapted to
unequal areas.)
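
(A rough Python sketch of this jittered go-ban sampling, under my own arbitrary
choice of zones - the four quadrants of a 9x9 board, so not exactly equal areas;
a real implementation would use Go-meaningful zones, which is precisely the
knowledge I lack.)

import random

SIZE = 9      # board size (assumption: 9x9)
k = 4         # number of zones

def zone_of(x, y):
    """Quadrant index in 0..3 (an arbitrary, roughly equal-area partition)."""
    return 2 * (y >= SIZE // 2) + (x >= SIZE // 2)

zones = [[] for _ in range(k)]
for x in range(SIZE):
    for y in range(SIZE):
        zones[zone_of(x, y)].append((x, y))

def jittered_points(n):
    """Yield n points, cycling through the zones (0-indexed i % k here)."""
    for i in range(n):
        yield random.choice(zones[i % k])

print(list(jittered_points(8)))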

Perhaps it is a good idea, but I do not have enough Go knowledge to design such zones.

Another possible tool for improving Monte-Carlo comparisons is introducing
correlations between the explorations of various nodes. Correlating the choices
made in nodes B and C should improve the stability of the comparison between B and C,
and therefore improve the choice performed in node A if A has sons B and C.
However, this introduces a strong variance increase at node A if A is the father
of B and C (much less effective exploration, as all the sons use correlated samples!).
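
(One standard way of correlating the explorations of the sons B and C is "common
random numbers": feed the same random seeds to the playouts of both moves, so that
much of the noise cancels in the comparison. The sketch below is my own; simulate()
is a hypothetical stand-in for a real random playout from the position after a move.)

import random

def simulate(move, rng):
    # Stand-in for a random playout after 'move'; returns 1 for a win, 0 for a loss.
    # (Toy win rates 0.52 / 0.48 are assumptions, purely for illustration.)
    return int(rng.random() < (0.52 if move == "B" else 0.48))

def compare_with_crn(n):
    wins_b = wins_c = 0
    for i in range(n):
        seed = i                                  # the same seed drives both playouts
        wins_b += simulate("B", random.Random(seed))
        wins_c += simulate("C", random.Random(seed))
    return wins_b / n - wins_c / n                # low-variance estimate of the gap B - C

print(compare_with_crn(10000))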

Well, for me the most appealing form of MC-improvement is something like the simple "jittered" sampling above. But I do not have enough Go knowledge to experiment with it, and I am absolutely not sure that it would work :-)


