As I have spent a lot of time wondering what could be done with Quasi-Monte-Carlo or other standard Monte-Carlo improvements in computer-go, I write below
my (humble and pessimistic :-) ) opinion about it.

Let's formalize Monte-Carlo.
Consider a probability distribution P, and suppose that you want to estimate Pf,
the expectation of f under P. You can evaluate f at some points; write hP
($\hat P$ is the usual notation) for the empirical distribution, i.e.

     Pf  = unknown expectation of f under distribution P
     hPf = (\sum_{i=1}^n f(x_i)) / n

In standard naive Monte-Carlo, the x_i are independent and identically distributed
according to P. One can sample without replacement in the discrete case (no longer
independent then, but almost :-) ).
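
(Just to fix ideas, a tiny Python sketch of this naive estimator; the toy setup -
P a standard normal and f(x) = x^2, so that Pf = 1 - is my own choice, nothing
Go-specific.)

import random

def naive_mc(f, sample_p, n):
    """hPf = (1/n) * sum_{i=1}^n f(x_i), with the x_i i.i.d. from P."""
    return sum(f(sample_p()) for _ in range(n)) / n

f = lambda x: x * x                           # the function whose expectation we want
sample_p = lambda: random.gauss(0.0, 1.0)     # one draw from P (here N(0,1))
print(naive_mc(f, sample_p, 100000))          # should be close to Pf = 1.0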

Biasing Monte-Carlo (importance sampling) consists in replacing P by some P'
which samples more often the points in the more important areas. A standard
choice is a P' with density proportional to the product of
  (i)  the density of P
  (ii) the local standard deviation of f (which is unknown and must be approximated).
This does not introduce a bias (in expectation, hPf remains equal to Pf), because we
change the weighting of the points; hPf is now:

     hPf = (\sum_{i=1}^n P(x_i) f(x_i) / P'(x_i)) / n

and in the simplest case the x_i are i.i.d. according to P' (but other, better
schemes are possible, e.g. sampling in strata).
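
(Again a small Python sketch, with the same caveat: the toy setup - P = N(0,1),
f(x) = x^2, and P' = N(0,2) so that the tails, where f is large, are sampled more
often - is my own, purely illustrative choice.)

import random, math

def normal_pdf(x, sigma):
    """Density of a centred normal with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def importance_mc(f, n, sigma_prime=2.0):
    total = 0.0
    for _ in range(n):
        x = random.gauss(0.0, sigma_prime)                    # x_i ~ P'
        w = normal_pdf(x, 1.0) / normal_pdf(x, sigma_prime)   # weight P(x_i)/P'(x_i)
        total += w * f(x)
    return total / n                                          # hPf, still unbiased for Pf

print(importance_mc(lambda x: x * x, 100000))                 # close to 1.0, often with lower variance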

Let's consider how this can be applied to Monte-Carlo-Go.

Designing a good random player is not importance sampling in the sense above: there
is no reweighting. We change what is estimated; we no longer want to know the
probability of winning for a uniform random player, we just want a better player,
so that the probabilities estimated by this player lead to a better MC-based player.
Improving the random player is very important, but it is not importance sampling.

Why not add importance sampling?
Because it requires a good estimate of the important areas, in the sense e.g. of
"areas with high standard deviation". This is, in my humble opinion, nearly as hard
as designing a good Go evaluation function. Of course, I might be missing something;
but I think that if you can quickly find areas with significantly higher or lower
probabilities of winning, then you have improved your random player, you don't want
to reweight anything, and you are no longer interested in the probability of winning
for the basic random player - you want to move to the new random player.
This is only improving the random player; you change the distribution because you
guess that it gives a better random player (and you hope that your MC-based player
will be better).

Quasi-Monte-Carlo methods consist in replacing hP by a sum over a deterministic
or moderately randomized sample, so that the convergence of hPf to Pf is faster
than the O(1/\sqrt{n}) usual in MC methods. This is somewhat orthogonal to
importance sampling (although, by the way, the two techniques are difficult to
merge into one).
In standard QMC the distribution is not modified, but the points are not chosen by
independent identically distributed sampling (by the way, this is also not the case
with most importance-sampling schemes - but in QMC the focus is on reducing the
random part).

(It is now often considered that QMC should only be partially derandomized, as
"randomized QMC" (as opposed to strictly deterministic QMC) is better, especially
in large dimension; but never mind, let's consider a QMC method independently of
how it is designed; we can just pick a QMC sequence from the state of the art,
without (at first) trying to understand how it works.)
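
(For concreteness, here is a hand-rolled 2-d Halton sequence - one of the standard
QMC sequences - on a toy continuous problem; the example is my own and has nothing
to do with Go, it only shows what "picking up a QMC sequence" looks like in practice.)

import random

def van_der_corput(i, base):
    """i-th element of the van der Corput sequence in the given base."""
    q, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        i, r = divmod(i, base)
        q += r / denom
    return q

def halton_2d(n):
    """First n points of the 2-d Halton sequence (bases 2 and 3)."""
    return [(van_der_corput(i, 2), van_der_corput(i, 3)) for i in range(1, n + 1)]

f = lambda x, y: x * x + y * y            # true mean over [0,1]^2 is 2/3

n = 4096
qmc_est = sum(f(x, y) for x, y in halton_2d(n)) / n
mc_est  = sum(f(random.random(), random.random()) for _ in range(n)) / n
print(qmc_est, mc_est)                    # the Halton estimate is typically closer to 2/3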

But QMC is very efficient in continuous domains, and sometimes in specific discrete
domains as well; I don't see how to plug QMC into tree-search, or even just into the
sampling of a go-ban for introducing QMC at a particular node. It looks appealing,
but you need something like a "distance" between moves that is related to the
probability of winning, and even with such a distance, designing the QMC sequence
is far from straightforward.

However, a simple form of derandomization, so-called "jittered sampling", can perhaps be applied:

  instead of randomly sampling the go-ban, split (partition) the go-ban into
  k zones, and pick the i^{th} point in zone 1+(i%k) (% means modulo).

(This is for a partition into zones of equal area, but it can be adapted to
unequal areas.)
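
(A rough Python sketch of this jittered go-ban sampling, under my own arbitrary
choice of zones - the four quadrants of a 9x9 board, so not exactly equal areas;
a real implementation would use Go-meaningful zones, which is precisely the
knowledge I lack.)

import random

SIZE = 9      # board size (assumption: 9x9)
k = 4         # number of zones

def zone_of(x, y):
    """Quadrant index in 0..3 (an arbitrary, roughly equal-area partition)."""
    return 2 * (y >= SIZE // 2) + (x >= SIZE // 2)

zones = [[] for _ in range(k)]
for x in range(SIZE):
    for y in range(SIZE):
        zones[zone_of(x, y)].append((x, y))

def jittered_points(n):
    """Yield n points, cycling through the zones (0-indexed i % k here)."""
    for i in range(n):
        yield random.choice(zones[i % k])

print(list(jittered_points(8)))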

Perhaps it is a good idea, but I do not have enough Go knowledge to design such zones.

Another possible tool for improving Monte-Carlo comparisons is introducing
correlations between the explorations of various nodes. Correlating the choices
made in nodes B and C should improve the stability of the comparison between B and C,
and therefore improve the choice performed in node A if A has sons B and C.
However, this introduces a strong variance increase at node A if A is the father
of B and C (much less effective exploration, as all the sons use correlated samples!).
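
(One standard way of correlating the explorations of the sons B and C is "common
random numbers": feed the same random seeds to the playouts of both moves, so that
much of the noise cancels in the comparison. The sketch below is my own; simulate()
is a hypothetical stand-in for a real random playout from the position after a move.)

import random

def simulate(move, rng):
    # Stand-in for a random playout after 'move'; returns 1 for a win, 0 for a loss.
    # (Toy win rates 0.52 / 0.48 are assumptions, purely for illustration.)
    return int(rng.random() < (0.52 if move == "B" else 0.48))

def compare_with_crn(n):
    wins_b = wins_c = 0
    for i in range(n):
        seed = i                                  # the same seed drives both playouts
        wins_b += simulate("B", random.Random(seed))
        wins_c += simulate("C", random.Random(seed))
    return wins_b / n - wins_c / n                # low-variance estimate of the gap B - C

print(compare_with_crn(10000))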

Well, for me the most appealing form of MC-improvement is something like the simple "jittered" sampling above. But I do not have enough Go knowledge to experiment with it, and I am absolutely not sure that it would work :-)


