[computer-go] Re: Goal-directedness of Monte-Carlo

Ingo Althöfer Thu, 11 Sep 2008 04:44:33 -0700

Olivier Teytaud wrote:
> ... I find the following elements interesting:
> 1) MoGo was seemingly weaker in handicap games...


Was this (perceived) weakness on both sides (giver and taker) 
of the handicap?

> 2) MoGo has become stronger (and the difference is huge for long time
> settings) with more randomization; ... This strong improvement with 
> randomization 
> might be also an improvement in the case of handicap games, but I've not 
> checked that.

Of course, more randomization, in the sense of "independent runs" and also in 
the sense of "different heuristics", should help. But I do not see how it might 
overcome the laziness problem of MC.

> 3) By the way, we have positive results (small improvements, but
> improvements) by using different Monte-Carlo simulations for different
> cores. This is consistent with 2) above - by performing N simulations of
> type 1 and M simulations of type 2, you cover more
> markov models than with one Markov model - you can cover different
> scenarios. 

It also fits well with the philosophy of "the wisdom of the crowds", where it 
is stated that an important source of this wisdom is that different people have 
different knowledge/heuristics.

*****************************************
Here is a proposal to check for the laziness of Monte Carlo Go programs:
Take some (a dozen?) typical board positions.
For each such position P do the following:
        For "each" komi value k, make 100 runs of your program on (P; k),
        say with a fixed thinking time of 10 seconds.
        Monte Carlo programs are randomized, and thus also have random outcomes.
        Tabulate the 100 moves proposed by the program, and determine
        the entropy H(P; k) = - sum_i ( h_i * log(h_i) ), where the h_i are
        the relative frequencies of the different move proposals.
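
The entropy step above could be computed roughly as follows. This is only a 
sketch: the function name move_entropy, the use of the natural logarithm, and 
the representation of moves as coordinate strings are assumptions, not part of 
any existing program.

```python
import math
from collections import Counter


def move_entropy(moves):
    """Entropy H = -sum(h_i * log(h_i)) of the proposed moves.

    `moves` is the list of moves returned by repeated runs on one (P; k),
    e.g. 100 strings such as "D4". h_i is the relative frequency of move i.
    The natural logarithm is an assumption; any base works for comparisons.
    """
    total = len(moves)
    if total == 0:
        raise ValueError("need at least one proposed move")
    counts = Counter(moves)
    # -h * log(h) is written as h * log(1/h) so a single move yields 0.0.
    return sum((c / total) * math.log(total / c) for c in counts.values())
```

If all 100 runs propose the same move, this gives 0; if the runs split evenly 
between two moves, it gives log(2).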

Small entropy would mean that the algorithm is rather sure about the best move. 
Large entropy means that there are several moves with similar merits (in the 
opinion of the algorithm).
If you find a central value k' = k'(P), such that
... > H(P; k'-2) > H(P; k'-1) > H(P; k') < H(P; k'+1) < H(P; k'+2) < ...
this would be an indication that k' is close to the likely value of the position.
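
A minimal sketch of the search for such a k', assuming the entropies have 
already been measured and stored in a dictionary from komi to H(P; k). The name 
valley_komi and the strict test are illustrative choices. With only 100 runs per 
komi the measured entropies will be noisy, so a strict chain of inequalities 
may well fail in practice.

```python
def valley_komi(entropy_by_komi):
    """Return k' if H falls strictly down to k' and rises strictly after it.

    `entropy_by_komi` maps each tested komi value k to the measured H(P; k).
    Returns None if the minimum lies at either end of the tested range or if
    the chain ... > H(k'-1) > H(k') < H(k'+1) < ... is broken anywhere.
    """
    komis = sorted(entropy_by_komi)
    values = [entropy_by_komi[k] for k in komis]
    if len(values) < 3:
        return None
    i = min(range(len(values)), key=values.__getitem__)
    if i == 0 or i == len(values) - 1:
        return None
    falling = all(values[j] > values[j + 1] for j in range(i))
    rising = all(values[j] < values[j + 1] for j in range(i, len(values) - 1))
    return komis[i] if falling and rising else None


# Made-up numbers, only to show the call; they are not measurements.
example = {5.5: 1.2, 6.5: 0.9, 7.5: 0.4, 8.5: 0.8, 9.5: 1.1}
print(valley_komi(example))
```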

Ingo.
