Short version:
        1) Rules are just a convenient way to calculate probabilities.
        2) The error rates from using rules are smaller than they appear.


Long version:
"Rules-driven" and "probability-driven" are not mutually exclusive.
Rules are merely a simple way to calculate only the part of a probability
distribution (PD) that you need to make a decision.

For example, when Pachi decides, with 90% probability, to play a move in
the topological neighborhood of the last move, it is simply going down one
branch of a tree. The other 10% of the time it will go down the other
branch.

By going down the 90% branch, Pachi is efficiently calculating the local
decision. It can go down the other branch some other time.
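As a minimal sketch of that branch selection (the helper names and move
representation here are hypothetical, not Pachi's actual code):

```python
import random

def choose_move(last_move, local_moves, all_moves, p_local=0.9):
    """Sample only the branch actually taken (illustrative sketch).

    With probability p_local, choose among moves in the topological
    neighborhood of the last move; otherwise fall back to the whole
    board. The rest of the distribution is never computed.
    """
    if local_moves and random.random() < p_local:
        return random.choice(local_moves)   # 90% branch: local decision
    return random.choice(all_moves)         # 10% branch: global choice
```

Each call walks down only one branch; over many playouts the full
distribution is still sampled.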

Pachi could calculate the entire distribution and then make a choice, but
that would be much less efficient than calculating only the part it needs.

Another example: in a pure "mogo" implementation, you check around the
last move for patterns, and then always move there. In this design, your
probability distribution has a 0% chance of selecting moves that do not
match any pattern. This appears to be a sharp rule, with no probabilistic
behavior.

But "mogo-style" actually has a lot of randomness. For example, almost 50%
of positions do not match any local patterns; in such positions you will
make a random choice anyway. IIRC, about 20% of positions match *multiple*
patterns, so you will make a random choice among them as well.
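Both sources of randomness fall out naturally if you write the rule down;
this sketch assumes a hypothetical pattern-matching predicate and a simple
coordinate-based move representation:

```python
import random

def mogo_move(board, last_move, matches_pattern, all_legal):
    """Pure "mogo-style" rule, as a sketch (hypothetical helpers).

    Check the 8 points around the last move for pattern matches.
    Multiple matches: random tie-break. No matches (roughly half of
    positions): uniform random legal move anyway.
    """
    neighborhood = [(last_move[0] + dx, last_move[1] + dy)
                    for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]
    matches = [p for p in neighborhood
               if p in all_legal and matches_pattern(board, p)]
    if matches:
        return random.choice(matches)   # random choice among matches
    return random.choice(all_legal)     # no match: random move anyway
```

Even though the rule "always move on a match" is sharp, the engine's
behavior is random in most positions.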

One final point about randomness. A pure rule-driven approach appears to
sample too little of the space. That is, the pattern PD predicts that much
more of the space should be examined, but the rule is cutting that off.
But the pattern PD has a lot more entropy than necessary! This is because a
playout is not trying to find the correct *sequence* of moves. A playout is
trying to predict the final position. This is significantly different!

When trying to predict a final position, if a pattern predicts that a point
will *eventually* be occupied by a Black stone, then you can place that
stone immediately. So the pattern PD might say that a move is played 55% of
the time. When a Mogo engine plays there, it looks like it is cutting off
45% of the search space. But suppose that point is occupied by that color
95% of the time. Because of transpositions, the Mogo engine would only cut
off 5% of the search.
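The arithmetic can be made explicit; the numbers here are the ones from
the example above, not measurements:

```python
# Pattern PD: probability this move is played *next* at this point.
p_move_now = 0.55
# Probability the point ends up occupied by that color in the final
# position, summed over all sequences (transpositions) that reach it.
p_eventually = 0.95

# Judged against sequences, the rule looks like it discards 45%...
apparent_cut = 1 - p_move_now
# ...but judged against final positions, it discards only 5%.
actual_cut = 1 - p_eventually
```

The gap between the two numbers is exactly the entropy in the pattern PD
that a playout, which only needs the final position, does not need.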

And then you have the balancing effects. In order to actually make a wrong
decision, there must be a win-loss difference between the branch taken and
the branch cut off. And those differences can cancel within branches,
and across branches.

Bottom line: the theory underlying MCTS gives broad latitude for optimizing
implementation.
        
Brian


_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
