I post this question in the form of a problem whose
answer I don't know:

*Problem* : At the given position B loses by 5 pts. He
has only 1 winning move, which is winning a ladder. For
simplification he always has 25 legal moves. The ladder
is 20 ply deep. And, as it is the case with ladders, the
deeper you play it, the more you loose if you loose. The
probability of finding the ladder in the first n simulations
(say n=1000) is (approx.) 1000/25^10 which we
consider = 0.

B's decision tree is (only B decisions included):

move 1     3          5               19
------
[1 -> V]   [1 -> V']  [1 -> V'']  ... [1 -> V*]
(24 -> -5) (24 -> -9) (24 -> -13)     (24 -> -36)

Notation explained:

[· -> ·] node in the winning thread
(· -> ·) node in other threads

(n -> ·) number of branches
(· -> v) value of the node (the deeper, the worse)

Finding the correct 20 ply (10 moves for B) thread leads
to the correct V = V' = ... = V* > 0. But, ignoring that
thread, leads to V = -9, V' = -13, V'' = -17 ...

Our program starts considering V = -9 because it did not
find V* > 0.


*Question* : 1.How does MC/UCT find the correct answer
if the more you approach it (except if you hit it by a fluke),
the worse it evaluates?

2. Is UCT biasing the distribution of moves to be searched
*against* the only correct answer i.o. towards it?

*Note* : This is by no means an anti-MC argument. If I
was anti-MC, I would not loose any time with this. I am
trying to understand how it works.

Jacques.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to