I post this question in the form of a problem whose answer I don't know:
*Problem* : At the given position B loses by 5 pts. He has only 1 winning move, which is winning a ladder. For simplification he always has 25 legal moves. The ladder is 20 ply deep. And, as it is the case with ladders, the deeper you play it, the more you loose if you loose. The probability of finding the ladder in the first n simulations (say n=1000) is (approx.) 1000/25^10 which we consider = 0. B's decision tree is (only B decisions included): move 1 3 5 19 ------ [1 -> V] [1 -> V'] [1 -> V''] ... [1 -> V*] (24 -> -5) (24 -> -9) (24 -> -13) (24 -> -36) Notation explained: [· -> ·] node in the winning thread (· -> ·) node in other threads (n -> ·) number of branches (· -> v) value of the node (the deeper, the worse) Finding the correct 20 ply (10 moves for B) thread leads to the correct V = V' = ... = V* > 0. But, ignoring that thread, leads to V = -9, V' = -13, V'' = -17 ... Our program starts considering V = -9 because it did not find V* > 0. *Question* : 1.How does MC/UCT find the correct answer if the more you approach it (except if you hit it by a fluke), the worse it evaluates? 2. Is UCT biasing the distribution of moves to be searched *against* the only correct answer i.o. towards it? *Note* : This is by no means an anti-MC argument. If I was anti-MC, I would not loose any time with this. I am trying to understand how it works. Jacques. _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/