That's certainly possible, but assuming the tree is being limited on
number of nodes, not constant depth, and the selection of equal quality
candidates to explore is random, how likely is it to happen? The problem
seems to be constant.
On 06/11/2015 19:29, Marc Landgraf wrote:
It is indeed very realistic and can be recreated in Go.
The issue is, that you are chopping of the tree at a fixed point and this
may heavily bias the the entire tree, if this point influences the playout.
Like imagine there is one big Atari in the entire game, but it can be
easily answered. If you have a single path in your tree that ends with said
Atari, while no other path ends with this Atari played (or the Atari is
already answered in the tree), this path will then dominate all other
pathes, because in this path the likelyhood of winning (by taking the
stones in Atari) is much bigger than in all other branches. But if you
would chop off the last move that branch suddenly the evaluation would be
correct. If you would not do any more playouts on this node, it would be
just one biased playout, not a big deal in the grand scale. If you would
allow the MCTS to continue further, it would also be able to refute the
Atari. But you are stopping right at the atari, and then pile on playouts
that make it seem work.
2015-11-06 19:59 GMT+01:00 Gonçalo Mendes Ferreira <go...@sapo.pt>:
That doesn't seem very realistic. I'd guess your prior values are accurate
but the simulations are biased or not representative. Or you miss precision
in your transition quality floating points. Or there's a bug related to
being an adversarial problem and you didn't have the robots swap colors? :)
Do tell what it was when you discover the problem.
On 06/11/2015 18:48, Dave Dyer wrote:
Developing a UCT robot for a new game, I have encountered a
surprising and alarming behavior: the longer think time the
robot is given, the worse the results. That is, the same robot
given 5 seconds per move defeats one give 30 seconds, or 180 seconds.
I'm still investigating, but the proximate cause seems to be
my limit on the size of the UCT tree. As a memory conservation
measure, I have a hard limit on the size of the stored tree. After
the limit is reached, the robot continues running simulations, refining
the outcomes based on the existing tree and random playouts below
the leaf nodes.
My intuition would be that the search would be less effective in this
mode, but producing worse results (as measured by self-play) is
strongly counter intuitive.
Does it apply to Go? Maybe not, but it's at least an indicator
that arbitrary decisions that "ought to" be ok can be very bad in
Computer-go mailing list