This is expected. Search for the horizon effect related to computer go
(e.g. in Petr Baudis thesis

On Fri, Nov 6, 2015 at 8:48 PM, Gonçalo Mendes Ferreira <>

> That's certainly possible, but assuming the tree is being limited on
> number of nodes, not constant depth, and the selection of equal quality
> candidates to explore is random, how likely is it to happen? The problem
> seems to be constant.
> On 06/11/2015 19:29, Marc Landgraf wrote:
>> It is indeed very realistic and can be recreated in Go.
>> The issue is, that you are chopping of the tree at a fixed point and this
>> may heavily bias the the entire tree, if this point influences the
>> playout.
>> Like imagine there is one big Atari in the entire game, but it can be
>> easily answered. If you have a single path in your tree that ends with
>> said
>> Atari, while no other path ends with this Atari played (or the Atari is
>> already answered in the tree), this path will then dominate all other
>> pathes, because in this path the likelyhood of winning (by taking the
>> stones in Atari) is much bigger than in all other branches. But if you
>> would chop off the last move that branch suddenly the evaluation would be
>> correct. If you would not do any more playouts on this node, it would be
>> just one biased playout, not a big deal in the grand scale. If you would
>> allow the MCTS to continue further, it would also be able to refute the
>> Atari. But you are stopping right at the atari, and then pile on playouts
>> that make it seem work.
>> 2015-11-06 19:59 GMT+01:00 Gonçalo Mendes Ferreira <>:
>> That doesn't seem very realistic. I'd guess your prior values are accurate
>>> but the simulations are biased or not representative. Or you miss
>>> precision
>>> in your transition quality floating points. Or there's a bug related to
>>> being an adversarial problem and you didn't have the robots swap colors?
>>> :)
>>> Do tell what it was when you discover the problem.
>>> Gonçalo F.
>>> On 06/11/2015 18:48, Dave Dyer wrote:
>>> Developing a UCT robot for a new game, I have encountered a
>>>> surprising and alarming behavior:  the longer think time the
>>>> robot is given, the worse the results.  That is, the same robot
>>>> given 5 seconds per move defeats one give 30 seconds, or 180 seconds.
>>>> I'm still investigating, but the proximate cause seems to be
>>>> my limit on the size of the UCT tree.   As a memory conservation
>>>> measure, I have a hard limit on the size of the stored tree. After
>>>> the limit is reached, the robot continues running simulations, refining
>>>> the outcomes based on the existing tree and random playouts below
>>>> the leaf nodes.
>>>> My intuition would be that the search would be less effective in this
>>>> mode, but producing worse results (as measured by self-play) is
>>>> strongly counter intuitive.
>>>> Does it apply to Go?  Maybe not, but it's at least an indicator
>>>> that arbitrary decisions that "ought to" be ok can be very bad in
>>>> practice.
> _______________________________________________
> Computer-go mailing list

Computer-go mailing list

Reply via email to