This is expected. Search for the horizon effect related to computer go (e.g. in Petr Baudis thesis http://pasky.or.cz/go/prace.pdf)
On Fri, Nov 6, 2015 at 8:48 PM, Gonçalo Mendes Ferreira <go...@sapo.pt> wrote: > That's certainly possible, but assuming the tree is being limited on > number of nodes, not constant depth, and the selection of equal quality > candidates to explore is random, how likely is it to happen? The problem > seems to be constant. > > > On 06/11/2015 19:29, Marc Landgraf wrote: > >> It is indeed very realistic and can be recreated in Go. >> >> The issue is, that you are chopping of the tree at a fixed point and this >> may heavily bias the the entire tree, if this point influences the >> playout. >> Like imagine there is one big Atari in the entire game, but it can be >> easily answered. If you have a single path in your tree that ends with >> said >> Atari, while no other path ends with this Atari played (or the Atari is >> already answered in the tree), this path will then dominate all other >> pathes, because in this path the likelyhood of winning (by taking the >> stones in Atari) is much bigger than in all other branches. But if you >> would chop off the last move that branch suddenly the evaluation would be >> correct. If you would not do any more playouts on this node, it would be >> just one biased playout, not a big deal in the grand scale. If you would >> allow the MCTS to continue further, it would also be able to refute the >> Atari. But you are stopping right at the atari, and then pile on playouts >> that make it seem work. >> >> >> 2015-11-06 19:59 GMT+01:00 Gonçalo Mendes Ferreira <go...@sapo.pt>: >> >> That doesn't seem very realistic. I'd guess your prior values are accurate >>> but the simulations are biased or not representative. Or you miss >>> precision >>> in your transition quality floating points. Or there's a bug related to >>> being an adversarial problem and you didn't have the robots swap colors? >>> :) >>> >>> Do tell what it was when you discover the problem. >>> >>> Gonçalo F. >>> >>> >>> On 06/11/2015 18:48, Dave Dyer wrote: >>> >>> Developing a UCT robot for a new game, I have encountered a >>>> surprising and alarming behavior: the longer think time the >>>> robot is given, the worse the results. That is, the same robot >>>> given 5 seconds per move defeats one give 30 seconds, or 180 seconds. >>>> >>>> I'm still investigating, but the proximate cause seems to be >>>> my limit on the size of the UCT tree. As a memory conservation >>>> measure, I have a hard limit on the size of the stored tree. After >>>> the limit is reached, the robot continues running simulations, refining >>>> the outcomes based on the existing tree and random playouts below >>>> the leaf nodes. >>>> >>>> My intuition would be that the search would be less effective in this >>>> mode, but producing worse results (as measured by self-play) is >>>> strongly counter intuitive. >>>> >>>> Does it apply to Go? Maybe not, but it's at least an indicator >>>> that arbitrary decisions that "ought to" be ok can be very bad in >>>> practice. >>>> >>>> >>>> > _______________________________________________ > Computer-go mailing list > Computer-go@computer-go.org > http://computer-go.org/mailman/listinfo/computer-go > -- Blog: http://bettong.net/ Twitter: https://twitter.com/ujh Homepage: http://www.urbanhafner.com/
_______________________________________________ Computer-go mailing list Computer-go@computer-go.org http://computer-go.org/mailman/listinfo/computer-go