While developing a UCT robot for a new game, I have encountered a surprising and alarming behavior: the more thinking time the robot is given, the worse its results. That is, the same robot given 5 seconds per move defeats one given 30 seconds, or 180 seconds.
I'm still investigating, but the proximate cause seems to be my limit on the size of the UCT tree. As a memory conservation measure, I have a hard limit on the size of the stored tree. After the limit is reached, the robot continues running simulations, refining the outcomes based on the existing tree and random playouts below the leaf nodes. My intuition was that the search would be less effective in this mode, but for it to produce worse results (as measured by self-play) is strongly counterintuitive. Does it apply to Go? Maybe not, but it's at least an indicator that arbitrary decisions that "ought to" be ok can be very bad in practice.
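For readers who want to experiment with the capped-tree mode described above, here is a minimal sketch in Python. It is not the robot's actual code: the toy game (take 1 to 3 stones, taking the last stone wins), the names `uct_search` and `max_nodes`, and the exploration constant are all illustrative assumptions. Once `max_nodes` is reached, no new nodes are stored; simulations keep descending through the existing tree and run random playouts from its frozen leaves.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones    # stones left; the player to move takes 1-3
        self.parent = parent
        self.children = {}      # move -> Node
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def playout(stones, rng):
    """Random playout; True if the player to move at `stones` wins."""
    mover_is_first = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return mover_is_first   # whoever took the last stone wins
        mover_is_first = not mover_is_first
    return False                    # no stones left: player to move has lost

def uct_search(root_stones, simulations, max_nodes, rng, c=1.4):
    """Return (most-visited root move or None, number of stored nodes)."""
    root = Node(root_stones)
    node_count = 1
    for _ in range(simulations):
        node = root
        while node.stones > 0:
            untried = [m for m in legal_moves(node.stones)
                       if m not in node.children]
            if untried and node_count < max_nodes:
                # Normal UCT expansion: store one new node per simulation.
                move = rng.choice(untried)
                node.children[move] = Node(node.stones - move, node)
                node_count += 1
                node = node.children[move]
                break
            if not node.children:
                break   # frozen leaf: cap reached, play out from here
            # UCB1 selection among stored children (assumed policy).
            log_n = math.log(node.visits)
            node = max(node.children.values(),
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(log_n / ch.visits))
        # Result from the viewpoint of the player who moved into `node`.
        won = not playout(node.stones, rng)
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if won else 0.0
            won = not won           # alternate viewpoint at each level
            node = node.parent
    if not root.children:
        return None, node_count
    best = max(root.children, key=lambda m: root.children[m].visits)
    return best, node_count
```

Calling `uct_search(10, 2000, 50, random.Random(1))` runs 2000 simulations while storing at most 50 nodes. Comparing move choices across different `simulations` and `max_nodes` settings is one way to probe whether extra time spent on a frozen tree helps or hurts; note that in this sketch, moves left untried when the cap is hit are never explored afterwards, which is one of several possible design choices.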