Re: [Computer-go] alarming UCT behavior

Petr Baudis Fri, 06 Nov 2015 13:32:53 -0800

On Fri, Nov 06, 2015 at 10:48:49AM -0800, Dave Dyer wrote:
> 
> Developing a UCT robot for a new game, I have encountered a
> surprising and alarming behavior:  the longer think time the
> robot is given, the worse the results.  That is, the same robot
> given 5 seconds per move defeats one give 30 seconds, or 180 seconds.
> 
> I'm still investigating, but the proximate cause seems to be
> my limit on the size of the UCT tree.   As a memory conservation
> measure, I have a hard limit on the size of the stored tree. After
> the limit is reached, the robot continues running simulations, refining
> the outcomes based on the existing tree and random playouts below
> the leaf nodes.
> 
> My intuition would be that the search would be less effective in this
> mode, but producing worse results (as measured by self-play) is 
> strongly counter intuitive.
> 
> Does it apply to Go?  Maybe not, but it's at least an indicator
> that arbitrary decisions that "ought to" be ok can be very bad in
> practice.


I have seen this exact behavior when first experimenting with long
thinking times in Pachi. When you stop growing the tree, the algorithm
degenerates to "delayed" single-level Monte Carlo along the principal
variations, with all the MC-without-tree weaknesses.

To a degree, you should be able to do away with this by pruning the
tree; but I guess if you do it too many times, you'll start seeing
similar effect (but I don't know how quickly it'd onset).

Without pruning implemented, it seems always better to just stop
thinking early than continuing the search when memory fills up.

(Another tuning mechanism is the node expansion threshold - how many
leaf visits you require before expanding the leaf.  In Go, I believe
this varies pretty widely, from 1 to ~16, and bumping this may be
a perfectly worthwhile tradeoff to make.  I think e.g. Many Faces and
stv have this pretty high, though my memory may fail me.)


P.S.: A corollary to this is that a lot of the important stuff that's
going on and influences strength happens near the leaves of the tree and
nearby in nodes with just a couple of simulations - not near the root of
the tree, which is the visible stuff that the programmer typically sees.
This makes life harder, of course; I suspect heuristic tuning would be
a lot easier with proper tools to analyze "aggregate" behavior near the
leaves.  (But this is just my personal hypothesis.)

-- 
                                Petr Baudis
        If you have good ideas, good data and fast computers,
        you can do almost anything. -- Geoffrey Hinton
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] alarming UCT behavior

Reply via email to