Re: [Computer-go] Monte Carlo (upper confidence bounds applied to trees)

Petr Baudis Tue, 26 Oct 2010 05:03:00 -0700

  Hi!

On Tue, Oct 26, 2010 at 02:43:26PM +0400, Петр Смолов wrote:
> I would much appreciate if someone could explain me what exactly "upper 
> confidence bounds applied to trees" is.


  Very roughly speaking, given fixed confidence (probability it falls
within the interval), for each node we compute the upper bound of its
winning probability given M tests in N trials. Perhaps someone can
explain this more accurately and still in simple words.

> So the program will explore b2 and c2 (starting from b1), but maybe a2 is the 
> only refuting move. Maybe a1 is better, but the program will continue to 
> explore b1, because utc of a1 is 0 and utc of b1 is more than 0.

  It will not continue indefinitely, since as number of trials (in root,
then in b1) goes up, the number of tests (for a1 and a2) stays fixed and
therefore the upper confidence bound of a1 and a2 is increasing -
basically, with ratio M/N going down, we accept that given move might in
the best case have higher and higher winning probability.

  On the other hand, M/N for b1 and b2,c2 will stay very high (near to
1) as we keep exploring these. And upper bound of b1 will thus stay very
near the actual winning rate measured.

  Therefore, after certain number of trials, the upper bound of a1 will
inflate so much that it will overcome the upper bound of b1 and a1 will
be picked again and simulated once. The result then determines whether
the winrate moves up and down, and therefore if less or more further
simulations will be required for upper bound of a1 to overcome b1 again.

  The number of trials required to re-explore moves tossed away can be
controlled by the c-term of UCB1 formula. It basically controls the
ratio of exploration and exploitation in our bandit situations. Usually,
the more accurate your simulations become, the more you want to reduce
exploration and increase exploitation.

-- 
                                Petr "Pasky" Baudis
The true meaning of life is to plant a tree under whose shade
you will never sit.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Re: [Computer-go] Monte Carlo (upper confidence bounds applied to trees)

Reply via email to