Re: [computer-go] Optimal explore rates for plain UCT

Don Dailey Mon, 10 Mar 2008 15:50:41 -0700


Petr Baudis wrote:
>   Hi,
>
> On Sat, Mar 08, 2008 at 10:18:34AM +0100, Petr Baudis wrote:
>   
>>   (By the way, pachi1-*-light are UCT bots with completely light
>> playouts with various UCB1 c values, if anyone wants to use that as
>> reference. Surprisingly, it seems that my heavy playouts do not make big
>> difference so far, though the rating is still very unstable.)
>>     
>
>   after two days of play, it seems the ratings are fairly settled now.
> For clarity, here is the UCB1 formula I use:
>
>       UCB1 = X_i + sqrt(log(N) * c / n)
>
> Specifically, the c is within the sqrt(); some of the papers put it in
> front of the sqrt.
>
>   Also, I expand UCT leaves at the second hit. This keeps memory usage
> conservative, but it is important for strength - I saw a huge strength
> increase when I lowered this to 2 from the original value of 5.
>
>   With 110k playouts per move and no domain knowledge in the playouts,
> the ratings are now:
>
>       c=0.2  (pachi1-p0.2-light)      ELO 1627 (285 games)
>       c=1.0  (pachi1-p1.0-light)      ELO 1590 (120 games)
>       c=0.05 (pachi1-p0.05-light)     ELO 1531 (286 games)
>       c=2.0  (pachi1-p2.0-light)      ELO 1511 (118 games)
>
>   The two main messages of this post are: If you are developing your own
> UCT bot, with this number of playouts you should be aiming for at least
> 1600 ELO on CGOS. And choosing the right c can easily make a 100 ELO
> difference! In particular, the "default" UCB1 c=2.0 appears to be a very
> unsuitable choice.
>   
I think you may still have a bug.  You should get well over 1700 with
110,000 playouts, even if they are light playouts.
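
For reference, the UCB1 formula quoted above, with c inside the square
root, could be sketched roughly as follows. This is only an illustration
and not code from pachi or any other bot; the function names, the
(wins, visits) tuple layout, and the choice to try unvisited children
first are all assumptions made for the example.

```python
import math

def ucb1(mean_reward, parent_visits, child_visits, c=0.2):
    """UCB1 with c inside the square root: X_i + sqrt(c * ln(N) / n)."""
    if child_visits == 0:
        # Assumption for this sketch: unvisited children are tried first.
        return math.inf
    return mean_reward + math.sqrt(c * math.log(parent_visits) / child_visits)

def c_in_front(c_inside):
    """Equivalent multiplier c' for the form X_i + c' * sqrt(ln(N) / n)."""
    return math.sqrt(c_inside)

def select_child(children, parent_visits, c=0.2):
    """children is a list of (wins, visits); return the index to descend into."""
    def value(stats):
        wins, visits = stats
        mean = wins / visits if visits else 0.0
        return ucb1(mean, parent_visits, visits, c)
    return max(range(len(children)), key=lambda i: value(children[i]))
```

When comparing c values between programs, note that c=2.0 inside the
square root is the textbook sqrt(2 * ln(N) / n) term, which is a
multiplier of about 1.41 when written in front of the sqrt; c=0.2
inside corresponds to about 0.45 in front.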



>   I'm pretty sure my code is fairly well debugged now, but of course
> there may still be bugs lurking; when I put my bots on CGOS for the
> first time they were awfully bug-ridden (and about 800 ELO worse ;-). What
> ELO rating did pure UCT bots get historically, and with how many playouts?
>   
FatMan does 20k playouts and has heavy play-outs, very similar to the
first paper where MoGo described its play-out strategy - basically
playing a random out-of-atari move or a local move that fits one of
their patterns. It is rated 1800 on CGOS. The tree expansion policy
for nodes is based on the parent count, not on the child's own count. So
once the parent has 100 play-outs, children are expanded regardless of
the number of games they have seen. (Near the end of the game in 9x9
Go some moves could be tried a few times before being expanded.)
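
The two expansion policies mentioned in this thread (expand a leaf once
it has been hit a given number of times itself, versus expand children
once the parent has enough play-outs) might be sketched like this. It is
one possible reading of the descriptions, not FatMan's or pachi's actual
code; the Node class and both function names are hypothetical.

```python
class Node:
    """Hypothetical tree node used only for this sketch."""
    def __init__(self):
        self.visits = 0       # play-outs that passed through this node
        self.children = None  # None means the node is not expanded yet

def expand_by_own_visits(leaf, threshold=2):
    """Child-count policy: expand a leaf on its `threshold`-th hit."""
    return leaf.children is None and leaf.visits >= threshold

def expand_by_parent_visits(parent, threshold=100):
    """Parent-count policy: once the parent has `threshold` play-outs,
    its children are expanded no matter how often each one was tried."""
    return parent.visits >= threshold
```

With threshold=2 the first function matches the "second hit" rule quoted
earlier; with threshold=100 the second matches the parent-count rule
described for FatMan.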

None of the other things that many of the modern programs are doing
are done in FatMan. I know that an older version of Lazarus was
playing over 1700 with light play-outs and a formula I made up (which
doesn't work as well as the UCB stuff). It was doing about 100k
playouts at the most.

I'll bet you have some crazy bug.  

- Don




>
> P.S.: Looks like the heavy playouts I described in my other mail bring
> no improvement to the bot strength at all, and mostly make it a few ELO
> weaker. :-( I'm rethinking my approaches now.
>
>   
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
