Re: [computer-go] Optimal explore rates for plain UCT

Don Dailey Thu, 13 Mar 2008 06:20:28 -0700

> If it is agreed,  I will start a 25k test.    My prediction is that this
> will finish around 1600 ELO on CGOS.   
>   
Looks like it's currently around 1485,  so I am 115 ELO off from my
prediction at the moment.


2 more doublings would probably get you to 1700 -  perhaps I will test
this later,  I think I can handle 100,000 nodes without a problem timing
out.      If it's too much I think I can solve it by playing quickly
when the score is extreme - and it will have little impact on the strength.

Note that in the scalability study each doubling for FatMan was worth
over 100 ELO on 9x9  and FatMan did not improve as quickly as Mogo. 

If you look at the table,  drdGeneric 10k is rated 1228 and 25k is
rating 1485 which is 257 ELO for doing 2.5 x more play-outs.      If
this holds, I would expect 100,000 play-outs to give well over 1700
ELO.     Of course there could be a lot of error in each of these
ratings so it's difficult to predict just from these 2 points.

- Don


> I can also run a special all-time rating list if you participate which I
> believe returns more accurate ratings - you need at least 200 games to
> get on the list and I believe 500 is far better to get an accurate
> rating.    I'm going to try to get at least 500.
>
> - Don
>
>
> P.S.
>
> I discovered that you can set the level substantially higher (and get
> away with it) if you have code to cut down the number of playouts a lot
> in endings  which are nearly won or lost.     Even better is to
> progressively cut down the play-outs as a function of distance from the
> opening - but then we would all have to standardize on this too and it
> would probably be difficult to agree on how we should do that.      But
> the issue is what rating can you expect with a relatively generic
> UCT-like MC implementation given some number of play-outs.
>
>
>
>
>   
>>>   
>>>     
>>>       
>> I'm working on a new program, starting from scratch.   When it gets far
>> enough along I will compare my numbers (and ELO) to yours.
>>
>>
>>   
>>     
>>>   
>>>     
>>>       
>>>>>   I'm pretty sure my code is fairly well debugged now, but of course
>>>>> there may be still bugs lurking; when I have put my bots on CGOS for the
>>>>> first time it was awfully bug-ridden (and about 800 ELO worse ;-). What
>>>>> ELO rating did pure UCT bots get historically with how many playouts?
>>>>>   
>>>>>       
>>>>>         
>>>>>           
>>>> FatMan does 20k playouts and has heavy play-outs, very similar to the
>>>> first paper where mogo described it's play-out strategy - basically
>>>> playing a random out of atari move or a local move that fits one of
>>>> their patterns.   It is rated 1800 on CGOS.    The tree expansion policy
>>>> for nodes is based on the parent count,   not the child itself.    So
>>>> once the parent has 100 play-outs children are expanded regardless of
>>>> the number of games they have seen.    (Near the end of the game in 9x9
>>>> go some moves could be tried a few times before being expanded.) 
>>>>     
>>>>       
>>>>         
>>> Oh, interesting! I must have misread libEGO code which seems to use
>>> similar thresholds.
>>>
>>> What is the justification of using the parent playout count instead of
>>> the node playout count itself?
>>>
>>>   
>>>     
>>>       
>> I don't know if it makes much difference how this is done, and I don't
>> know how everybody else is doing it.      I allocate all the children at
>> once and do not have to store pointers to each child,  just one pointer
>> to the first child and a counter so that I know how many children there
>> are.   On average I'm probably expanding every other time in the middle
>> of the game. 
>>
>> I preallocate all the memory for the nodes when the program starts
>> instead of using malloc and free, and I assume most others are doing
>> something similar.    
>>
>> Here is my basic data structure:
>>
>> typedef struct ntag
>> {
>>   int            wins;
>>   int            games;
>>   int            child_count;  // zero until parent wins > 100
>>   struct  ntag   *children;    // a pointer to a place in the big "node
>> pool"
>>   mv_t           mv;           // move that got us here.
>>
>> } node_t;
>>
>> When the child nodes are allocated, they are done all at once with
>> this code - where cc is the number of fully legal child nodes:
>>
>>    // reserve space for cc entries
>>     n->child = &(pool[pool_count]);
>>     pool_count += cc;
>>
>> overflow checking is done outside of the search routine.
>>
>> There are almost certainly better schemes,   I just used what occurred
>> to me to be the easiest and quickest to implement without working too
>> hard at it.
>>
>> Some programs hash each position and the tree is more abstract, no
>> pointers  just positions leading to other positions by zobrist hash keys
>> in a hash table.  
>>
>> My scheme probably wastes a lot of space on nodes that are left
>> unvisited at the leaves of the tree.   But I don't waste much on storing
>> pointers since I keep them in an array. 
>>
>> What is the state of the art on this?   How is everyone else doing it?
>>
>>
>> - Don
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>   
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> computer-go mailing list
>> computer-go@computer-go.org
>> http://www.computer-go.org/mailman/listinfo/computer-go/
>>     
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
>   
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Optimal explore rates for plain UCT

Reply via email to