Magnus Persson: <[EMAIL PROTECTED]>:
>Quoting Hideki Kato <[EMAIL PROTECTED]>:
>
>> Yes, UCT does.  From my recent experiments with a delay
>> line (a fixed size FIFO queue) between a UCTsearcher and an MC
>> simulator with RAVE against GNU Go 3.7.11 level 0 on 9x9 (single
>> thread):
>>
>> delay        #po     wins    games   winning rate    ELO     1 sigma of wr
>> 0    1,000   721     2,000   36.05%          -99.6   1.07%
>> 1    1,000   721     2,000   36.05%          -99.6   1.07%
>> 2    1,000   690     2,000   34.50%          -111.4  1.06%
>> 3    1,000   663     2,000   33.15%          -121.8  1.05%
>> 5    1,000   642     2,000   32.10%          -130.1  1.04%
>> 10   1,000   522     2,000   26.10%          -180.8  0.98%
>> 20   1,000   412     2,000   20.60%          -234.4  0.90%
>> 50   1,000   82      2,000   4.10%           -547.6  0.44%
>
>If I understand this correctly this simulation for delay 50 computes  
>50 playouts and then updates the tree.

Not exactly, if I understand correctly.  The UCT searcher and the MC 
simulator is connected by a delay line of fixed size.  So, except 
initial time, the tree is updated on every playout.  The playout is 
sperated in two parts here, the search (descend) of the tree and 
the simulation, and there is a "delay" between them.  That is, at 
time T, the tree is updated by the result of the simulation of the 
position at (T - 50) if the delay is 50.

>In Valkyria I do the following. Every simulation from the root with  
>their own thread updates all nodes as visited down the tree before  
>entering the heavy playout. This means that all moves made in the tree  
>are temporarily updated as losses. When a playout has finished, half  
>of the moves were winners and updated accordingly.
>
>The idea behind this is that this hopefully avoids searching the same  
>path over and over again. Have tried anything like this?

I use "modifying fpu" method (fpu was introduced in the first report 
of MoGo).  It's very simple.  Before simulating an optimal arc at a 
leaf node, the value (fpu) of the arc is decreased a little (value *= 
0.99).  This avoids simulating the same node if all arcs have 
the same value or no value (ie. at beginning of UCB1 algorithm) but 
does not avoid repeated simulations of significantly better arcs.

>Also your results shows clearly that there is inefficency. But do you  
>also have results where for example delay 50 also computes 50x1000  
>simulations so that we can see if what it means in practise?

As my implementation expands one node on each simulation like MoGo, 
this happens in practice.  NB: this experiment is for a network 
environment where MC simulation servers return results to a UCT 
searcher client with very long delay.

Hideki
--
[EMAIL PROTECTED] (Kato)
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to