Magnus Persson: <[EMAIL PROTECTED]>: >Quoting Hideki Kato <[EMAIL PROTECTED]>: > >> Yes, UCT does. From my recent experiments with a delay >> line (a fixed size FIFO queue) between a UCTsearcher and an MC >> simulator with RAVE against GNU Go 3.7.11 level 0 on 9x9 (single >> thread): >> >> delay #po wins games winning rate ELO 1 sigma of wr >> 0 1,000 721 2,000 36.05% -99.6 1.07% >> 1 1,000 721 2,000 36.05% -99.6 1.07% >> 2 1,000 690 2,000 34.50% -111.4 1.06% >> 3 1,000 663 2,000 33.15% -121.8 1.05% >> 5 1,000 642 2,000 32.10% -130.1 1.04% >> 10 1,000 522 2,000 26.10% -180.8 0.98% >> 20 1,000 412 2,000 20.60% -234.4 0.90% >> 50 1,000 82 2,000 4.10% -547.6 0.44% > >If I understand this correctly this simulation for delay 50 computes >50 playouts and then updates the tree.
Not exactly, if I understand correctly. The UCT searcher and the MC simulator is connected by a delay line of fixed size. So, except initial time, the tree is updated on every playout. The playout is sperated in two parts here, the search (descend) of the tree and the simulation, and there is a "delay" between them. That is, at time T, the tree is updated by the result of the simulation of the position at (T - 50) if the delay is 50. >In Valkyria I do the following. Every simulation from the root with >their own thread updates all nodes as visited down the tree before >entering the heavy playout. This means that all moves made in the tree >are temporarily updated as losses. When a playout has finished, half >of the moves were winners and updated accordingly. > >The idea behind this is that this hopefully avoids searching the same >path over and over again. Have tried anything like this? I use "modifying fpu" method (fpu was introduced in the first report of MoGo). It's very simple. Before simulating an optimal arc at a leaf node, the value (fpu) of the arc is decreased a little (value *= 0.99). This avoids simulating the same node if all arcs have the same value or no value (ie. at beginning of UCB1 algorithm) but does not avoid repeated simulations of significantly better arcs. >Also your results shows clearly that there is inefficency. But do you >also have results where for example delay 50 also computes 50x1000 >simulations so that we can see if what it means in practise? As my implementation expands one node on each simulation like MoGo, this happens in practice. NB: this experiment is for a network environment where MC simulation servers return results to a UCT searcher client with very long delay. Hideki -- [EMAIL PROTECTED] (Kato) _______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
