On Wed, Apr 1, 2009 at 9:03 PM, Matthew Woodcraft <matt...@woodcraft.me.uk> wrote:
> Erik van der Werf wrote:
>>
>> Jonas Kahn wrote:
>>> No there is no danger. That's the whole point of weighting with N_{s,a}.
>>>
>>> N_{s,a} = number of times the node s has been visited, starting with parent
>>> a.
>>>
>>> You can write Value of a node a = (\sum_{s \in sons} N_{s,a} V_s) / (\sum
>>> N_{s,a})
>>>
>>> where V_s is ideally the «true» value of node s.
>>> In UCT2, they use V_s = Q_{s,a} the win average of simulations going
>>> through a, and then through s.
>>> In UCT3, they use V_s = Q_s the win average of all simulations through
>>> s.
>
>> There is a danger. The problem is that the selection policy also
>> implements the soft-max like behavior that ensures convergence to the
>> minimax result. If you back up to all possible parents, including
>> those for which the child would have been an inferior choice, you may
>> get into trouble.
>
> That's what I was worried about.
>
> But I think it's ok the way Jonas describes above: you don't add
> anything to the false-parent node's simulation count, and you don't
> change the weight of the false-child in its value; you just change the
> evaluation of the false-child.
>
> (This means that the effect of backing up to alternate parents will be
> smaller than the effect of backing up to the 'true' parent, which is
> presumably part of the reason why this variant is less attractive.)
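
To make the quoted formula concrete, here is a minimal Python sketch of the weighted average as Jonas writes it. The names Edge, node_value, n_sa, wins_sa, n_s and wins_s are made up for illustration and do not come from any actual UCT2/UCT3 implementation; the 0.5 fallback for an unvisited node is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one (parent a -> child s) edge; hypothetical layout."""
    n_sa: int       # N_{s,a}: simulations that went through a and then s
    wins_sa: float  # wins among those simulations
    n_s: int        # simulations through s from any parent (transpositions)
    wins_s: float   # wins among all simulations through s


def node_value(edges, variant):
    """Value of a = sum(N_{s,a} * V_s) / sum(N_{s,a}) over the sons s of a."""
    total = sum(e.n_sa for e in edges)
    if total == 0:
        return 0.5  # assumption: neutral value for an unvisited node

    def v(e):
        if variant == "UCT2":
            # V_s = Q_{s,a}: only simulations that came through a
            return e.wins_sa / e.n_sa if e.n_sa else 0.0
        # "UCT3": V_s = Q_s, all simulations through s
        return e.wins_s / e.n_s if e.n_s else 0.0

    return sum(e.n_sa * v(e) for e in edges) / total


# The second son is a transposition: 5 visits via a, 25 visits overall.
edges = [Edge(10, 6, 10, 6), Edge(5, 2, 25, 15)]
uct2 = node_value(edges, "UCT2")
uct3 = node_value(edges, "UCT3")
```

In both variants the weights N_{s,a} are identical (15 in total here); only V_s differs. Simulations that reached s through another parent therefore change the value of a without adding anything to its visit count.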

Ok, but I would not call that a back up; nothing goes up to the
alternative parents. Unless I missed something, with this you only make
adjustments to the statistics representing transposed occurrences of the
same position. I don't see that this is how we should interpret UCT3.

Erik
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/