Erik van der Werf wrote:
> Jonas Kahn wrote:
>> No there is no danger. That's the whole point of weighting with N_{s,a}.
>>
>> N_{s,a} = number of times the node s has been visited, starting with parent
>> a.
>>
>> You can write Value of a node a = (\sum_{s \in sons} N_{s,a} V_s) / (\sum
>> N_{s,a})
>>
>> where V_s is ideally the «true» value of node s.
>> In UCT2, they use V_s = Q_{s,a} the win average of simulations going
>> through a, and then through s.
>> In UCT3, they use V_s = Q_s the win average of all simulations through
>> s.
>
> There is a danger. The problem is that the selection policy also
> implements the soft-max-like behavior that ensures convergence to the
> minimax result. If you back up to all possible parents, including
> those for which the child would have been an inferior choice, you may
> get into trouble.

That's what I was worried about. But I think it's OK the way Jonas
describes above: you don't add anything to the false-parent node's
simulation count, and you don't change the weight of the false-child in
its value; you just change the evaluation of the false-child.

(This means that the effect of backing up to alternate parents will be
smaller than the effect of backing up to the 'true' parent, which is
presumably part of the reason why this variant is less attractive.)

-M-
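
For concreteness, here is a minimal Python sketch of the weighted average
Jonas describes. It is only an illustration under stated assumptions: the
Child record, its field names, and the functions q_sa, q_s and node_value
are hypothetical and not taken from any existing program, and all win
averages are assumed to be expressed from the same player's point of view.
The point it tries to show is that the weights N_{s,a} are identical in
both variants; only the evaluation V_s of each son changes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Child:
    """Statistics of one son s as seen from parent a (hypothetical layout)."""
    n_sa: int       # N_{s,a}: simulations that reached s through parent a
    wins_sa: float  # wins among those N_{s,a} simulations
    n_s: int        # simulations that reached s through any parent
    wins_s: float   # wins among all simulations through s


def q_sa(c: Child) -> float:
    """UCT2-style evaluation: win average of simulations through a, then s."""
    return c.wins_sa / c.n_sa


def q_s(c: Child) -> float:
    """UCT3-style evaluation: win average of all simulations through s."""
    return c.wins_s / c.n_s


def node_value(children: Sequence[Child], v: Callable[[Child], float]) -> float:
    """Value of a = (sum_s N_{s,a} * V_s) / (sum_s N_{s,a})."""
    total = sum(c.n_sa for c in children)
    if total == 0:
        raise ValueError("parent has no simulations through any son")
    return sum(c.n_sa * v(c) for c in children) / total


# Made-up numbers: the first son is a transposition that was also
# reached 20 times through some other parent.
children = [
    Child(n_sa=30, wins_sa=18, n_s=50, wins_s=35),
    Child(n_sa=10, wins_sa=4, n_s=10, wins_s=4),
]

uct2_value = node_value(children, q_sa)  # about 0.55
uct3_value = node_value(children, q_s)   # about 0.625
```

Simulations that reached the first son through the other parent change
its evaluation (0.6 becomes 0.7), but its weight in the value of a stays
at 30 and the simulation count of a stays at 40.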
