On Thu, 2 Apr 2009, Erik van der Werf wrote:

On Wed, Apr 1, 2009 at 9:03 PM, Matthew Woodcraft
<[email protected]> wrote:
Erik van der Werf wrote:
Jonas Kahn wrote:
No there is no danger. That's the whole point of weighting with N_{s,a}.

N_{s,a} = number of times the node s has been visited, starting with parent
a.

You can write Value of a node a = (\sum_{s \in sons} N_{s,a} V_s) / (\sum
N_{s,a})

where V_s is ideally the «true» value of node s.
In UCT2, they use V_s = Q_{s,a} the win average of simulations going
through a, and then through s.
In UCT3, they use V_s = Q_s the win average of all simulations through
s.

There is a danger. The problem is that the selection policy also
implements the soft-max like behavior that ensures convergence to the
minimax result. If the you backup to all possible parents, including
those for which the child would have been an inferior choice, you may
get into trouble.

That's what I was worried about.

But I think it's ok the way Jonas describes above: you don't add
anything to the false-parent node's simulation count, and you don't
change the weight of the false-child in its value; you just change the
evaluation of the false-child.

(This means that the effect of backing up to alternate parents will be
smaller than the effect of backing up to the 'true' parent, which is
presumably part of the reason why this variant is less attractive.)


Ok, but I would not call that a back up; nothing goes up to the
alternative parents.

You certainly move less than for the parent in that particular
simulation, but you cannot say you move nothing.

The value of any node in the tree is a linear combination of the values
of its sons (or directly the winrate if it is a leave). We do not change
the coefficients of the combination, but we change recursively the
values of all the nodes from the leaves up to the root. But with same
mean (lower variance), and again, keeping the relative weights of the sons
(that is the linear combination coefficients).

This value is notably useful for the UCT path in the next simulation.

Unless I missed something, with this you only
make adjustments to the statistics representing transposed occurrences
of the same position.

Not sure I understand this sentence.

I don't see that this is how we should
interpret UCT3.

You cannot do much more than this (except complicated schemes with
bypassing simulations when some have already been made by transposition,
but you need to keep many other counts then). And it's much more complex
(and slow) than UCT2, since you update values up the whole tree and not
only at the leaves.

By the way, that's really the formula that is written.

Jonas
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to