On Tue, 31 Mar 2009, Matthew Woodcraft wrote:

> Jonas Kahn wrote:
>> You might be interested in this article, for a very complete and tested
>> answer. Plus the idea of grouping, but a good part of the effect seems
>> to me to be giving a heuristic pre-value to moves, which might be done more
>> efficiently otherwise:
>>
>> eprints.pascal-network.org/archive/00004571/01/8057.pdf
>
> Thank you (and to the others who replied).
>
> The idea of backing a simulation's results up to all parents ('UCT3' in
> that paper) seems very dangerous to me! It's a shame they didn't have
> any Go results to show for that one.

No, there is no danger. That's the whole point of weighting with N_{s,a}.

N_{s,a} = number of times node s has been visited coming from parent a.

You can write:

Value of a node a = (\sum_{s \in sons} N_{s,a} V_s) / (\sum_{s \in sons} N_{s,a})

where V_s is ideally the «true» value of node s.
In UCT2, they use V_s = Q_{s,a}, the win average of simulations going
through a and then through s.
In UCT3, they use V_s = Q_s, the win average of all simulations going
through s.
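
To make the two choices concrete, here is a minimal Python sketch. The toy
numbers, the `Edge` class and the function names are made up for
illustration and are not taken from the paper; it only assumes a node s
that is reachable from two parents a and b.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics of the simulations that went parent -> child (hypothetical layout)."""
    n: int        # N_{s,a}: visits to s coming from this parent
    wins: float   # wins among those n simulations


def backed_up_value(edges, child_value):
    """Value of a = sum_s N_{s,a} * V_s / sum_s N_{s,a}."""
    total = sum(e.n for e in edges.values())
    return sum(e.n * child_value(s, e) for s, e in edges.items()) / total


# Toy position: child "s" is shared by parents a and b, child "t" belongs to a only.
edges_a = {"s": Edge(n=10, wins=7), "t": Edge(n=30, wins=12)}
edges_b = {"s": Edge(n=90, wins=45)}

# Q_s: win average of all simulations through each child, whatever the parent.
pooled = {}
for edges in (edges_a, edges_b):
    for s, e in edges.items():
        n, w = pooled.get(s, (0, 0.0))
        pooled[s] = (n + e.n, w + e.wins)


def uct2_value(s, e):
    # V_s = Q_{s,a}: only simulations that came through this parent.
    return e.wins / e.n


def uct3_value(s, e):
    # V_s = Q_s: all simulations through s, but the weight stays N_{s,a}.
    n, w = pooled[s]
    return w / n


value_uct2 = backed_up_value(edges_a, uct2_value)
value_uct3 = backed_up_value(edges_a, uct3_value)
print(value_uct2, value_uct3)
```

With these numbers the value of a comes out at about 0.475 with the UCT2
choice and about 0.43 with the UCT3 choice. In both cases s still counts
for only 10 of the 40 visits below a, which is why the 90 simulations that
reached s through b cannot swamp a's estimate.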

Assuming Markovianity (1), Q_s is a random variable with the same mean as Q_{s,a}, but lower variance, since it averages over more simulations. That's all.
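
A quick way to check this numerically is the sketch below. It assumes that
every simulation through s is an independent win with the same probability
p whichever parent it came through (my reading of the Markov assumption);
p, the visit counts and the function name are arbitrary choices for
illustration.

```python
import random
from statistics import mean, pvariance


def simulate(p=0.52, n_from_a=10, n_from_others=90, trials=2000, seed=0):
    """Sample Q_{s,a} and Q_s many times and return their means and variances."""
    rng = random.Random(seed)
    q_sa, q_s = [], []
    for _ in range(trials):
        # Wins among the simulations that reached s through a.
        wins_a = sum(rng.random() < p for _ in range(n_from_a))
        # Wins among the simulations that reached s through other parents.
        wins_others = sum(rng.random() < p for _ in range(n_from_others))
        q_sa.append(wins_a / n_from_a)
        q_s.append((wins_a + wins_others) / (n_from_a + n_from_others))
    return mean(q_sa), pvariance(q_sa), mean(q_s), pvariance(q_s)


mean_sa, var_sa, mean_s, var_s = simulate()
print(mean_sa, var_sa, mean_s, var_s)
```

With the default arguments both means should land near p, and the variance
of Q_s should be roughly ten times smaller than that of Q_{s,a}, since it
averages 100 simulations instead of 10.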

Jonas
(1) This might be broken if you give a heuristic value to your move in
the tree based on how near it is to previous moves, but that's not
really important.
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
