On Tue, 31 Mar 2009, Matthew Woodcraft wrote:
> Jonas Kahn wrote:
> > You might be interested in this article, which gives a very complete
> > and tested answer. It also has the idea of grouping, but a good part of
> > the effect seems to me to come from giving moves a heuristic pre-value,
> > which might be done more efficiently in other ways:
> > eprints.pascal-network.org/archive/00004571/01/8057.pdf
>
> Thank you (and to the others who replied).
>
> The idea of backing a simulation's results up to all parents ('UCT3' in
> that paper) seems very dangerous to me! It's a shame they didn't have
> any Go results to show for that one.
No, there is no danger. That's the whole point of weighting with N_{s,a}.
N_{s,a} = number of times the node s has been visited when reached from
parent a.
You can write
Value of node a = (\sum_{s \in sons} N_{s,a} V_s) / (\sum_{s \in sons} N_{s,a})
where V_s is ideally the «true» value of node s.
In UCT2, they use V_s = Q_{s,a}, the win average of the simulations that
go through a and then through s.
In UCT3, they use V_s = Q_s, the win average of all simulations through
s, whichever parent they came from.
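
To make the two backups concrete, here is a minimal sketch of the weighted average above. The names ChildStats and node_value, and the default value 0.5 for an unvisited node, are my own assumptions for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChildStats:
    """Hypothetical per-son statistics, as seen from one parent a."""
    n_sa: int       # N_{s,a}: visits to s that arrived through parent a
    wins_sa: float  # wins among those N_{s,a} simulations
    n_s: int        # visits to s through any parent (n_s >= n_sa)
    wins_s: float   # wins among all simulations through s


def node_value(sons: List[ChildStats], use_all_parents: bool) -> float:
    """Weighted average (sum N_{s,a} V_s) / (sum N_{s,a}).

    use_all_parents=False takes V_s = Q_{s,a} (the UCT2 choice),
    use_all_parents=True takes V_s = Q_s (the UCT3 choice).
    """
    total = sum(c.n_sa for c in sons)
    if total == 0:
        return 0.5  # assumption: neutral value for an unvisited node
    weighted = 0.0
    for c in sons:
        if c.n_sa == 0:
            continue  # a son never reached from a carries no weight
        v_s = c.wins_s / c.n_s if use_all_parents else c.wins_sa / c.n_sa
        weighted += c.n_sa * v_s
    return weighted / total
```

In both cases the weights are N_{s,a}, so a son that a rarely plays barely moves the value of a, however often other parents visit it. For example, with sons ChildStats(10, 6, 40, 20) and ChildStats(30, 12, 30, 12), the UCT2 choice gives about 0.45 and the UCT3 choice about 0.425.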
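Assuming Markovianity (1), the outcome of a simulation from s does not
depend on how s was reached, so Q_s is a random variable with the same
mean as Q_{s,a}, but lower variance, since it averages over more
simulations.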
That's all.
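
The variance claim can be checked with a small simulation. This is only a sketch under the Markov assumption: every simulation through s is modelled as an independent win with the same probability p, whichever parent it came from. The function name simulate and all parameter values are hypothetical.

```python
import random
import statistics


def simulate(p=0.55, n_sa=20, n_other=80, trials=5000, seed=0):
    """Compare Q_{s,a} and Q_s over many repeated experiments.

    Assumption: each simulation through s wins with probability p,
    independently of the parent it was reached from.
    Returns (mean Q_{s,a}, var Q_{s,a}, mean Q_s, var Q_s).
    """
    rng = random.Random(seed)
    q_sa, q_s = [], []
    for _ in range(trials):
        wins_via_a = sum(rng.random() < p for _ in range(n_sa))
        wins_via_other = sum(rng.random() < p for _ in range(n_other))
        # Q_{s,a}: only the simulations that came through a
        q_sa.append(wins_via_a / n_sa)
        # Q_s: all simulations through s, from any parent
        q_s.append((wins_via_a + wins_via_other) / (n_sa + n_other))
    return (statistics.mean(q_sa), statistics.pvariance(q_sa),
            statistics.mean(q_s), statistics.pvariance(q_s))


if __name__ == "__main__":
    m_sa, v_sa, m_s, v_s = simulate()
    print(f"Q_sa: mean={m_sa:.3f} var={v_sa:.5f}")
    print(f"Q_s : mean={m_s:.3f} var={v_s:.5f}")
```

Both means should come out close to p, while the variance of Q_s is smaller roughly by the ratio of sample sizes, which is p(1-p)/N for a win average over N simulations.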
Jonas
(1) This might be broken if you give a heuristic value to your move in
the tree based on how near it is to previous moves, but that's not
really important.