My original example was unrealistic and on the extreme side to make a point.
However, if there are nodes with, say, 7/10, 12/20, and 50/100, how should they be
ranked? In some sense, the first one seems promising since we've searched
only a few nodes, yet we are mainly seeing wins (granted,
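
For what it's worth, here is a rough sketch of how the usual UCB1 formula would score those three nodes during search. The parent visit count of 130 (simply the sum of the three children) and the exploration constant sqrt(2) are my assumptions, and the function name is made up for illustration.

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: observed win rate plus an exploration bonus.

    The bonus shrinks as a node is visited more often, so lightly
    searched nodes get the benefit of the doubt.
    """
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# (wins, visits) for the three nodes from the example.
children = [(7, 10), (12, 20), (50, 100)]
# Assumption: the parent was visited once per child visit.
parent_visits = sum(visits for _, visits in children)

# Highest UCB1 score first.
ranked = sorted(children, key=lambda wv: ucb1(wv[0], wv[1], parent_visits), reverse=True)
for wins, visits in ranked:
    print(f"{wins}/{visits}: {ucb1(wins, visits, parent_visits):.3f}")
```

Under those assumptions the 7/10 node is explored first and the 50/100 node last, which matches the intuition that a lightly searched node with mostly wins deserves more attention.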
Determining the best move is tricky, however. The most natural approach would
be to pick the move with the highest probability of leading to a win. But this
is usually too risky. For example, a move with 7 wins out of 10 trials may have
the highest odds of winning (70 percent), but because
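
To make the risk concrete, here is a small sketch comparing the raw win rate with a pessimistic estimate. I use the lower end of the Wilson score interval (z = 1.96) as one common way of discounting small samples; that choice is my assumption, not something the text above prescribes, and the counts 12/20 and 50/100 are just sample data to sit next to the 7-out-of-10 case.

```python
import math

def win_rate(wins, trials):
    """Plain observed win rate."""
    return wins / trials

def wilson_lower_bound(wins, trials, z=1.96):
    """Lower end of the Wilson score interval for a binomial proportion."""
    p = wins / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)

# (wins, trials) for three candidate moves.
moves = [(7, 10), (12, 20), (50, 100)]

by_rate = max(moves, key=lambda m: win_rate(*m))
by_bound = max(moves, key=lambda m: wilson_lower_bound(*m))

print("best by raw win rate:   ", by_rate)
print("best by pessimistic bound:", by_bound)
```

With these numbers the raw win rate picks 7/10, while the pessimistic bound puts 50/100 narrowly ahead, because ten trials leave a much wider margin of error than a hundred.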
Thomas Lavergne wrote:
So when I select a child I use only the winrate of choosing this specific
child from this specific position, not that of choosing any child from any
goban that has led to the same target position. I think this is cleaner.
Is this what you're referring to? In Kocsis's paper,
When assigning the credit value for a node which is a transposition, are
*all* parents which point to that node credited, or just the particular one
which led to that transposition in the current continuation? In the former
case, that seems like the TT would lead to more efficient
Yes, that's how it's typically done, but I don't see how that leads to
finding *all* parents of a given node. The situation I'm referring to is
when you're propagating the UCT value back up the tree following a
simulation. If you also want to update *all* parents of that node (not just
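
Here is a minimal sketch of the two back-up policies being contrasted. It assumes each table entry keeps an explicit set of parent keys, which a plain hash-keyed transposition table does not give you for free; all the names (Node, Table, backup_path, backup_all_parents) are hypothetical. To keep it short, results are scored from a single fixed player's point of view, so there is no sign flip per ply.

```python
from collections import deque

class Node:
    def __init__(self, key):
        self.key = key
        self.visits = 0
        self.wins = 0.0
        self.parents = set()  # keys of every position known to lead here

class Table:
    """Transposition table keyed by position hash (here just a string)."""
    def __init__(self):
        self.nodes = {}

    def get(self, key):
        if key not in self.nodes:
            self.nodes[key] = Node(key)
        return self.nodes[key]

    def link(self, parent_key, child_key):
        self.get(parent_key)
        self.get(child_key).parents.add(parent_key)

def backup_path(table, path, result):
    """Credit only the nodes on the continuation that was actually played."""
    for key in path:
        node = table.get(key)
        node.visits += 1
        node.wins += result

def backup_all_parents(table, leaf_key, result):
    """Credit the leaf and every ancestor reachable through any parent link."""
    seen = {leaf_key}
    queue = deque([leaf_key])
    while queue:
        node = table.get(queue.popleft())
        node.visits += 1
        node.wins += result
        for parent_key in node.parents:
            if parent_key not in seen:  # visit each ancestor once per simulation
                seen.add(parent_key)
                queue.append(parent_key)

# Diamond: root -> a -> t and root -> b -> t, so t is a transposition.
table = Table()
for parent, child in [("root", "a"), ("root", "b"), ("a", "t"), ("b", "t")]:
    table.link(parent, child)

backup_path(table, ["root", "a", "t"], 1.0)   # b is left untouched
backup_all_parents(table, "t", 1.0)           # b is credited as well
```

The second policy needs the parent sets and a walk over the whole ancestor graph on every simulation, which is the extra bookkeeping the question above is getting at.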