It's good that you posted this. I liked the paper. (And I understand
proof number search a little better after reading it too.) But it describes the
epsilon trick in terms of transposition tables which I don't use. I didn't see
how it would apply to a broader range of applications or, specifically, to my
own program.
The text below explains it nicely. I will definitely try this.
- Dave Hillis
-----Original Message-----
From: [EMAIL PROTECTED]
To: [email protected]
Sent: Sun, 14 Jan 2007 4:50 AM
Subject: [computer-go] Fast UCT (epsilon trick)
Hi
I've looked back into the article and I see that it is not trivial to
understand how does the epsilon trick (ET) works. Therefore I will
describe it here on the example of UCT.
The bottleneck of UCT algorithm is choosing a move. Program has to
iterate over all children
evaluating their UCB value. We want to use of this loop rarely.
One way to do it is to group playouts together. I.e. play 2 or more MC
simulations per each tree traversal. But this is not good enough as it
doesn't distribute
the playouts over the tree and some parts will not get enough of them.
The ET works as follows:
With each node of the tree program keeps additional information:
- child_to_visit
- visits_to_go
When we visit node,
if node.visits_to_go > 0 then
node->visits_to_go <- node.visits_to_go - 1
node = child_to_visit
So we have predetermined strategy to go to particular child visits_to_go times.
But
if node.visis_to_go = 0 then // we have to set the strategy first
node->child_to_visit = UCT_find_child (node)
node->visits_to_go = Epsilon * node->visited_count // round up
As we can see, when Epsilon is small or when a particular node is
visited not too many times, UCT-ET is identical to UCT.
But when node was visited more times (near the root, far from leaves),
then UCT-ET may save some time and go to precalculated child.
That's it.
I hope You like it :)
Ćukasz Lew
PS.
In Df-PN, good Epsilon was 1/4 I believe that here it will be much
smaller. 1/64 maybe.
Or even one may change the Epsilon equation to:
node->visits_to_go = Epsilon * sqrt(node->visited_count)
If someone will do some experiments, I'm eager to hear the results.
PS2
float InvSqrt (float x) {
float xhalf = 0.5f * x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x * (1.5f - xhalf * x * x);
return x;
}
;-)
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
________________________________________________________________________
Check Out the new free AIM(R) Mail -- 2 GB of storage and industry-leading spam
and email virus protection.
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/