Re: [computer-go] Rapid action value estimation

Jason House Mon, 05 Nov 2007 12:14:38 -0800

On Nov 3, 2007 5:25 PM, Benjamin Teuber <[EMAIL PROTECTED]> wrote:

> On 11/3/07, Jason House <[EMAIL PROTECTED]> wrote:
>
> >
> >
> > On Fri, 2007-11-02 at 22:28 +0100, Benjamin Teuber wrote:
> > > I don't think anything is different at different depths in the
> > > tree.
> > > To update RAVE after a simulation, for each child of a node you
> > > visited during that simulation, you update that child's RAVE value if
> > > the move leading to it was played later (any time until the end of
> > > the playout).
> >
> > I start each new simulation at the root of the search tree.  That could
> > make every node in the tree a child (grandchild, etc...) of a node that
> > was visited.  While traversing the entire tree to update values could be
> > done, it seems complex and seems like it may bias results too much.
> >
> > Do you stop at just the children of nodes that are visited and not
> > extend to grandchildren?
>
>
> Sure, I was just referring to direct children. So, for each node n you
> visited during this simulation and each move m later played during that
> simulation by the player moving in position n, you update the node you would
> get to from n by moving at m, provided m is legal in n.
>
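
For reference, the update described in the quoted text can be sketched roughly as follows. This is only an illustration under some assumptions of mine: the Node class, its fields, and update_amaf are hypothetical names, AMAF statistics are stored per move in the parent rather than in the child node, and a move that occurs more than once later in the simulation is counted only at its first occurrence.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    to_move: str      # "B" or "W": the player to move in this position
    legal_moves: set  # moves that are legal in this position
    # Hypothetical layout: move -> [amaf_wins, amaf_sims], kept in the parent
    amaf: dict = field(default_factory=dict)


def update_amaf(path, moves, winner):
    """Update AMAF (RAVE) statistics after one simulation.

    path   -- list of (node, idx): each tree node visited, with the index in
              `moves` of the move that was played from that node
    moves  -- every (player, move) of the simulation, tree part plus playout
    winner -- "B" or "W"
    """
    for node, idx in path:
        seen = set()
        for player, move in moves[idx:]:
            # Only moves made by the player to move at this node count.
            if player != node.to_move:
                continue
            # Assumption: count each move once, and only if legal here.
            if move in seen or move not in node.legal_moves:
                continue
            seen.add(move)
            stats = node.amaf.setdefault(move, [0, 0])
            if winner == node.to_move:
                stats[0] += 1
            stats[1] += 1
```

Only the direct children of visited nodes are touched, so the cost per simulation is bounded by the tree depth times the simulation length, not by the size of the tree.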


I implemented this yesterday.  In doing so, I realized I didn't know the
proper way to initialize new leaves in the UCT tree.  The MoGo papers seem
to describe a progression: from always picking an unexplored leaf (i.e.
using infinity for its upper confidence bound), to "first play urgency"
(using a fixed UCB value for new leaves), to using patterns.

I don't yet have patterns and am curious what is recommended.  If a child
has no real sims, I use a first play urgency of 110%.  If a child has no
AMAF sims either, I pick it for immediate simulation.
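
To make that rule concrete, here is a minimal sketch of the selection step. Several things in it are assumptions and not necessarily what my code or MoGo does: the stats layout, the name select_move, the exploration constant c, reading "110%" as a fixed score of 1.10 on a 0..1 win-rate scale, and using plain UCB1 for children that already have real sims (how AMAF and real values get blended is left out entirely).

```python
import math

FPU = 1.10  # assumed reading of "110%": fixed score for children with no real sims


def select_move(stats, parent_visits, c=1.0):
    """Pick the child move to simulate next.

    stats         -- move -> {"wins": ..., "visits": ..., "amaf_visits": ...}
                     (hypothetical layout)
    parent_visits -- number of real sims through the parent, at least 1
    c             -- hypothetical exploration constant
    """
    best_move, best_score = None, -math.inf
    for move, s in stats.items():
        if s["amaf_visits"] == 0:
            # No AMAF sims at all: simulate this child immediately.
            return move
        if s["visits"] == 0:
            # AMAF sims but no real sims: fixed first play urgency.
            score = FPU
        else:
            # Plain UCB1 on real sims (an assumption, see above).
            mean = s["wins"] / s["visits"]
            score = mean + c * math.sqrt(math.log(parent_visits) / s["visits"])
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```

With a fixed urgency of 1.10, an unvisited child is only tried once every visited sibling's upper confidence bound has dropped below 1.10, which is the point of first play urgency compared to using infinity.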

Have any techniques (without patterns) proven more effective?

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
