Re: [computer-go] Rapid action value estimation

Jason House Sat, 03 Nov 2007 10:20:59 -0800


On Fri, 2007-11-02 at 22:28 +0100, Benjamin Teuber wrote:
> I don't think there's something different at different depths in the
> tree..
> To update RAVE after a simulation, for each child of a node you
> visited during that simulation, you update if the move leading to the
> child was played later (until the end of the playout).


I start each new simulation at the root of the search tree.  That could
make every node in the tree a child (grandchild, etc.) of a node that
was visited.  Traversing the entire tree to update values could be done,
but it seems complex and seems likely to bias the results too much.

Do you stop at just the children of nodes that are visited and not
extend to grandchildren?  
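
For concreteness, here is a minimal Python sketch of the children-only
reading of the update quoted above. It is an illustration under stated
assumptions, not code from any program discussed in this thread: the
Node class, its field names, and update_rave are hypothetical, and it
assumes moves are (colour, point) pairs and that RAVE wins are counted
from the viewpoint of the colour to move at the parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node; all field names are illustrative only.
    to_move: str                                   # colour to play here, "B" or "W"
    children: dict = field(default_factory=dict)   # point -> child Node
    rave_visits: int = 0
    rave_wins: int = 0


def update_rave(path, moves, winner):
    """Update RAVE statistics after one simulation.

    path[i] is the tree node that was visited just before moves[i] was
    played. moves is the whole simulation (tree part, then playout) as
    (colour, point) pairs. Only direct children of visited nodes are
    touched; grandchildren reached through unvisited nodes are not.
    """
    for depth, node in enumerate(path):
        # Points played from this node onward by the colour to move here
        # (the "same colour only" variant mentioned later in the thread).
        later = {pt for (col, pt) in moves[depth:] if col == node.to_move}
        for pt, child in node.children.items():
            if pt in later:
                child.rave_visits += 1
                # Sign handling: a win is credited only when the colour
                # that would play this move won the simulation.
                if winner == node.to_move:
                    child.rave_wins += 1
```

With this reading the cost per simulation is bounded by the number of
children along the visited path, so no full-tree traversal is needed.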


> Then, always when you calculate the UCT value, you combine that with
> the RAVE value with that weighted average formula to give the final
> score.
> Of course, you need to be careful with signs :-)
> 
> Btw, I don't really see a point in calculating and adding the
> confidence bound for RAVE as well, as all moves will have been played
> almost equally often - thus I dropped the term.. 
> Maybe Sylvain or someone else can comment on this..

I'll experiment with this after I get the initial formula to work.
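
A rough Python sketch of the combination described in the quote, with
the confidence bound kept on the UCT term only. The thread does not
spell out the weighting formula, so the schedule sqrt(k / (3n + k)) used
here is an assumption, as are the constants k and c, the choice of the
child's visit count for n, and the function names.

```python
import math


def beta(n, k=1000.0):
    # Assumed weighting schedule: 1 when n == 0, decaying towards 0 as
    # the number of real visits n grows. k is a hypothetical tuning knob.
    return math.sqrt(k / (3.0 * n + k))


def combined_value(wins, visits, rave_wins, rave_visits,
                   parent_visits, k=1000.0, c=1.0):
    """Weighted average of the UCT value and the RAVE mean.

    All counts are from the viewpoint of the player choosing the move.
    No confidence term is added to the RAVE part.
    """
    rave_mean = rave_wins / rave_visits if rave_visits > 0 else 0.5
    if visits == 0:
        # No real statistics yet: fall back to the RAVE estimate alone.
        return rave_mean
    uct = wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
    b = beta(visits, k)
    return (1.0 - b) * uct + b * rave_mean
```

Early on the score is dominated by the RAVE mean; as a move collects
real visits, the weight shifts to its own UCT value.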


> Another thing - I didn't believe that you need to do RAVE separately
> for both colors (i.e. you should only consider later moves on the
> point by the same color), as e.g. Peter Drake mentioned in a paper of
> his. But after some experiments I changed my mind and think he is
> right =)


Do you have a link to the paper?

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
