On Sat, Jan 17, 2009 at 08:29:32PM +0100, Sylvain Gelly wrote:
> ChooseMove(node, board) {
>     bias = 0.015  // I put a random number here, to be tuned
>     b = bias * bias / 0.25
>     best_value = -1
>     best_move = PASSMOVE
>     for (move in board.allmoves) {
>         c = node.child(move).counts
>         w = node.child(move).wins
>         rc = node.rave_counts[move]
>         rw = node.rave_wins[move]
>         coef = 1 - rc / (rc + c + rc * c * b)
>         value = w / c * coef + rw / rc * (1 - coef)  // please here take care of the c == 0 and rc == 0 cases
>         if (value > best_value) {
>             best_value = value
>             best_move = move
>         }
>     }
>     return best_move
> }
Hi,
it seems to me that, when you select a move in the tree, you don't have an
exploration component: you use just a weighted average of the score and the
RAVE score.
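In runnable form, I read the quoted blending rule as follows (a sketch; the fallback value of 1.0 for a move with no statistics at all is my own assumption, since the pseudocode leaves the c == 0 and rc == 0 cases open):

```python
def rave_value(wins, counts, rave_wins, rave_counts, bias=0.015):
    """Blend the Monte-Carlo value and the RAVE (AMAF) value
    as in the quoted pseudocode."""
    b = bias * bias / 0.25
    if counts == 0 and rave_counts == 0:
        return 1.0  # assumed first-play urgency for a never-seen move
    denom = rave_counts + counts + rave_counts * counts * b
    coef = 1.0 - rave_counts / denom
    mc = wins / counts if counts > 0 else 0.0          # coef is 0 here anyway
    rave = rave_wins / rave_counts if rave_counts > 0 else 0.0  # coef is 1 here
    return mc * coef + rave * (1.0 - coef)
```

Note that when c == 0 the coefficient is exactly 0 (pure RAVE value), and when rc == 0 it is exactly 1 (pure Monte-Carlo value), so the two division-by-zero cases fall out of the formula itself.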
So, if:
- the best move is good only if played immediately and very bad if played
  later in the game, and
- the first playout for this move resulted in a loss,
then both the score and the RAVE score will be very low, and this move will
not be considered again for a very long time.
Is this simplified code, and in reality you replace w/c and rw/rc by scores
with an exploration component, or did you really use it as is?
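Concretely, the kind of exploration component I have in mind is a UCB1-style bonus added on top of the blended value (a sketch; the exploration constant 0.7 and the infinite value for unvisited moves are my own assumptions, to be tuned):

```python
import math

def ucb_rave_value(wins, counts, rave_wins, rave_counts,
                   parent_counts, bias=0.015, exploration=0.7):
    """RAVE blend plus a UCB1-style exploration bonus."""
    if counts == 0:
        return float('inf')  # force every move to be tried at least once
    b = bias * bias / 0.25
    denom = rave_counts + counts + rave_counts * counts * b
    coef = 1.0 - rave_counts / denom
    rave = rave_wins / rave_counts if rave_counts > 0 else 0.0
    blended = (wins / counts) * coef + rave * (1.0 - coef)
    # log(N)/n bonus: shrinks as the move is visited, grows with parent visits
    return blended + exploration * math.sqrt(math.log(parent_counts) / counts)
```

With this, a move whose first playout was a loss still gets revisited once its bonus outgrows the values of the siblings, instead of being starved.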
Tom
--
Thomas Lavergne "Entia non sunt multiplicanda praeter
necessitatem." (Guillaume d'Ockham)
[email protected] http://oniros.org
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/