Hi Yamato,

However I cannot find any explanation for it.
Does anyone know what Discounted UCB is?

"Discounted" means you forget somehow the past. More precisely, if "w"
is your count of wins, "t" your total number of playouts, and "r" the
result of the current simulation, instead of doing:

w <- w+r
t <- t+1

you do, with gamma < 1:

w <- gamma*w + r
t <- gamma*t + 1

So it is as if you kept a memory of the order of 1/(1-gamma) playouts
(for example, gamma = 0.99 gives a memory of about 100 playouts).
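
Here is a minimal Python sketch of the two update rules, just to make the
"memory" remark concrete. The function names (discounted_update, run) and
the choice gamma = 0.9 are only illustrative assumptions, not taken from
any actual Go program. With gamma = 1 you recover the ordinary counts;
with gamma < 1 the discounted count t can never grow beyond 1/(1-gamma),
however many playouts you run.

```python
def discounted_update(w, t, r, gamma):
    """One update of the (wins, playouts) statistics for a single move.

    gamma = 1.0 gives the ordinary counts (w + r, t + 1);
    gamma < 1.0 down-weights all earlier results by a factor gamma.
    """
    return gamma * w + r, gamma * t + 1.0


def run(results, gamma):
    """Feed a sequence of playout results (0 or 1) through the update."""
    w, t = 0.0, 0.0
    for r in results:
        w, t = discounted_update(w, t, r, gamma)
    return w, t


if __name__ == "__main__":
    gamma = 0.9  # illustrative value only
    # 1000 losses followed by 50 wins: the old losses are mostly forgotten.
    results = [0] * 1000 + [1] * 50
    w_plain, t_plain = run(results, 1.0)
    w_disc, t_disc = run(results, gamma)
    print("plain mean:     ", w_plain / t_plain)
    print("discounted mean:", w_disc / t_disc)
    print("discounted t:   ", t_disc, "bound:", 1.0 / (1.0 - gamma))
```

The plain mean stays low because it still averages over all 1050 playouts,
while the discounted mean is dominated by the most recent results.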

Is it useful for MC Go?
The idea is appealing for UCT, as the distribution of the arms is not
stationary, and discounting is the simplest idea to deal with
non-stationarity.
However, all my trials in this direction have been unsuccessful. Maybe
others have succeeded; I don't know.

I hope that makes things clearer,
Sylvain
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/