Hi Yamato,
However, I cannot find any explanation for it. Does anyone know what Discounted UCB is?
"Discounted" means that you gradually forget the past. More precisely, if "w" is your count of wins, "t" your total number of playouts, and "r" the result of the current simulation, then instead of doing:

  w <- w + r
  t <- t + 1

you do, with gamma < 1:

  w <- gamma * w + r
  t <- gamma * t + 1

So it is as if you kept a memory of the order of 1/(1-gamma) playouts.
Is it useful for MC Go?
The idea is appealing for UCT, as the distribution of the arms is not stationary, and discounting is the simplest way to deal with non-stationarity. However, all my trials in this direction have been unsuccessful. Maybe some people have succeeded; I don't know.

I hope that makes things clearer,
Sylvain
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
