Brian Sheppard wrote: > In this strategy, one chooses a random number p, and then select the > strategy with highest historical mean if p > epsilon, and the > strategy taken least often otherwise. If epsilon = C*log(n)/n, where > n is the number of experiments so far, then the strategy has zero > asymptotic regret.
Can you explain this in more detail? >From what I understand, for each parameter you take with some high probability the best so far, and with some lower probability the least tried one. This requires (manually) enumerating all parameters on some integer scale, if I got it correctly. I'm interested in the following general problem: Given a set of parameters each with a range of values, and a set of game results between different parameter sets (or a score average for each set versus a reference), how to determine: a) the most likely optimal set of parameters b) the most interesting parameter set to test next Remi Coulom has done some work in this area: http://remi.coulom.free.fr/QLR/ It sounds very interesting (v-optimal sampling). But I don't understand it enough to implement it. Your idea sounds simpler, but the enumeration would be a problem, for parameters with wide ranges where we don't know where to start. -- GCP _______________________________________________ computer-go mailing list [email protected] http://www.computer-go.org/mailman/listinfo/computer-go/
