Brian Sheppard wrote:

> In this strategy, one chooses a random number p, and then select the 
> strategy with highest historical mean if p > epsilon, and the 
> strategy taken least often otherwise. If epsilon = C*log(n)/n, where 
> n is the number of experiments so far, then the strategy has zero
> asymptotic regret.

Can you explain this in more detail?

>From what I understand, for each parameter you take with some high
probability the best so far, and with some lower probability the least
tried one. This requires (manually) enumerating all parameters on some
integer scale, if I got it correctly.

I'm interested in the following general problem:

Given a set of parameters each with a range of values, and a set of game
results between different parameter sets (or a score average for each
set versus a reference), how to determine:

a) the most likely optimal set of parameters
b) the most interesting parameter set to test next

Remi Coulom has done some work in this area:

http://remi.coulom.free.fr/QLR/

It sounds very interesting (v-optimal sampling). But I don't understand
it enough to implement it. Your idea sounds simpler, but the enumeration
would be a problem, for parameters with wide ranges where we don't know
where to start.

-- 
GCP
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to