>From what I understand, for each parameter you take with some high
>probability the best so far, and with some lower probability the least
>tried one. This requires (manually) enumerating all parameters on some
>integer scale, if I got it correctly.

Yes, for each parameter you make a range of values that you believe brackets
the optimal value. Then the optimizer takes over and (eventually)
concentrates attention on the value that works best.

For instance, if your program includes a parameter for the number of RAVE
wins to initially assign to an atari move, then you might test 0, 1, 2, ...
10. (Note that the parameter values do not have to be integers; you just
need a finite number of choices.)
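To make the selection rule concrete, here is a rough sketch of how such a per-parameter chooser might look. The names, the exploration probability, and the bookkeeping layout are illustrative assumptions, not the exact implementation:

```python
import random

EPSILON = 0.1  # exploration probability; an illustrative choice

def choose_value(stats):
    """stats maps candidate value -> (wins, trials)."""
    if random.random() < EPSILON:
        # with low probability, play the least-tried value
        return min(stats, key=lambda v: stats[v][1])
    # otherwise play the best value so far (untried values rank first)
    def win_rate(v):
        wins, trials = stats[v]
        return wins / trials if trials else float("inf")
    return max(stats, key=win_rate)

# Example: the RAVE-atari prior with candidate values 0..10
rave_atari_stats = {v: (0, 0) for v in range(11)}
value = choose_value(rave_atari_stats)
```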

Note that a value of 0 sometimes turns a feature off, in effect. When a
parameter can turn a feature off, I put the feature's entire code inside an
if-statement, to approximate the CPU gain of omitting the code entirely:

    if (theParameter != 0) {
       evaluation += theParameter * someFunction();
    }

For the domain of Go, I typically find a broad range of near-optimal values.
In fact, there are often only insignificant variations over the entire
range: perhaps a 1% difference. But I do see 10% gains for certain features,
and I never know in advance which will be important.

I have tested systems that automatically create new values by interpolation.
But I haven't seen any benefit.
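For what it's worth, the interpolation idea is simple to sketch: propose a new candidate midway between the current best value and its better-scoring neighbor. The names and details below are illustrative, not the system I actually tested:

```python
def propose_interpolated_value(stats):
    """Given stats: value -> (wins, trials), propose a new candidate
    halfway between the best value and its better-scoring neighbor.
    Returns None if there is no neighbor or the midpoint already exists."""
    values = sorted(stats)
    def rate(v):
        wins, trials = stats[v]
        return wins / trials if trials else 0.0
    best = max(values, key=rate)
    i = values.index(best)
    neighbors = [values[j] for j in (i - 1, i + 1) if 0 <= j < len(values)]
    if not neighbors:
        return None
    partner = max(neighbors, key=rate)
    mid = (best + partner) / 2.0
    return mid if mid not in stats else None
```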


>Given a set of parameters each with a range of values, and a set of game
>results between different parameter sets (or a score average for each
>set versus a reference), how to determine:

>a) the most likely optimal set of parameters
>b) the most interesting parameter set to test next

The system above will choose the most interesting parameter set to test
next. Now, whether the policy is zero-regret is more complicated.

In the case of a single parameter, the method is zero regret. If you want to
automatically introduce new values to test, then you can make a zero-regret
policy if the objective function is bounded and continuous.

But the more interesting case is where you have multiple parameters. In that
case, I think that my policy will find the global optimum if the objective
function is convex.

Convexity is not guaranteed in our domain, but is often the case. That is,
the benefit is small if a parameter is either too small or too large, and
the optimum is somewhere in between.

I could be wrong about some of these statements, because I haven't paused to
try to prove anything mathematically. Instead, I am convinced by
experimental observation of pragmatic outcomes:

  1) does the optimizer tweak parameters towards optimality? (yes)

  2) does the optimizer work without needing my attention? (yes)

  3) if I change the program in other ways, will it adapt? (yes)

  4) if I code a bad feature, will the optimizer stop using it? (yes)

Altogether, this is not bad value.

I should mention one tweak, relating to the fact that your program might
sometimes face strong players and other times weak players. I keep a record
of the average rating of the opponents faced by each parameter value. Then
when I explore, I choose either the least-taken parameter value or the
parameter value whose average opponent rating would be most affected by
playing this game. This policy keeps the average opponent ratings in line.
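As a sketch (with illustrative names, and an arbitrary way of mixing the two criteria that is not my exact bookkeeping), the exploration choice might look like:

```python
import random

def choose_exploration_value(stats, opponent_rating):
    """stats: value -> {'trials': n, 'avg_opp': mean opponent rating}.
    Pick either the least-taken value, or the value whose average
    opponent rating would shift most by absorbing this opponent."""
    least_taken = min(stats, key=lambda v: stats[v]['trials'])

    def shift(v):
        n, avg = stats[v]['trials'], stats[v]['avg_opp']
        new_avg = (avg * n + opponent_rating) / (n + 1)
        return abs(new_avg - avg)

    most_affected = max(stats, key=shift)
    # mix the two criteria; alternating would also work
    return random.choice([least_taken, most_affected])
```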

Brian

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/