>
>
> BTW, I imagine that "CLOP" could be any fully automated parameter tuning
> solution. That is, nothing here is really specific to CLOP. It just happens
> that CLOP is the first fully automated parameter tuning system that I have
> made to work.
>
>
Hi Brian and others,
I'm not sure that CLOP could be swapped for just any fully automated
parameter tuning solution.
The point is that there are roughly two families of algorithms:
1) those in which exploration is roughly equal to recommendation: we test
parameters close to our current approximation of the optimum;
2) those in which exploration is really different from recommendation: we
test parameters possibly far from the optimum, because the variance is
higher and we get more information there.
I assume that CLOP is in category 2.
EDA or cross-entropy methods are in category 1.
Category 2 is much faster, but assumes some sort of symmetry or regularity,
so that what you learn far from the optimum can
be used for guessing the optimum.
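
To make the distinction concrete, here is a rough Python sketch on a toy
1-D problem (this is not the actual CLOP or cross-entropy code, just an
illustration of the two sampling strategies; play_game() is a made-up
noisy win rate):

import random
import numpy as np

X_STAR = 0.3  # true optimum, known only to the simulator

def play_game(x):
    """One noisy evaluation: 1 for a win, 0 for a loss."""
    p_win = max(0.0, 0.9 - 0.1 * (x - X_STAR) ** 2)
    return 1 if random.random() < p_win else 0

def category1_cross_entropy(iters=30, batch=20, games_per_x=10, elite=5):
    """Family 1: exploration == recommendation.
    Sample close to the current estimate, then move the estimate toward
    the best-performing samples."""
    mu, sigma = 0.0, 1.0
    for _ in range(iters):
        xs = [random.gauss(mu, sigma) for _ in range(batch)]
        xs.sort(key=lambda x: -sum(play_game(x) for _ in range(games_per_x)))
        best = xs[:elite]
        mu = sum(best) / elite
        sigma = max(1e-3, (sum((x - mu) ** 2 for x in best) / elite) ** 0.5)
    return mu  # we recommend where we have been exploring

def category2_wide_regression(n_games=6000, lo=-2.0, hi=2.0):
    """Family 2: exploration != recommendation.
    Sample far from the optimum as well, fit a quadratic model of the
    win rate, and recommend its maximizer (a regularity assumption)."""
    xs = [random.uniform(lo, hi) for _ in range(n_games)]
    wins = [play_game(x) for x in xs]
    a, b, c = np.polyfit(xs, wins, 2)  # win rate ~ a*x^2 + b*x + c
    return -b / (2.0 * a)              # vertex of the fitted parabola (a < 0 here)

print("category 1 recommends", category1_cross_entropy())
print("category 2 recommends", category2_wide_regression())
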
Some people, for philosophical reasons, don't like algorithms of category
2, considering that it is not reasonable to sample according to principles
like "maximum uncertainty" and regression.
I'm not sure, but I think Rémi posted something about this a long time ago,
and said that in games this "symmetry assumption" is usually not a major
problem... but Rémi knows more about it than I do. In high dimension I
would tend to believe that approach 2 is often much better, and also when
the noise is large around the optimum; I mean, in usual cases (it is clear
that we can design counter-examples to such a simple rule).

In the tests I have made, algorithms in category 2 really make a difference.
I've done parameter tuning with algorithms of category 1 and, if I do it
again one day, I'll try category 2.

On artificial test beds, category 2 is much faster, e.g. on the noisy
objective function f(x) = Bernoulli(c + ||x - x*||^2) with c > 0, where x*
is the optimal parameterization: the suboptimality, in terms of expected
objective function at the parameter chosen after n games, decreases as 1/n
instead of 1/sqrt(n), neglecting logarithmic factors.
On the other hand, for Bernoulli(||x - x*||^2) (the case c = 0), both
families get 1/sqrt(n).
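
To be concrete, here is a small Python sketch of that test bed (the
constants x_star and c below are just an example; the suboptimality is
measured as the expected objective at the recommended point minus its
value at x*):

import random

def make_test_bed(x_star, c):
    """f(x) = Bernoulli(c + ||x - x_star||^2); keep the search domain
    small enough that c + ||x - x_star||^2 stays <= 1."""
    def query(x):
        # one "game": returns 1 with probability c + ||x - x_star||^2
        p = c + sum((xi - si) ** 2 for xi, si in zip(x, x_star))
        return 1 if random.random() < p else 0
    def suboptimality(x_hat):
        # expected objective at x_hat minus its value at x_star
        return sum((xi - si) ** 2 for xi, si in zip(x_hat, x_star))
    return query, suboptimality

# c = 0.2 gives the c > 0 regime (1/n reachable); c = 0.0 the harder case
query, regret = make_test_bed(x_star=(0.1, -0.2), c=0.2)
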

As far as I know, the rates above are known (proved) for some specific
algorithms, but it has not been proved that algorithms in category 1
cannot reach 1/n on f(x) = Bernoulli(c + ||x - x*||^2).

Best regards,
Olivier
