John Tromp wrote:
On 5/18/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:
My idea was very similar to what you describe. The program built a
collection of rules of the kind "if condition, then move". The condition
could be anything from a "tree-search rule" of the kind "in this
particular position, play x" to a general rule such as "in atari,
extend". It could also be anything in between, such as a miai specific
to the current position. The strengths of moves were updated with an
incremental Elo-rating algorithm, from the outcomes of random
simulations.
The obvious way to update weights is to reward all the rules that fired
for the winning side, and penalize all the rules that fired for the
losing side, with rewards and penalties decaying toward the end of the
playout. But this is not quite Elo-like, since it doesn't consider
rules to beat each other. So one could make the reward depend on the
relative weight of the chosen rule versus all alternatives, increasing
the reward if the alternatives carried a lot of weight.
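[A minimal sketch of the scheme proposed above. The rule representation, the multiplicative update, and the `lr` and `decay` parameters are illustrative assumptions, not anything specified in the thread.]

```python
import math

def update_after_playout(weights, events, black_won, lr=0.1, decay=0.99):
    # weights: dict rule -> positive multiplicative weight (Bradley-Terry style)
    # events: one (color, chosen_rule, alternative_rules) tuple per playout
    #         move, listed root-first, so decay ** i shrinks rewards and
    #         penalties toward the end of the playout
    for i, (color, chosen, alts) in enumerate(events):
        won = (color == "black") == black_won
        sign = 1.0 if won else -1.0
        total = weights[chosen] + sum(weights[a] for a in alts)
        # credit grows when the alternatives carried a lot of weight,
        # i.e. the chosen rule "beat" strong competition
        surprise = 1.0 - weights[chosen] / total
        weights[chosen] *= math.exp(sign * lr * (decay ** i) * surprise)
```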
Is that how your ratings worked?
It is Elo-like in the generalized Bradley-Terry sense I describe in my
paper: you have one team of one color beating one team of the other
color. What I do, exactly, is compute the total Elo rating of the black
moves (with a decay, so that clean-up moves don't count and moves close
to the root count more) and the total Elo rating of the white moves.
Then I compute the difference between the real outcome and the expected
outcome according to the Elo ratings, and correct the Elo ratings in
proportion to that difference.
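[A minimal sketch of the update described above. The per-index decay schedule, the learning rate `k`, and the weighted-sum team rating are assumptions filled in for illustration; the post does not pin them down.]

```python
def expected_black_win(r_black, r_white, scale=400.0):
    # standard Elo expectation of a Black win, from the team-rating difference
    return 1.0 / (1.0 + 10.0 ** ((r_white - r_black) / scale))

def update_ratings(ratings, black_moves, white_moves, black_won,
                   k=16.0, decay=0.99):
    # ratings: dict move/pattern -> Elo rating
    # black_moves / white_moves: moves in playout order, root first;
    # decay ** i makes moves close to the root count more and late
    # clean-up moves count almost nothing
    wb = [decay ** i for i in range(len(black_moves))]
    ww = [decay ** i for i in range(len(white_moves))]
    r_black = sum(w * ratings.get(m, 0.0) for w, m in zip(wb, black_moves))
    r_white = sum(w * ratings.get(m, 0.0) for w, m in zip(ww, white_moves))
    # difference between the real outcome and the Elo-expected outcome
    err = (1.0 if black_won else 0.0) - expected_black_win(r_black, r_white)
    # correct every rating proportionally to that difference
    for w, m in zip(wb, black_moves):
        ratings[m] = ratings.get(m, 0.0) + k * w * err
    for w, m in zip(ww, white_moves):
        ratings[m] = ratings.get(m, 0.0) - k * w * err
```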
Rémi
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/