On 5/18/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:

My idea was very similar to what you describe. The program built a
collection of rules of the form "if condition then move". The condition
could be anything from a "tree-search rule" of the kind "in this
particular position, play x" to a general rule such as "in atari, extend".
It could also be anything in between, such as a miai specific to the
current position. The strengths of the moves were updated with an
incremental Elo-rating algorithm, from the outcomes of random simulations.

The obvious way to update weights is to reward all the rules that fired
for the winning side and penalize all the rules that fired for the
losing side, with rewards and penalties decaying toward the end of the
playout. But this is not quite Elo-like, since it doesn't make rules
compete against each other. One fix would be to make the reward depend
on the relative weight of the chosen rule versus all its alternatives,
increasing the reward when the alternatives carried a lot of weight.
Is that how your ratings worked?
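For concreteness, here is a rough sketch in Python of the update scheme described above. The rule names, the default weight of 1.0, the decay factor, and the exact scaling are my own illustration, not necessarily how the actual program worked:

```python
def update_rule_weights(weights, playout, winner,
                        base_reward=0.1, decay=0.9):
    """Decaying reward/penalty update with an Elo-style competition term.

    `playout` is a list of (side, fired_rule, alternative_rules) tuples,
    one per move, in play order. Rules fired for the winner are rewarded,
    rules fired for the loser penalized, with the update decaying toward
    the end of the playout. The update is scaled by the weight carried by
    the alternatives, so beating strong alternatives earns more.
    """
    for t, (side, rule, alternatives) in enumerate(playout):
        # Rewards and penalties decay toward the end of the playout.
        step = base_reward * decay ** t
        w_rule = weights.get(rule, 1.0)
        total = w_rule + sum(weights.get(a, 1.0) for a in alternatives)
        # Elo-style expected share of the chosen rule among its rivals.
        expected = w_rule / total
        sign = 1.0 if side == winner else -1.0
        # Bigger update when the alternatives carried a lot of weight
        # (i.e. when `expected` was small).
        weights[rule] = w_rule + step * sign * (1.0 - expected)
    return weights
```

With an empty weight table (all rules starting at 1.0), a rule fired by the winner ends up above 1.0 and a rule fired by the loser below it.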

I'm not sure how that compares with TD learning. Maybe someone more
familiar with the latter can point out the differences.

TD learning (with linear function approximation) uses a gradient-descent
rule to update weights. The simplest such rule, LMS (Widrow-Hoff), does
something like what you describe: rules followed by a positive reward
(win) have their weights increased, and rules followed by a negative
reward (loss) have their weights decreased. The exact update depends on
the set of rules firing, and is proportional to the error between the
estimated reward (based on all rules) and the actual reward. In other
words, each weight is nudged a little toward the value that would have
made the overall prediction correct.

TD learning is similar, except that it updates weights toward a
subsequent prediction of the reward (e.g. on the next move), instead of
toward the actual reward. Rich Sutton gives a much better explanation
than I can: http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html
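The contrast between the two updates can be sketched in a few lines of Python. Here `phi` is a binary feature vector marking which rules fired; the step size and feature encoding are illustrative choices, not anything prescribed by the text:

```python
import numpy as np

def lms_update(w, phi, reward, alpha=0.01):
    """Widrow-Hoff / LMS: nudge weights toward the actual reward.

    The update is proportional to the error between the estimated
    reward (w @ phi, based on all firing rules) and the true reward.
    """
    prediction = w @ phi
    return w + alpha * (reward - prediction) * phi

def td0_update(w, phi, phi_next, reward, alpha=0.01, gamma=1.0):
    """TD(0): nudge weights toward the *next* prediction plus reward,
    instead of waiting for the actual final outcome."""
    td_error = reward + gamma * (w @ phi_next) - (w @ phi)
    return w + alpha * td_error * phi
```

Note that LMS needs the true reward (the playout result), while TD(0) can update after every move using only the next position's own estimate.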

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
