Re: [computer-go] UCB/UCT and moving targets

Peter Drake Thu, 26 Jun 2008 08:07:33 -0700

On Jun 26, 2008, at 1:35 AM, Magnus Persson wrote:

Quoting Peter Drake <[EMAIL PROTECTED]>:
UCB (and hence UCT) would treat the following sequences of wins(1) and
losses (0) the same:

01010101010101010101010101010101
00000000000000001111111111111111
11111111111111110000000000000000
I have two comments. Isn't the problem here that UCT will notsearch the second sequence at all because it is so bad initially?

Well, UCT never really discards a branch, just samples it less andless often. It would eventually go back to sequence 2 and discoverthat it had become good.

So will there ever be a situation like this? UCT will more likelycontinue to sample the first and third sequenc settling for thefirst if their are no change again. And when it finally discoversthat sequene 2 is good it will quickly choose it.
The second thing is that in practice you will rarely get any clearpatterns like this so how would you be able to detect any recency?

I'm thinking of a situation where move A looks good when we'reassuming a random response from the opponent, but once the child nodestarts using UCB, it is discovered that the opponent has a very goodresponse, so now runs through A start losing. Later, runs through Astart winning again because we discover a good counterresponse...

One simple trick is to always replay a move in the tree if it wonthe last time the position was visited, and only use UCT forpositions where the last played moved lost. Should'nt this worklike a dream if sequence 2 truly goes to 100% winrate directlyafter the wirst win is scored?

Probably, but we rarely get such pure victory branches. I have anumber of schemes in mind, though.


Has anyone tried this empirically?

Peter Drake
http://www.lclark.edu/~drake/



_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] UCB/UCT and moving targets

Reply via email to