On Jun 26, 2008, at 1:35 AM, Magnus Persson wrote:

Quoting Peter Drake <[EMAIL PROTECTED]>:

UCB (and hence UCT) would treat the following sequences of wins (1) and
losses (0) the same:

01010101010101010101010101010101
00000000000000001111111111111111
11111111111111110000000000000000

I have two comments. Isn't the problem here that UCT will not search the second sequence at all because it is so bad initially?

Well, UCT never really discards a branch, just samples it less and less often. It would eventually go back to sequence 2 and discover that it had become good.

So will there ever be a situation like this? UCT will more likely continue to sample the first and third sequenc settling for the first if their are no change again. And when it finally discovers that sequene 2 is good it will quickly choose it.

The second thing is that in practice you will rarely get any clear patterns like this so how would you be able to detect any recency?

I'm thinking of a situation where move A looks good when we're assuming a random response from the opponent, but once the child node starts using UCB, it is discovered that the opponent has a very good response, so now runs through A start losing. Later, runs through A start winning again because we discover a good counterresponse...

One simple trick is to always replay a move in the tree if it won the last time the position was visited, and only use UCT for positions where the last played moved lost. Should'nt this work like a dream if sequence 2 truly goes to 100% winrate directly after the wirst win is scored?

Probably, but we rarely get such pure victory branches. I have a number of schemes in mind, though.

Has anyone tried this empirically?

Peter Drake
http://www.lclark.edu/~drake/



_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to