On Jun 26, 2008, at 1:35 AM, Magnus Persson wrote:
Quoting Peter Drake <[EMAIL PROTECTED]>:
UCB (and hence UCT) would treat the following sequences of wins
(1) and
losses (0) the same:
01010101010101010101010101010101
00000000000000001111111111111111
11111111111111110000000000000000
I have two comments. Isn't the problem here that UCT will not
search the second sequence at all because it is so bad initially?
Well, UCT never really discards a branch, just samples it less and
less often. It would eventually go back to sequence 2 and discover
that it had become good.
So will there ever be a situation like this? UCT will more likely
continue to sample the first and third sequenc settling for the
first if their are no change again. And when it finally discovers
that sequene 2 is good it will quickly choose it.
The second thing is that in practice you will rarely get any clear
patterns like this so how would you be able to detect any recency?
I'm thinking of a situation where move A looks good when we're
assuming a random response from the opponent, but once the child node
starts using UCB, it is discovered that the opponent has a very good
response, so now runs through A start losing. Later, runs through A
start winning again because we discover a good counterresponse...
One simple trick is to always replay a move in the tree if it won
the last time the position was visited, and only use UCT for
positions where the last played moved lost. Should'nt this work
like a dream if sequence 2 truly goes to 100% winrate directly
after the wirst win is scored?
Probably, but we rarely get such pure victory branches. I have a
number of schemes in mind, though.
Has anyone tried this empirically?
Peter Drake
http://www.lclark.edu/~drake/
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/