I've also tried a variety of ways to use point-ownership in combination with RAVE. By no means was it an exhaustive study, but I failed to find an intuitive way to improve play this way.
I didn't try enough to be able to come to hard conclusions, but at the very least it didn't turn out to be obvious.

Mark

On Fri, Jun 5, 2009 at 3:39 AM, Brian Sheppard <sheppar...@aol.com> wrote:
> In a paper published a while ago, Remi Coulom showed that 64 MC trials
> (i.e., just random, no tree) was a useful predictor of move quality.
>
> In particular, Remi counted how often each point ended up in possession
> of the side to move. He then measured the probability of being the best
> move as a function of the frequency of possession. Remi found that if
> the possession frequency was around 1/3 then the move was most likely
> to be best, with decreasing probabilities elsewhere.
>
> I have been trying to extract more information from each trial, since
> it seems to me that we are discarding useful information when we use
> only the result of a trial. So I tried to implement Remi's idea in a UCT
> program.
>
> This is very different from Remi's situation, in which the MC trials are
> done before the predictor is used in a tree search. Here, we will have
> a tree search going on concurrently with collecting data about point
> ownership.
>
> My implementation used the first N trials of each UCT node to collect
> point ownership information. After the first M trials, it would use that
> information to bias the RAVE statistics. That is, in the selectBest
> routine I had an expression like this:
>
> for all moves {
>     // Get the observed RAVE values:
>     nRAVE = RAVETrials[move];
>     wRAVE = RAVEWins[move];
>
>     // Dynamically adjust according to point ownership:
>     if (trialCount < M) {
>         ; // Do nothing.
>     }
>     else if (Ownership[move] < 0.125) {
>         nRAVE += ownershipTrialsParams[0];
>         wRAVE += ownershipWinsParams[0];
>     }
>     else if (Ownership[move] < 0.250) {
>         nRAVE += ownershipTrialsParams[1];
>         wRAVE += ownershipWinsParams[1];
>     }
>     else if (Ownership[move] < 0.375) {
>         nRAVE += ownershipTrialsParams[2];
>         wRAVE += ownershipWinsParams[2];
>     }
>     else if (Ownership[move] < 0.500) {
>         nRAVE += ownershipTrialsParams[3];
>         wRAVE += ownershipWinsParams[3];
>     }
>     else if (Ownership[move] < 0.625) {
>         nRAVE += ownershipTrialsParams[4];
>         wRAVE += ownershipWinsParams[4];
>     }
>     else if (Ownership[move] < 0.750) {
>         nRAVE += ownershipTrialsParams[5];
>         wRAVE += ownershipWinsParams[5];
>     }
>     else if (Ownership[move] < 0.875) {
>         nRAVE += ownershipTrialsParams[6];
>         wRAVE += ownershipWinsParams[6];
>     }
>     else {
>         nRAVE += ownershipTrialsParams[7];
>         wRAVE += ownershipWinsParams[7];
>     }
>
>     // Now use nRAVE and wRAVE to order the moves for expansion....
> }
>
> The bottom line is that the result was negative. In the test period, Pebbles
> won 69% (724 out of 1039) of CGOS games when not using this feature and
> less than 59% when using this feature. I tried a few parameter settings.
> Far from exhaustive, but mostly in line with Remi's paper.
> The best parameter settings showed 59% (110 out of 184, which is 2.4
> standard deviations lower). But maybe you can learn from my mistakes
> and figure out how to make it work.
>
> I have no idea why this implementation doesn't work. Maybe RAVE does a
> good job already of determining where to play, so ownership information
> is redundant. Maybe different parameter settings would work. Maybe just
> overhead (but I doubt that; the overhead wouldn't account for such a
> significant drop).
>
> Anyway, if you try something like this, please let me know how it works out.
> Or if you have other ideas about how to extract more information from
> trials.
>
> Best,
> Brian
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
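
The if/else chain in the quoted message splits the ownership frequency into eight equal-width bins, so the same lookup can be written as an index computation. What follows is only a minimal C sketch under stated assumptions, not the Pebbles code: the `OwnershipStats` struct, the function names, the 361-point board size, and the `double` parameter types are all hypothetical. Only the 0.125-wide thresholds and the idea of adding per-bin trial and win counts to the RAVE statistics come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POINTS 361        /* assumption: 19x19 board */
#define OWNERSHIP_BUCKETS 8   /* eight bins of width 0.125, as in the quoted chain */

/* Hypothetical per-node record; field names are illustrative only. */
typedef struct {
    int trials;               /* playouts recorded so far */
    int owned[MAX_POINTS];    /* times each point ended owned by the side to move */
} OwnershipStats;

/* Record one finished playout. final_owner[p] is nonzero if point p
   ended up in possession of the side to move. */
void record_playout(OwnershipStats *s, const int *final_owner, size_t points) {
    assert(points <= MAX_POINTS);
    for (size_t p = 0; p < points; ++p) {
        if (final_owner[p]) {
            s->owned[p] += 1;
        }
    }
    s->trials += 1;
}

/* Possession frequency in [0, 1]; returns 0.0 before any playout. */
double ownership_frequency(const OwnershipStats *s, size_t point) {
    if (s->trials == 0) {
        return 0.0;
    }
    return (double)s->owned[point] / (double)s->trials;
}

/* Map a frequency to the bin the if/else chain would select:
   [0, 0.125) -> 0, [0.125, 0.250) -> 1, ..., [0.875, 1] -> 7.
   Multiplying by 8 is exact in binary floating point, so the bin
   boundaries match the chain's "<" comparisons. */
int ownership_bucket(double freq) {
    int b = (int)(freq * OWNERSHIP_BUCKETS);
    if (b < 0) {
        b = 0;
    }
    if (b >= OWNERSHIP_BUCKETS) {
        b = OWNERSHIP_BUCKETS - 1;
    }
    return b;
}

/* Add the bin's virtual trials and wins to the observed RAVE counts. */
void apply_ownership_bias(double freq,
                          const double *trials_params,
                          const double *wins_params,
                          double *nRAVE, double *wRAVE) {
    int b = ownership_bucket(freq);
    *nRAVE += trials_params[b];
    *wRAVE += wins_params[b];
}
```

With this layout, a possession frequency near 1/3 (the region the quoted message reports as most likely to contain the best move) falls into bin 2, so `trials_params[2]` and `wins_params[2]` are the entries that would carry the strongest prior. The check against the trial threshold (`trialCount < M` in the message) is left to the caller.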