On Thu, 2008-10-30 at 19:15 -0400, Jason House wrote:
> The error bars of all bots overlap. I'm not familiar enough with
> BayesELO to compute p-values. I'd bet that only the 0.1 version has a
> statistically significant strength difference.
Of course. I could get within a couple of Elo points by running 100,000
games. You could view the 0.01 and 0.025 versions as a single combined
version that chooses one value or the other at random (about 50% each).
Such a bot would be scoring around 2005 after 1500 games!

My intuition is that if the idea works, it will work gradually better
with higher constants until it reaches some plateau, then decline. I'm
testing 0.15 and 0.20 now. 0.15 is doing quite well and 0.20 is at the
bottom, though I only have 150 games for each of these so far.

- Don

> Sent from my iPhone
>
> On Oct 30, 2008, at 7:00 PM, Don Dailey <[EMAIL PROTECTED]> wrote:
>
> > The basic idea seems to be a modest improvement after 752 games.
> > Note that ALL versions with the incentive play stronger.
> >
> > I'm going to try more aggressive values now - when I find a
> > reasonable value I'll try the tanh() stuff.
> >
> > Rank Name          Elo    +    - games score oppo. draws
> >    1 inc-0.1      2033   19   19   752   54%  2004    0%
> >    2 inc-0.025    2008   19   19   750   49%  2012    0%
> >    3 inc-0.01     2003   19   19   752   48%  2014    0%
> >    4 mwNoDup-2000 2000   19   19   750   48%  2015    0%
> >
> > On Thu, 2008-10-30 at 14:59 -0200, Mark Boon wrote:
> >> Funny, I have been playing with something very similar, although I
> >> got side-tracked by something else for the moment. Intuitively I
> >> felt tanh() was more appropriate than a linear function. You may
> >> want the inverse of that, though: I was trying to calculate the
> >> territory certainty whereas you want the territory uncertainty.
> >>
> >> Mark
> >>
> >> On 30-okt-08, at 14:21, Don Dailey wrote:
> >>
> >>> Reference bot enhancement
> >>> =========================
> >>>
> >>> Here is another possible enhancement to the reference bot which I
> >>> am currently testing. I do not yet have anything conclusive enough
> >>> to report, but it looks good so far with a small number of games.
> >>>
> >>> Even if this idea doesn't pan out as a strength gain, it should
> >>> produce a much more natural playing style without weakening the
> >>> bot.
> >>>
> >>> Here is how it works. We will use 1000 playouts for our example:
> >>>
> >>> 1. Modify the bot to keep a "futures" table. At the end of each
> >>>    playout, tally the wins for white and black for each point on
> >>>    the board. (I tally -1 for a white win and +1 for a black win,
> >>>    giving a final score from -1000 to 1000 for each point.)
> >>>
> >>> 2. When the 1000 playouts are complete, compute an "uncertainty
> >>>    value" for each point, where 1.0 is completely uncertain and
> >>>    0.0 is completely certain. A point is completely certain if it
> >>>    was ALWAYS owned by the same player at the end of every
> >>>    playout, and completely uncertain if each side owned it 50% of
> >>>    the time.
> >>>
> >>> 3. When determining which move to play, apply an uncertainty
> >>>    delta to the computed score of each move. This is simply some
> >>>    fraction of the "uncertainty value"; with the best fraction
> >>>    I've tested so far, 0.025, you get a bonus that ranges from
> >>>    0.0 to 0.025.
> >>>
> >>> 4. Choose the move with the best (score + uncertainty_delta).
> >>>
> >>> 5. The incentive must be small; large incentives destroy the
> >>>    playing strength. For instance, 0.1 is too high and weakens
> >>>    the bot. The value testing best for me (of the ones I've tried
> >>>    so far) is 0.025.
> >>>
> >>> 6. This may test better at some playout levels than at others.
> >>>    I'm testing at 2000 playouts.
> >>>
> >>> The idea is to gently encourage the bot to avoid playing on
> >>> points whose ownership is clearly a foregone conclusion (or
> >>> conversely, to encourage it to play where the "action" is).
> >>>
> >>> This should make the bot play much less artificially. Near the
> >>> end of the game it will prefer moves to unresolved points.
> >>> Earlier in the game it will avoid moving to areas that are
> >>> "probably" already won or lost.
> >>>
> >>> My feeling is that these "incentives" should probably be
> >>> calculated in a non-linear way, but what I described is a good
> >>> starting point. From experiments in the past it seems more
> >>> important to put the focus, and most of the weight, on avoiding
> >>> play to highly certain points. So I will try some non-linear
> >>> formula next.
> >>>
> >>> - Don
> >>>
> >>> _______________________________________________
> >>> computer-go mailing list
> >>> [email protected]
> >>> http://www.computer-go.org/mailman/listinfo/computer-go/
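The futures-table scheme in the quoted message can be sketched roughly as
follows. This is a minimal illustration, not the reference bot's actual
code: random_playout is a stand-in for a real Monte Carlo playout, the
board size is arbitrary, and move_scores stands for whatever per-move win
rates the bot already computes (e.g. via AMAF).

```python
import random

BOARD_POINTS = 81      # 9x9 board flattened to a list of points (arbitrary)
NUM_PLAYOUTS = 1000    # as in the example in the message
INCENTIVE = 0.025      # the constant that tested best in the thread

def random_playout(rng):
    """Stand-in for a real playout: returns the final owner of each
    point, +1 if black owned it at the end, -1 if white did."""
    return [rng.choice((-1, 1)) for _ in range(BOARD_POINTS)]

def choose_move(move_scores, rng=None):
    """move_scores maps each candidate point to its win rate in [0, 1],
    however the bot computed it. Runs the playouts, builds the futures
    table, and returns (chosen point, per-point uncertainty)."""
    rng = rng or random.Random(1)
    futures = [0] * BOARD_POINTS
    for _ in range(NUM_PLAYOUTS):
        owners = random_playout(rng)
        for p in range(BOARD_POINTS):
            futures[p] += owners[p]          # ends up in [-1000, 1000]
    # 1.0 = the point split 50/50; 0.0 = always owned by one color.
    uncertainty = [1.0 - abs(t) / NUM_PLAYOUTS for t in futures]
    best = max(move_scores,
               key=lambda p: move_scores[p] + INCENTIVE * uncertainty[p])
    return best, uncertainty
```

With equal win rates, the small bonus breaks the tie toward the most
contested point, which is the "play where the action is" behaviour
described above.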
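Mark's tanh() suggestion, combined with Don's note about putting most of
the weight on avoiding highly certain points, could take a shape like the
sketch below. The sharpness parameter is invented for illustration; no
specific non-linear formula was given in the thread.

```python
import math

INCENTIVE = 0.025   # the linear scale that tested best above

def linear_bonus(uncertainty):
    """The linear incentive from the original description."""
    return INCENTIVE * uncertainty

def tanh_bonus(uncertainty, sharpness=3.0):
    """Non-linear variant: rises steeply away from zero and then
    flattens, so most of the spread in the bonus separates "completely
    certain" points from everything else. Dividing by tanh(sharpness)
    normalizes the curve so the bonus still tops out at INCENTIVE."""
    return INCENTIVE * math.tanh(sharpness * uncertainty) / math.tanh(sharpness)
```

Because the curve is concave, a nearly certain point (uncertainty close
to 0.0) loses far more bonus relative to its neighbours than under the
linear rule, which matches the "avoid foregone conclusions" emphasis.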
