The correct way to evaluate an action is the expected value (yet another name for the mean) of the utility function (there is a theorem by Von Neumannand Morgenstern about this http://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem ). Since we want to win the game, it makes sense to use a utility of 1 for winning and 0 for losing, and this means that the probability of winning should be maximized. As far as I can tell, the theory ends there.
What seems to me like the rational way to deal with handicap games is to introduce some form of player modeling in the MC simulations. If you are playing against a weaker opponent there might be some way of doing this, but if you are the weaker player, I'm not sure how to model someone stronger. Perhaps you could force the weaker side to pass some fraction (say, 5%) of the time in the playouts. Has anybody tried anything of that type? Álvaro. On Tue, Oct 5, 2010 at 3:17 PM, Erik van der Werf <[email protected]> wrote: > 2010/10/5 Gunnar Farnebäck <[email protected]>: >> On 10/05/10 16:59, Erik van der Werf wrote: >>> >>> Lukasz brings up an interesting point. Winrate may not be the ideal >>> statistic for all situations. Maybe the average score (as used in most >>> early work on MC), soft-max, or a median tracker would be better for >>> some situations. >>> >>> Maybe a nice question for the academics: >>> >>> If you were free to keep track of a histogram for all possible scores >>> in each node (so you have everything from winrate at every possible >>> komi to simply the average score), then what would be the optimal >>> selection strategy? >> >> I have long suspected that median score is the way to go. > > The median score may be interesting for updating dynamic komi, but for > selecting the best move it is not enough. One reason is that it does > not account for bounds on the distribution (so it can't distinguish a > sure win from an uncertain outcome). > > >>> And if the above can be answered, what would be the minimum set of >>> statistics needed to maintain similar performance? >> >> But I have never tested to implement it because I haven't found a way >> to compute it without storing the complete histogram, which would make >> the nodes huge. > > You can track the median with fixed increments/decrements depending on > whether the score is higher/lower than the current value. The > magnitude of the steps relate to some kind of forgetting factor. > Different quantiles can be obtained by changing the ratio of the > update magnitudes. > > Erik > _______________________________________________ > Computer-go mailing list > [email protected] > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > _______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
