On Wed, Nov 25, 2009 at 5:01 PM, Matthew Woodcraft <[email protected]> wrote:
> Don Dailey wrote:
> > Matthew Woodcraft wrote:
> > >> That doesn't seem to directly support deriving information from
> > >> random trials. For computer go tuning, would you play multiple games
> > >> with each parameter set in order to get a meaningful figure? That
> > >> seems likely to be less efficient than treating it as a bandit
> > >> problem.
> >
> > This does not replace bandit, it's a way to tune parameters.
>
> Err, yes. I know that.
>
> > You might have 50 parameters and so you play a few thousand games
> > using random combinations of these parameters for instance. And then
> > you have data based on the win/loss records of the different
> > parameters and use this to settle on a "good" set of parameters to be
> > used.
>
> Right so far.
>
> Further, it's useful to concentrate your efforts on the combinations of
> parameters which are looking most promising.
>
> So it's related to bandit problems (you can view it as a bandit with a
> rather large number of arms).

I know this is rather "out there", but I wonder if, instead of building a
tree, it's possible to use something like this to "evolve" a strategy for
each move. It would be rather like trying to figure out how to do a
playout on the fly; it might be tree-like, but not a formal tree as such.
Does that sound ridiculous? Naive Monte Carlo is kind of like this: find
a strategy to play the next move. But naive MC has a fixed playout
policy, not one that is being figured out as you go. Such a thing might
figure out that you should not play move X if move Y has already been
played, for example.

- Don

> -M-
>
> _______________________________________________
> computer-go mailing list
> [email protected]
> http://www.computer-go.org/mailman/listinfo/computer-go/
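As a rough illustration of the tuning scheme Don describes (play many games with random parameter combinations, then settle on values with the best win/loss records), here is a minimal Python sketch. The parameter names, candidate values, and the fake `play_game` win model are all invented for illustration; a real engine would substitute actual self-play games.

```python
import random

# Hypothetical playout-policy parameters, each with a few candidate
# values. Names and values are illustrative, not from any real engine.
PARAMS = {
    "capture_weight": [0.5, 1.0, 2.0],
    "atari_escape_weight": [0.5, 1.0, 2.0],
    "pattern_weight": [0.0, 1.0],
}

def random_config():
    """Pick a random value for every parameter."""
    return {name: random.choice(vals) for name, vals in PARAMS.items()}

def play_game(config):
    """Stand-in for one self-play game. A fake win probability that
    rises as the config matches an assumed 'true best' setting."""
    best = {"capture_weight": 1.0,
            "atari_escape_weight": 2.0,
            "pattern_weight": 1.0}
    matches = sum(config[k] == v for k, v in best.items())
    return random.random() < 0.3 + 0.1 * matches

def tune(n_games=5000, seed=1):
    """Play n_games with random combinations, record wins per
    parameter value, and return the best-scoring value of each."""
    random.seed(seed)
    wins = {p: {v: 0 for v in vals} for p, vals in PARAMS.items()}
    games = {p: {v: 0 for v in vals} for p, vals in PARAMS.items()}
    for _ in range(n_games):
        cfg = random_config()
        won = play_game(cfg)
        for p, v in cfg.items():
            games[p][v] += 1
            wins[p][v] += won
    # For each parameter, keep the value with the best empirical
    # win rate (marginalized over the other parameters).
    return {p: max(vals, key=lambda v, p=p: wins[p][v] / games[p][v])
            for p, vals in PARAMS.items()}
```

This treats each parameter value's marginal win rate independently, which is the simplest version of the idea; Matthew's point about concentrating trials on promising combinations would replace the uniform `random_config` with something bandit-like that samples good regions more often.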
