On Wed, Nov 25, 2009 at 5:01 PM, Matthew Woodcraft
<[email protected]>wrote:

> Don Dailey wrote:
> > Matthew Woodcraft wrote:
>
> >> That doesn't seem to directly support deriving information from
> >> random trials. For computer go tuning, would you play multiple games
> >> with each parameter set in order to get a meaningful figure? That
> >> seems likely to be less efficient than treating it as a bandit
> >> problem.
>
> > This does not replace bandit, it's a way to tune parameters.
>
> Err, yes. I know that.
>
>
> > You might have 50 parameters and so you play a few thousand games
> > using random combinations of these parameters for instance. And then
> > you have data based on the win/loss records of the different
> > parameters and use this to settle on a "good" set of parameters to be
> > used.
>
> Right so far.
>
> Further, it's useful to concentrate your efforts on the combinations of
> parameters which are looking most promising.
>
> So it's related to bandit problems (you can view it as a bandit with a
> rather large number of arms).
>
>
I know this is rather "out there," but I wonder if, instead of building a
tree, it's possible to use something like this to "evolve" a strategy for
each move. It would be rather like trying to figure out how to do a
playout on the fly; it might be tree-like, but not a formal tree as
such. Does that sound ridiculous? Naive Monte Carlo is kind of like
this: find a strategy to play the next move. But naive MC has a fixed
playout policy, not one that is being figured out as you go.

Such a thing might figure out, for example, that you should not play move X
if move Y has already been played.
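For concreteness, here is a rough sketch of the tuning scheme discussed above, with random parameter combinations treated as bandit arms. The parameter names, candidate values, and the simulated win probability are all made up for illustration; a real tuner would launch actual engine-vs-engine games instead of calling `play_game`:

```python
import random

# Hypothetical tuning parameters and candidate values (illustrative only).
PARAM_SPACE = {
    "fpu": [0.5, 0.9, 1.1],
    "expansion_threshold": [1, 2, 4],
    "capture_bias": [0.0, 0.1, 0.3],
}

def random_combination():
    """Draw one random combination of parameter values (one bandit arm)."""
    return tuple(sorted((k, random.choice(v)) for k, v in PARAM_SPACE.items()))

def play_game(arm):
    """Stand-in for playing one game with these parameters.
    A real tuner would run the engine; here we fake a win probability."""
    params = dict(arm)
    base = 0.5 + (0.05 if params["fpu"] == 0.9 else -0.02)
    return random.random() < base

def tune(num_games=2000, num_arms=30, epsilon=0.2):
    """Epsilon-greedy: mostly play the best-looking combination,
    sometimes a random one, so promising arms get most of the games."""
    arms = [random_combination() for _ in range(num_arms)]
    wins = {a: 0 for a in arms}
    trials = {a: 0 for a in arms}
    for _ in range(num_games):
        if random.random() < epsilon:
            arm = random.choice(arms)  # explore
        else:
            # Exploit; untried arms score 1.0 so each gets tried at least once.
            arm = max(arms, key=lambda a: wins[a] / trials[a] if trials[a] else 1.0)
        trials[arm] += 1
        if play_game(arm):
            wins[arm] += 1
    best = max(arms, key=lambda a: wins[a] / max(trials[a], 1))
    return best, wins[best] / max(trials[best], 1)

best, rate = tune()
print("best arm:", dict(best), "observed win rate:", round(rate, 3))
```

This is exactly the "bandit with a rather large number of arms" view: each arm is one parameter combination, and effort concentrates on the combinations whose win/loss records look best.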

- Don

> -M-
>
> _______________________________________________
> computer-go mailing list
> [email protected]
> http://www.computer-go.org/mailman/listinfo/computer-go/
>