The basic idea seems to be a modest improvement after 752 games.  Note
that ALL versions with the incentive play stronger.  

I'm going to try more aggressive values now - when I find a reasonable
value I'll try tanh() stuff.

Rank Name           Elo    +    - games score oppo. draws 
   1 inc-0.1       2033   19   19   752   54%  2004    0% 
   2 inc-0.025     2008   19   19   750   49%  2012    0% 
   3 inc-0.01      2003   19   19   752   48%  2014    0% 
   4 mwNoDup-2000  2000   19   19   750   48%  2015    0% 
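
As a quick sanity check on the table, the score column can be compared
with what the standard logistic Elo formula predicts from each version's
rating and its average opponent rating.  This is only a sketch: the
formula and the helper name expected_score are assumptions, and the
rating tool that produced the table may use a slightly different model.

```python
def expected_score(elo, opponent_elo):
    """Expected score under the standard logistic Elo model (assumed here)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_elo - elo) / 400.0))


# (name, Elo, average opponent Elo) copied from the table above.
rows = [
    ("inc-0.1", 2033, 2004),
    ("inc-0.025", 2008, 2012),
    ("inc-0.01", 2003, 2014),
    ("mwNoDup-2000", 2000, 2015),
]

for name, elo, oppo in rows:
    print(f"{name:<13} expected score {expected_score(elo, oppo):.0%}")
```

The predicted percentages should land close to the score column, which
suggests the ratings and scores in the table are at least consistent
with each other.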

On Thu, 2008-10-30 at 14:59 -0200, Mark Boon wrote:
> Funny, I have been playing with something very similar. Although I  
> got side-tracked to something else for the moment. Intuitively I felt  
> tanh() was more appropriate than a linear function. Although you may  
> want to have the inverse of that, as I was trying to calculate the  
> territory certainty, whereas you want the territory uncertainty.
> 
>       Mark
> 
> On 30-okt-08, at 14:21, Don Dailey wrote:
> 
> > Reference bot enhancement
> > =========================
> >
> > Here is another possible enhancement to the reference bot which I am
> > currently testing.  I do not yet have anything conclusive enough to
> > report, but it looks good so far with a small number of games.
> >
> > But even if this idea doesn't pan out, it will produce a much more
> > natural playing style without weakening the bot.
> >
> > Here is how it works.  We will use 1000 playouts for our example:
> >
> > 1. Modify the bot to keep a "futures" table.  At the end of each
> >    playout, tally the wins for white and black for each point on the
> >    board.  (I tally -1 for a white win, 1 for a black win to get a
> >    final score from -1000 to 1000 for each point.)
> >
> > 2. When the 1000 playouts are complete, compute an "uncertainty value"
> >    for each point, where 1.0 is completely uncertain, and 0.0 is
> >    completely certain.  A point is completely certain if at the end of
> >    each playout it was ALWAYS owned by the same player.  It's
> >    completely uncertain if each side owned it 50% of the time.
> >
> > 3. When determining which move to play, apply an uncertainty delta to
> >    the computed score of each move.  This is simply some fraction of
> >    the "uncertainty value" and the best value I've tested so far is
> >    0.025.  So you get a bonus that ranges from 0.0 to 0.025.
> >
> > 4. Choose the move with the best (sc + uncertainty_delta), where sc is
> >    the move's computed score.
> >
> > 5. The incentive must be small; large incentives will destroy the
> >    playing strength.  For instance 0.1 is too high and weakens it.
> >    The value that is testing the best for me (of the ones I've tried
> >    so far) is 0.025.
> >
> > 6. This may test at some levels better than others.  I'm testing
> >    at 2000 playouts.
> >
> > The idea is to gently encourage the bot to avoid playing to points
> > that are clearly a foregone conclusion (or conversely, encourage it to
> > play where the "action" is).
> >
> > This should make the bot play much less artificially.  Near the end of
> > the game it will prefer moves to unresolved points.  Earlier in the
> > game it will avoid moving to areas that are "probably" already won or
> > lost.
> >
> > My feeling is that these "incentives" should probably be calculated in
> > a non-linear way, but what I described is a good starting point.  From
> > experiments in the past it seems more important to put the focus and
> > most of the weight on avoiding play to highly certain points.   So I
> > will try some non-linear formula next.
> >
> >
> > - Don
> >
> > _______________________________________________
> > computer-go mailing list
> > [email protected]
> > http://www.computer-go.org/mailman/listinfo/computer-go/
> 

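The steps in the quoted proposal can be sketched roughly as follows.
This is a minimal Python sketch, not the reference bot's actual code.
The names update_tally, uncertainty and choose_move, the dict-based
board representation, and the assumption that move scores are win rates
on a 0..1 scale are all illustrative assumptions.

```python
def update_tally(tally, final_owner):
    """Step 1: after one playout, add +1 per point Black owns, -1 per point White owns.

    final_owner is a hypothetical mapping point -> "black" or "white"
    describing who owns each point at the end of the playout.
    """
    for point, owner in final_owner.items():
        tally[point] = tally.get(point, 0) + (1 if owner == "black" else -1)


def uncertainty(tally_value, playouts):
    """Step 2 (linear form): 1.0 for a 50/50 split, 0.0 if one side always owned it."""
    return 1.0 - abs(tally_value) / playouts


def choose_move(scores, tally, playouts, incentive=0.025):
    """Steps 3-4: pick the move with the best score + incentive * uncertainty.

    scores is a hypothetical mapping move -> computed score (assumed 0..1).
    Moves with no tally entry are treated as completely uncertain.
    """
    def adjusted(move):
        return scores[move] + incentive * uncertainty(tally.get(move, 0), playouts)

    return max(scores, key=adjusted)


# Tiny example with two playouts and two points.
tally = {}
update_tally(tally, {"a1": "black", "b1": "white"})
update_tally(tally, {"a1": "black", "b1": "black"})

# a1 was always Black's (certain); b1 split evenly (uncertain).
scores = {"a1": 0.50, "b1": 0.49}
best = choose_move(scores, tally, playouts=2)
```

In this toy example a1 scores slightly higher on its own, but the bonus
for the contested point b1 tips the choice toward it.  With incentive
set to 0.0 the function falls back to picking the highest raw score.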