Re: [computer-go] A new reference bot enhancement to try

Don Dailey Thu, 30 Oct 2008 10:11:55 -0700

I may have been wrong when I said 0.1 was too high. It's the value that
is now testing best, and it's the highest value I am testing. It is
showing a 61 Elo improvement over not using the idea at all. I have
only played about 160 games, so there is still a lot of statistical
noise here and anything can happen.
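To put the 160-game figure in perspective, here is a rough calculation. It assumes the standard logistic Elo model, independent games and no draws; the post does not say how the 61 Elo figure was derived, and the function names are illustrative only.

```python
import math
from typing import Tuple


def expected_score(elo_diff: float) -> float:
    """Expected score under the logistic Elo model for a player elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score; score must lie strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(score: float, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval in Elo, using a normal approximation
    to the binomial distribution of the score (independent games, no draws)."""
    se = math.sqrt(score * (1.0 - score) / games)
    return elo_from_score(score - z * se), elo_from_score(score + z * se)


score = expected_score(61)
low, high = elo_interval(score, 160)
print(f"expected score {score:.3f}; 95% interval about {low:.0f} to {high:.0f} Elo")
```

Under these assumptions a 61 Elo edge corresponds to scoring about 58.7%, and the 95% interval after 160 games runs from roughly 7 to 118 Elo, which is consistent with the caution about statistical noise.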


Once I have checked this out thoroughly, I'll experiment with tanh().


- Don





On Thu, 2008-10-30 at 14:59 -0200, Mark Boon wrote:
> Funny, I have been playing with something very similar, although I
> got side-tracked by something else for the moment. Intuitively, I felt
> tanh() was more appropriate than a linear function. You may want the
> inverse of it, though, since I was trying to calculate the territory
> certainty, whereas you want the territory uncertainty.
> 
>       Mark
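Mark does not give a formula, so the following is only one plausible reading of his suggestion: a tanh-shaped certainty, inverted to get the uncertainty Don's scheme needs. It takes the per-point tally from Don's step 1 (from -1000 to 1000 over 1000 playouts); the steepness parameter `k` and the normalization by tanh(k) are assumptions added so the result stays in [0, 1].

```python
import math


def tanh_certainty(tally: int, playouts: int, k: float = 3.0) -> float:
    """Certainty in [0, 1]: 0 for a 50/50 point, 1 for a point owned by the
    same side in every playout. k (hypothetical) controls how quickly
    certainty saturates as a point leans toward one side."""
    lean = abs(tally) / playouts
    return math.tanh(k * lean) / math.tanh(k)


def tanh_uncertainty(tally: int, playouts: int, k: float = 3.0) -> float:
    """The inverse Mark mentions: uncertainty = 1 - certainty."""
    return 1.0 - tanh_certainty(tally, playouts, k)


# A point owned by black in 600 of 1000 playouts has a tally of +200.
print(round(tanh_uncertainty(200, 1000), 3), "vs linear", 1.0 - 200 / 1000)
```

With k = 3, that 600/1000 point gets an uncertainty of about 0.46 instead of 0.8 under a linear mapping, so the bonus concentrates on points that are genuinely close to 50/50.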
> 
> On 30-Oct-08, at 14:21, Don Dailey wrote:
> 
> > Reference bot enhancement
> > =========================
> >
> > Here is another possible enhancement to the reference bot which I am
> > currently testing.  I do not yet have anything conclusive enough to
> > report, but it looks good so far with a small number of games.
> >
> > But even if this idea doesn't pan out, it will produce a much more
> > natural playing style without weakening the bot.
> >
> > Here is how it works.  We will use 1000 playouts for our example:
> >
> > 1. Modify the bot to keep a "futures" table.  At the end of each
> >    playout, tally which player owns each point on the board.  (I
> >    tally -1 for a point owned by white and 1 for a point owned by
> >    black, giving a final score from -1000 to 1000 for each point.)
> >
> > 2. When the 1000 playouts are complete, compute an "uncertainty value"
> >    for each point, where 1.0 is completely uncertain and 0.0 is
> >    completely certain.  A point is completely certain if at the end of
> >    every playout it was ALWAYS owned by the same player.  It's
> >    completely uncertain if each side owned it in 50% of the playouts.
> >
> > 3. When determining which move to play, apply an uncertainty delta to
> >    the computed score of each move.  This is simply some fraction of
> >    the "uncertainty value"; the best fraction I've tested so far is
> >    0.025, so you get a bonus that ranges from 0.0 to 0.025.
> >
> > 4. Choose the move with the best (sc + uncertainty_delta), where sc is
> >    the move's computed score.
> >
> > 5. The incentive must be small; large incentives will destroy the
> >    playing strength.  For instance, 0.1 is too high and weakens it.
> >    The value that is testing best for me (of the ones I've tried
> >    so far) is 0.025.
> >
> > 6. This may test better at some playout levels than at others.  I'm
> >    testing at 2000 playouts.
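The steps above can be sketched roughly as follows. This is a minimal Python sketch, not Don's code. It assumes the engine can report the final owner of every point for each playout (+1 black, -1 white, and 0 for neutral points, which the post does not cover), that a move's computed score `sc` is a win rate in [0, 1], and that the uncertainty value is the linear mapping 1 - |tally| / playouts. The function names and the dict-of-scores interface are hypothetical.

```python
from typing import Dict, List, Sequence

UNCERTAINTY_WEIGHT = 0.025  # the fraction that tested best in the post


def futures_table(final_owners: Sequence[Sequence[int]]) -> List[int]:
    """Sum per-point ownership over all playouts.

    Each inner sequence gives the owner of every point at the end of one
    playout: +1 for black, -1 for white, 0 for neutral (assumption).
    """
    table = [0] * len(final_owners[0])
    for owners in final_owners:
        for point, owner in enumerate(owners):
            table[point] += owner
    return table


def uncertainty_values(table: Sequence[int], playouts: int) -> List[float]:
    """Linear uncertainty: 1.0 for a 50/50 point, 0.0 for a point always owned by one side."""
    return [1.0 - abs(tally) / playouts for tally in table]


def choose_move(scores: Dict[int, float], uncertainty: Sequence[float],
                weight: float = UNCERTAINTY_WEIGHT) -> int:
    """Pick the point that maximizes sc + weight * uncertainty[point]."""
    return max(scores, key=lambda point: scores[point] + weight * uncertainty[point])


# Tiny 3-point "board" with 4 playouts: point 0 always black,
# point 1 split 2-2, point 2 always white.
owners = [[1, 1, -1], [1, -1, -1], [1, 1, -1], [1, -1, -1]]
table = futures_table(owners)                 # [4, 0, -4]
unc = uncertainty_values(table, len(owners))  # [0.0, 1.0, 0.0]
scores = {0: 0.52, 1: 0.51, 2: 0.50}          # hypothetical per-move win rates
print(choose_move(scores, unc))               # 1: the bonus outweighs a 0.01 score gap
```

Treating neutral points as 0 means a point that is neutral in every playout looks fully uncertain; a real implementation might want to handle such points separately.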
> >
> > The idea is to gently encourage the bot to avoid playing to points
> > that are clearly a foregone conclusion (or, conversely, to encourage
> > it to play where the "action" is).
> >
> > This should make the bot play much less artificially.  Near the end of
> > the game it will prefer moves to unresolved points.  Earlier in the
> > game it will avoid moving to areas that are "probably" already won or
> > lost.
> >
> > My feeling is that these "incentives" should probably be calculated in
> > a non-linear way, but what I described is a good starting point.  From
> > experiments in the past it seems more important to put the focus and
> > most of the weight on avoiding play to highly certain points.   So I
> > will try some non-linear formula next.
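As one illustration of a non-linear formula with that emphasis (not one proposed in the thread), the linear uncertainty can be raised to a power p below 1. The curve then drops steeply only as a point approaches total certainty, so the largest difference in bonus is between settled and nearly settled points. The function name and the exponent p are hypothetical.

```python
def shaped_uncertainty(tally: int, playouts: int, p: float = 0.5) -> float:
    """Concave reshaping of the linear uncertainty 1 - |tally| / playouts.

    p = 1 reproduces the linear version; p < 1 (hypothetical tuning value)
    pushes most of the change in bonus toward points that are almost always
    owned by the same side.
    """
    linear = 1.0 - abs(tally) / playouts
    return linear ** p


# Compare 1000-playout tallies: 50/50, 75/25, 95/5 and 100/0 ownership.
for tally in (0, 500, 900, 1000):
    print(tally, round(1.0 - tally / 1000, 3), round(shaped_uncertainty(tally, 1000), 3))
```

With p = 0.5, a point owned 95% of the time by one side still gets about 0.32 of the maximum bonus instead of 0.1, while a fully settled point gets nothing, so the main effect is steering the bot away from highly certain points.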
> >
> >
> > - Don
> >
> > _______________________________________________
> > computer-go mailing list
> > [email protected]
> > http://www.computer-go.org/mailman/listinfo/computer-go/
