On Mon, 2008-10-27 at 14:48 -0400, Michael Williams wrote:
> It's actually more bare-bones (in the sense that there is less code) if you 
> consider the update loop that I showed.

The spirit of the recipe is to avoid any magic, any special cleverness,
anything that requires extra explanation, etc.   For instance one thing
I hate is the magic constant of 3X for the stopping rule - but some
things are absolutely necessary. 

I could have specified early cutoffs for AMAF because I have found it's
better not to go all the way to the end for this,  but again that is an
enhancement that goes beyond the most basic thing and would require
additional explanation.    This idea of course supplants that, but it's
easy to imagine that someone else might come along and show a different
decay function that is more effective than yours and so on.    

In fact, the number of "simple" enhancements could go on indefinitely.
AMAF would even seem to violate this basic principle of simplicity,
however AMAF has become so ubiquitous that it seems right to make it
part of this.

So no, the idea isn't to make the strongest reference bot possible but
to provide a definition so simple and easy to explain that people will
be willing to implement it.    I don't want to add several paragraphs to
explain each little tweak.

However, having said all of that,  I like the idea of tracking these
kinds of enhancements.   We might even have progressively stronger
version that we could "standardize" and test against so that new bot
authors could progressively add features (such as this nice enhancement)
as they learn.

- Don




> 
> Don Dailey wrote:
> > Great information.   I'll include a version of this to my scalability
> > study.   Is this the C version?   
> > 
> > wins[] and hits[] are integer arrays and weight is a fraction less than
> > 1.0, so I'm not sure how this works.  Did you change hits and wins to be
> > doubles?
> > 
> > There are many enhancements possible, and I'm not going to change the
> > definition,  it was my intention to provide a very basic "bare-bones"
> > reference implementation for others to build on.
> > 
> > - Don
> > 
> > 
> > 
> > On Mon, 2008-10-27 at 13:01 -0400, Michael Williams wrote:
> >> The following modification to AMAF seems to perform better and scale 
> >> better.  The idea is to weight the moves at the beginning of the playout 
> >> heavier than the 
> >> moves at the end of the playout.  It's probably not a new idea.
> >>
> >> This code from the reference implementation:
> >>    wins[mv] += sc;
> >>    hits[mv]++;
> >>
> >> Becomes this:
> >>    double weight = 1.0 - (double)(i - savctm) / (ctm - savctm);
> >>    wins[mv] += weight * sc;
> >>    hits[mv] += weight;
> >>
> >> If you are not familiar with the reference code, here are the meanings of 
> >> the variables in the code above:
> >>    i is the loop variable, counting from savctm to ctm
> >>    mv iterates over each move in the playout
> >>    sc is 1 or -1, depending on the outcome of the playout
> >>    ctm is the move count at the end of the playout
> >>    savctm is the move count at the beginning of the playout
> >>    hits is the number of times a given move was played
> >>    wins is the number of times a given move resulted in a playout win
> >>
> >> At    15 playouts per move, the modified version wins 54.0% of the time 
> >> (±3.5%) after 200 games.
> >> At    30 playouts per move, the modified version wins 54.0% of the time 
> >> (±3.5%) after 200 games.
> >> At    60 playouts per move, the modified version wins 54.5% of the time 
> >> (±3.5%) after 200 games.
> >> At   125 playouts per move, the modified version wins 53.0% of the time 
> >> (±3.5%) after 200 games.
> >> At   250 playouts per move, the modified version wins 54.0% of the time 
> >> (±3.5%) after 200 games.
> >> At   500 playouts per move, the modified version wins 55.5% of the time 
> >> (±3.5%) after 200 games.
> >> At  1000 playouts per move, the modified version wins 57.5% of the time 
> >> (±3.5%) after 200 games.
> >> At  2000 playouts per move, the modified version wins 63.0% of the time 
> >> (±3.4%) after 200 games.
> >> At  4000 playouts per move, the modified version wins 63.5% of the time 
> >> (±3.4%) after 200 games.
> >> At  8000 playouts per move, the modified version wins 63.5% of the time 
> >> (±3.4%) after 200 games.
> >> At 16000 playouts per move, the modified version wins 71.0% of the time 
> >> (±3.2%) after 200 games.
> >>
> >> Because of the weighting, it is probably safe to remove the code that 
> >> checks to see if the move was previously played before awarding credit.  
> >> Doing so and 
> >> incrementally calculating the weight would yield this simple and fast 
> >> update loop after each playout:
> >>
> >>          // Track win statistics using weighted AMAF - (All Moves As First)
> >>          // ---------------------------------------------------------------
> >>          double weight = 1.0;
> >>          double weightDelta = 2.0 / (ctm - savctm + 1);
> >>          for (int i = savctm; i < ctm; i += 2)
> >>          {
> >>              int mv = mvs[i] & MASK;
> >>
> >>              wins[mv] += weight * sc;
> >>              hits[mv] += weight;
> >>              weight -= weightDelta;
> >>          }
> >>
> >> _______________________________________________
> >> computer-go mailing list
> >> [email protected]
> >> http://www.computer-go.org/mailman/listinfo/computer-go/
> >>
> >> ------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> computer-go mailing list
> >> [email protected]
> >> http://www.computer-go.org/mailman/listinfo/computer-go/
> 
> _______________________________________________
> computer-go mailing list
> [email protected]
> http://www.computer-go.org/mailman/listinfo/computer-go/

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to