Interesting. I had tried something much simpler. I added two wins
for the first move in the sequence, figuring that a move being first
in the sequence should carry more weight than the rest. But to my
surprise that played much worse, winning only 37%. Maybe I made a
mistake and should try again.
Mark
On 27-Oct-08, at 15:01, Michael Williams wrote:
The following modification to AMAF seems to perform better and to
scale better as the number of playouts grows. The idea is to weight
the moves at the beginning of the playout more heavily than the moves
at the end of the playout. It's probably not a new idea.
This code from the reference implementation:
wins[mv] += sc;
hits[mv]++;
Becomes this:
double weight = 1.0 - (double)(i - savctm) / (ctm - savctm);
wins[mv] += weight * sc;
hits[mv] += weight;
If you are not familiar with the reference code, here are the
meanings of the variables in the code above:
i is the loop variable, counting from savctm to ctm
mv iterates over each move in the playout
sc is 1 or -1, depending on the outcome of the playout
ctm is the move count at the end of the playout
savctm is the move count at the beginning of the playout
hits is the number of times a given move was played
wins is the net score of a given move: +1 for each playout it won and -1 for each it lost (since sc is 1 or -1)
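For anyone who wants to try the change, here is a minimal C sketch of the per-playout update using the direct weight formula and the variables listed above. The function name amaf_update_direct, its parameter list, and the assumption that mvs[] holds plain move indices are illustrative, not taken from the reference implementation. The loop steps by 2 (one side's moves only), as the incremental loop later in this post does; that step is an assumption for the direct version. Note that wins and hits must be floating-point arrays here: added to int counters, the fractional weights would be truncated.

```c
#include <assert.h>

/* Hypothetical helper: weighted AMAF update using the direct formula
   weight = 1 - (i - savctm) / (ctm - savctm).
   Assumes mvs[] holds plain move indices (the reference code masks them). */
void amaf_update_direct(const int *mvs, int savctm, int ctm, double sc,
                        double *wins, double *hits)
{
    assert(ctm > savctm); /* the formula divides by ctm - savctm */

    /* Step by 2 so only the moves of one side are credited. */
    for (int i = savctm; i < ctm; i += 2) {
        int mv = mvs[i];
        /* 1.0 for the first move, falling linearly toward 0 */
        double weight = 1.0 - (double)(i - savctm) / (ctm - savctm);
        wins[mv] += weight * sc;
        hits[mv] += weight;
    }
}
```

For a four-move playout starting at savctm = 0, the first move is credited with weight 1.0 and the third with 0.5.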
Win rate of the modified version after 200 games at each setting:

Playouts/move   Wins    Margin
           15   54.0%   ±3.5%
           30   54.0%   ±3.5%
           60   54.5%   ±3.5%
          125   53.0%   ±3.5%
          250   54.0%   ±3.5%
          500   55.5%   ±3.5%
         1000   57.5%   ±3.5%
         2000   63.0%   ±3.4%
         4000   63.5%   ±3.4%
         8000   63.5%   ±3.4%
        16000   71.0%   ±3.2%
Because of the weighting, it is probably safe to remove the code
that checks whether the move was previously played before
awarding credit. Doing so and calculating the weight incrementally
would yield this simple and fast update loop after each playout:
// Track win statistics using weighted AMAF (All Moves As First)
// ---------------------------------------------------------------
double weight = 1.0;
double weightDelta = 2.0 / (ctm - savctm + 1);
// i += 2: credit every second move, i.e. the moves of one side
for (int i = savctm; i < ctm; i += 2)
{
    int mv = mvs[i] & MASK;
    wins[mv] += weight * sc;
    hits[mv] += weight;
    weight -= weightDelta;   // linear decay toward 0
}
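Wrapped as a function, the incremental loop can be compiled and tested on its own. This is a sketch: the function name amaf_update_incremental and the MASK value 0x3ff are placeholders chosen for illustration, since the reference bot defines its own MASK. Because nothing checks whether a move was already played, a point that appears twice among the credited moves is credited twice.

```c
#include <assert.h>

/* Placeholder value (assumption): the reference bot defines its own
   MASK for extracting the point index from a packed move. */
#define MASK 0x3ff

/* Incremental weighted AMAF update, following the loop above:
   no "already played?" check, constant weight decrement per step. */
void amaf_update_incremental(const int *mvs, int savctm, int ctm, double sc,
                             double *wins, double *hits)
{
    assert(ctm >= savctm);
    double weight = 1.0;
    double weightDelta = 2.0 / (ctm - savctm + 1);

    /* Step by 2 so only the moves of one side are credited. */
    for (int i = savctm; i < ctm; i += 2) {
        int mv = mvs[i] & MASK;  /* strip any extra bits packed into the move */
        wins[mv] += weight * sc;
        hits[mv] += weight;
        weight -= weightDelta;   /* linear decay toward 0 */
    }
}
```

Compared with the direct formula, the constant step 2.0 / (ctm - savctm + 1) makes the decay slightly flatter: with n = ctm - savctm, the k-th credited move gets 1 - 2k/(n + 1) instead of 1 - 2k/n.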
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/