Interesting. I have observed a problem in the current dynamic komi scheme especially when there is a big semeai/life-and-death:

Playout 1   B+0.5
Playout 2   B+0.5
Playout 3   B+0.5
Playout 4   B+0.5
Playout 5   B+0.5
Playout 6   B+0.5
Playout 7   B+0.5
Playout 8   B+0.5
Playout 9   B+0.5
Playout 10 W+8.5

B has 90% winning rate. How many points should we shift for dynamic komi?

Aja

-----原始郵件----- From: Zach Wegner
Sent: Monday, January 09, 2012 11:53 AM
To: computer-go@dvandva.org
Subject: Re: [Computer-go] replacing dynamic komi with a scoring function

On Mon, Jan 9, 2012 at 7:26 AM, Vlad Dumitrescu <vladd...@gmail.com> wrote:
Hi,

On Mon, Jan 9, 2012 at 13:17, Don Dailey <dailey....@gmail.com> wrote:
Summary:

I believe a more correct scoring function won't be based on how much you win
by OR how often you win but will incorporate some other more relevant
concept and it will be dynamic.    And it will not matter if the game is
a handicap game or otherwise because the scoring function will always be
relevant.   The goal will be to maximize your winning chances but it
will incorporate something more sophisticated that just counting how often
you win or how much you win by.

I hope I may interfere with something that Don's nice description
revealed to me. It feels rather obvious, but since nobody stated it
explicitly, maybe it's news for at least some people here.

MCTS is maximizing the chances of winning. These chances are largest
for a minimal score difference because this allows for making some
errors. Winning by the largest possible score has rather small chances
to happen because every move has to be perfect.

The curve describing the probability of ending the game with a certain
score is bell-shaped and MCTS explores the area beneath it, looking
for winning moves. With handicap, the disadvantaged side is getting
less samples explored, making it less likely to discover the really
good moves. Dynamic komi shifts the bell left or right in order to
equalize the sampling on both sides, but as mentioned it isn't dynamic
enough (the curve changes after each move) and also is actually using
a different shape for the curve than the real "handicap curve".

In theory, I think that the solution for keeping the same level of
play with handicap as without would be to make sure that the the
disadvantaged side gets just as many samples with or without handicap.
That is, use more playouts when playing with handicap. In practice,
this is probably prohibitive...

I wonder if it might be possible to estimate the shape of this curve
after each move and use that estimate to dynamically adjust the number
of playouts. One might have to use higher precision calculations, too,
so that the noise doesn't get too loud.

Does this make any sense? Has anyone tried something like this?

best regards,
Vlad
_______________________________________________
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

This is similar to ideas I've had to solve this. Of course, I'm not a
go programmer, so it's really just idle curiosity for me when reading
this list. Perhaps an ideal scoring function is "highest median
score". You can do this in a slow/memory intensive way by storing a
histogram of final score in each node instead of wins/losses.
Wins/losses can then be computed by summing the histogram buckets
above/below the komi level. Computing the median is simply finding the
bucket where the sum crosses 50%. Other percentages could be used here
as well, though I'm not quite sure what effect this might have.

This solves the problem of getting far less than one bit of
information per playout when the win rate is very high or low, and
also supersedes dynamic komi--if you have the full histogram at any
node, you can compute the exact win rate for any arbitrary komi.

Using this in practice might be fairly difficult, and would probably
involve making the histogram "lossy" in some sense. I would be
interested if somebody here could run tests with this as a limit study
though--keep the number of nodes allocated and playouts per move
constant, and see the effect on strength.

Zach
_______________________________________________
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
_______________________________________________
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Reply via email to