Re: [computer-go] Scoring - step function or sigmoid function?

Stefan Kaitschick Wed, 08 Jul 2009 04:36:02 -0700

Thinking about why... In a given board position moves can be grouped
into sets: the set of correct moves, the set of 1pt mistakes, 2pt
mistakes, etc. Let's assume each side has roughly the same number of
moves in each of these groupings.

If black is winning by 0.5pt with perfect play, then mistakes by each
side balance out and we get a winning percentage of just over 50%. If he
is winning by 1.5pt then he has breathing space and can make an extra
mistake. Or in other words, at a certain move he can play any of the
moves in the "correct moves" set, or any of the moves in the "1pt
mistakes" set, and still win. So he wins more of the playouts. Say 55%.
If he is winning by 2.5pts then he can make one 2pt mistake or two 1pt
mistakes (more than the opponent) and still win, so he wins more
playouts, 60% perhaps. And so on.

My conclusion was that the winning percentage is more than just an
estimate of how likely the player is to win. It is in fact a crude
estimator of the final score.
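
The argument above can be checked with a toy simulation. This is only a
sketch under stated assumptions: the function name, the number of moves per
side, and the mistake probabilities are all made up for illustration, and
real playouts are not independent draws like this.

```python
import random

def simulate_win_rate(margin, n_playouts=10000, moves_per_side=30,
                      mistake_probs=(0.80, 0.15, 0.05), seed=0):
    """Estimate Black's playout win rate when Black leads by `margin`
    points under perfect play.

    Toy assumption: every move independently costs 0, 1 or 2 points with
    the probabilities in `mistake_probs`, identically for both sides.
    """
    rng = random.Random(seed)
    costs = list(range(len(mistake_probs)))  # 0pt, 1pt, 2pt mistakes
    wins = 0
    for _ in range(n_playouts):
        black_loss = sum(rng.choices(costs, weights=mistake_probs, k=moves_per_side))
        white_loss = sum(rng.choices(costs, weights=mistake_probs, k=moves_per_side))
        # Black wins if the perfect-play margin survives the net mistakes.
        if margin - black_loss + white_loss > 0:
            wins += 1
    return wins / n_playouts

if __name__ == "__main__":
    for m in (0.5, 1.5, 2.5, 10.5):
        print(m, simulate_win_rate(m))
```

Under these assumptions the win rate sits just over 50% at a 0.5pt lead and
climbs as the lead grows, so the playout win rate carries information about
the margin and not only about the sign of the result.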

Going back to your original comment, when choosing between move A that
leads to a 0.5pt win, and move B that leads to a 100pt win, you should
be seeing move B has a higher winning percentage.

Darren

Point well taken. Winning positions tend to cluster, and critical swing
moves are rare, statistically speaking. If the position is more or less
evenly balanced, the step function might already be very close to optimal
because of this.

But I would like to bring up a well-known MC quirk: in handicap positions,
or after one side has scored a big success in an even game, bots play badly
with both sides until the position becomes close again. The problem here is
that every move is a win (or every move is a loss). On 9*9, it's possible
to beat a bot while giving it 2 stones, even when it's a close contest on
even with komi. All it takes is a single bot misread at the moment the
position becomes close (which it will, because the bot will be "lazy" until
that point). So it would be desirable for the bot to make keeping the score
advantage large an auxiliary goal.

This has been tried, of course, but without much success so far.

And it seems that the main reason is that tinkering with the scoring
function to achieve this tends to worsen play in competitive situations.

I have an alternative suggestion: in handicap games, introduce a virtual
komi that gets reduced to 0 as the game progresses. This would work for the
bot on both sides: if the bot has b, it will make fewer lazy plays; if it
has w, it will be less maniacal. For example, in a 4 stone 19*19 game, if
the real starting advantage is about 45 points, the bot could introduce an
internal komi of about 30-35. The bot should be optimistic with b and
pessimistic with w, but not to the point that every move evaluates to the
same value and move selection becomes a toss-up.

Another way to look at this is that humans who give a handicap know that
they usually can't catch up in one piece. And humans who take a handicap
know that they can't give up their advantage too quickly.

Virtual komi encodes this simple knowledge.
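
To make the idea concrete, here is a minimal sketch of how a virtual komi
could enter the usual step-function playout result. The function names and
the point totals in the usage note are hypothetical; the only thing taken
from the text above is that the internal komi counts against b, whichever
side the bot is playing.

```python
def black_wins(black_points, white_points, real_komi=0.0, virtual_komi=0.0):
    """Step-function result of one playout from Black's point of view.

    `virtual_komi` is an internal handicap compensation added on top of
    the real komi; it exists only inside the bot's search.
    """
    return black_points - white_points - real_komi - virtual_komi > 0

def playout_reward(black_points, white_points, bot_is_black,
                   real_komi=0.0, virtual_komi=0.0):
    """Return 1 if the playout counts as a win for the bot, else 0."""
    b_won = black_wins(black_points, white_points, real_komi, virtual_komi)
    return int(b_won == bot_is_black)
```

With a virtual komi of 32, a playout that ends 200 to 160 still counts as a
win for b, while one that ends 190 to 170 does not. Without the virtual komi
both would count as wins, and the search could not tell the two lines of
play apart.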

During the course of the game this internal komi would of course have to be
reduced to 0. The proper criteria can only be found by experimentation, but
the important factors will be how far the game has progressed and what the
win rate is for the best move. If the bot becomes pessimistic with b, it
should lower the internal komi more quickly.
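
One possible shape for such a schedule is sketched below. Everything here is
an assumption that would need tuning by experiment: the linear decay, the
150-move horizon, the 0.45 win-rate threshold, and the factor by which the
decay is sped up.

```python
def virtual_komi(initial_komi, move_number, horizon=150,
                 best_win_rate=None, low_win_rate=0.45, speedup=2.0):
    """Internal komi for the current move.

    Decays linearly from `initial_komi` at move 0 to 0 at `horizon`.
    If the bot holds b and the win rate of its best move (under the
    current internal komi) drops below `low_win_rate`, the decay is
    accelerated by `speedup`. Pass best_win_rate=None to ignore this.
    """
    progress = min(1.0, move_number / horizon)
    if best_win_rate is not None and best_win_rate < low_win_rate:
        progress = min(1.0, progress * speedup)
    return initial_komi * (1.0 - progress)
```

For example, virtual_komi(32, 75) gives 16.0 on this schedule, while
virtual_komi(32, 75, best_win_rate=0.3) has already dropped to 0.0.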


One advantage of this approach is that it doesn't mess up even game play.

A more elaborate scheme would be to make a "komi search" before the real
search: find the best ratio of win rate to internal komi, then make the
normal move search with this komi. This could also be useful in even play
after one side has pulled ahead.
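
A komi search could look roughly like the sketch below. It replaces "best
ratio" with a simpler, assumed criterion: pick the internal komi at which
the estimated win rate for b crosses a target value. The estimate_win_rate
callable is hypothetical (in a real bot it would be a short preliminary
search at the given komi), and toy_win_rate is only a stand-in so the
example runs.

```python
import math

def komi_search(estimate_win_rate, target=0.55, lo=0.0, hi=60.0, tol=0.5):
    """Bisect for the internal komi at which b's win rate is about `target`.

    Assumes estimate_win_rate(komi) does not increase as komi grows.
    """
    if estimate_win_rate(lo) <= target:
        return lo   # b is not comfortably ahead: use no extra komi
    if estimate_win_rate(hi) >= target:
        return hi   # even the largest komi leaves b ahead
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if estimate_win_rate(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def toy_win_rate(komi, true_lead=45.0, spread=8.0):
    """Stand-in for a real search: a logistic curve in (lead - komi)."""
    return 1.0 / (1.0 + math.exp(-(true_lead - komi) / spread))

if __name__ == "__main__":
    print(komi_search(toy_win_rate))
```

Real win-rate estimates are noisy, so a plain bisection like this would
probably need more playouts per probe, or some smoothing, before it could be
trusted.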


Stefan





_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
