>From: terry mcintyre <[email protected]>
>"The major one is that the MCTS scoring function is imperfect; historically, 
>programs have snatched defeat from the jaws of victory by letting points be 
>nibbled away in yose."

(Apologies to those who understand go and computer-go better than me--these are 
just my thoughts on the discussion.)

There are several elements within this debate of "play to maximize wins" versus 
"play to maximize points":
1) What strategy is perfect play?
2) What strategy is strongest with MCTS?
3) What strategy is closest to human play?
4) Would a combination of strategies be stronger than either alone?

Let's examine these elements further:

1) According to the rules of Go, the winner is the player with the higher
score, but a win is equivalent to any other win--winning by 0.5 points is 
enough.  So perfect play would maximize wins but not necessarily points.

However, the winner is determined by points, so an accurate count of points 
(evaluation) is necessary to determine the winner.  At the end of the game, 
this is trivial.  Earlier in the game this is harder.  A perfect evaluation 
function would lead to perfect play--only winning moves would be played.  Most 
current go programs seem to use the "play to maximize wins" strategy but so far 
none can play perfectly so we can say that their evaluation functions are not 
perfect.  With a perfect evaluation function, the "play to maximize points" 
strategy should also lead to perfect play.
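
To make the two objectives concrete, here is a minimal Python sketch of how
each one would value a finished game from Black's point of view.  The function
names and the komi of 7.5 are assumptions chosen for illustration; they are not
taken from any particular program.

```python
# Hypothetical names; the komi value is an assumption for illustration only.
KOMI = 7.5

def margin(black_points, white_points, komi=KOMI):
    """Final margin from Black's point of view (positive means Black wins)."""
    return black_points - (white_points + komi)

def win_objective(black_points, white_points, komi=KOMI):
    """'Play to maximize wins': every win is worth the same."""
    return 1.0 if margin(black_points, white_points, komi) > 0 else 0.0

def points_objective(black_points, white_points, komi=KOMI):
    """'Play to maximize points': a larger margin is worth more."""
    return margin(black_points, white_points, komi)
```

Under the first objective a half-point win and a seventy-point win are worth
exactly the same; under the second they are far apart.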

2) Many go program authors have stated that "play to maximize wins" is stronger 
than "play to maximize points".  I think this is because their evaluation 
functions err on the optimistic side--the program counts points that future
play does not deliver.  Depending on the margin of error in the score 
estimation, this can turn a win into a loss.  By focusing on wins rather than 
points, current programs minimize the effect of the "optimistic score 
estimation" problem.
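
As a rough illustration of why the two criteria can disagree, the sketch below
compares them on invented playout results for two candidate moves.  The
margins, move names and helper functions are all made up for the example; real
playout statistics will look different.

```python
from statistics import mean

# Invented playout results: estimated final margins (positive = we win).
playouts = {
    "solid":  [2.5, 1.5, 3.5, 2.5, 0.5, 1.5],       # small but reliable wins
    "greedy": [30.5, 25.5, -4.5, 28.5, -2.5, -1.5],  # big wins, frequent losses
}

def win_rate(margins):
    """Fraction of playouts that end in a win, ignoring the size of the win."""
    return sum(m > 0 for m in margins) / len(margins)

def pick(playouts, score):
    """Return the move whose playout margins score highest under `score`."""
    return max(playouts, key=lambda move: score(playouts[move]))

best_by_wins = pick(playouts, win_rate)   # "play to maximize wins"
best_by_points = pick(playouts, mean)     # "play to maximize points"
```

With these numbers the win-rate criterion prefers "solid" while the average
margin prefers "greedy", because a few large (and possibly optimistic) margins
dominate the average.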

3) Humans seem to play with a combination of the two strategies--and every 
human might use a different combination.  Seeing all the way through a game to 
the end score is difficult from the beginning of the game, so we analyze 
"local" situations for their point values and combine the local situations to 
approximate the global situation.  As the game progresses, the score estimation 
becomes more accurate and human players adjust their strategy according to the 
margin of error.  If they are way behind, they play very aggressively or 
resign.  If they are slightly behind, they play slightly aggressively to catch 
up.  If they are slightly ahead, they play safely to secure the win.  If they 
are way ahead, they play very safely or pass to prompt their opponent to
resign.  "Playing human-like moves" is a separate goal from "playing to
maximize wins", but that does not mean every departure from pure "playing to
maximize wins" must make a given program weaker and serve only the goal of
"playing human-like moves".  Even if no one has yet found such an improvement,
one could certainly exist in theory.
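
The human adjustment described above can be sketched as a simple rule that
compares the estimated margin with its margin of error.  The thresholds and
style labels are hypothetical; this is only one way to read the description,
not a claim about how any player or program actually decides.

```python
def choose_style(estimated_margin, margin_of_error):
    """Map a score estimate to a playing style (hypothetical thresholds).

    estimated_margin: our estimated lead in points (negative = behind).
    margin_of_error:  how far off we think that estimate might be.
    """
    if estimated_margin < -margin_of_error:
        return "very aggressive (or resign)"   # way behind
    if estimated_margin < 0:
        return "slightly aggressive"           # slightly behind
    if estimated_margin <= margin_of_error:
        return "safe"                          # slightly ahead
    return "very safe"                         # way ahead
```

As the game progresses and the margin of error shrinks, the same three-point
lead moves from "safe" to "very safe": choose_style(3, 5) versus
choose_style(3, 1).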

4) Until a perfect evaluation function is implemented, programmers will 
wonder (and experimentally test) whether "play to maximize wins" is optimal for
their imperfect evaluation function.  So far, it seems to be the strongest 
strategy, but current programs do have known deficiencies, and there is no 
proof that a combination of strategies would always be weaker--especially since 
that might differ for each individual evaluation function.
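
One hypothetical way to experiment with a combination is to blend the win/loss
result with a small, bounded bonus for the margin.  The function name, the
weight and the scale below are assumptions for illustration; whether any such
blend helps a given program would have to be tested.

```python
import math

def blended_objective(margin, points_weight=0.1, scale=20.0):
    """Mostly win/loss, plus a small bounded bonus for the size of the margin.

    margin:        final margin in points (positive = we win).
    points_weight: 0.0 gives pure "maximize wins"; larger values care more
                   about points (hypothetical default).
    scale:         margin, in points, at which the bonus starts to saturate.
    """
    win = 1.0 if margin > 0 else 0.0
    bonus = math.tanh(margin / scale)  # bounded between -1 and 1
    return (1.0 - points_weight) * win + points_weight * bonus
```

With a small weight, any win still scores higher than any loss, but among wins
the larger margin is preferred, and among losses the smaller deficit is.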

The obvious way to improve the strength of a go program is to improve the 
evaluation function (easier said than done).  Classical programs used 
hard-coded go knowledge and it was surprising when MCTS programs surpassed them 
with very little go knowledge and clearly imperfect evaluation.  As program 
authors have found a way to balance the speed and accuracy of "heavy" playouts, 
the MCTS programs have improved further.  Besides improving the evaluation
function, there may be improvements in strategy that would help an imperfect 
program play stronger.

Ben Shoemaker.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
