Jonathan Chetwynd: <[email protected]>:
>Hideki wrote: "This is obviously wrong in handicap games, but what
>else is there?"
>
>To start with the perhaps obvious: I believe komi is raised in Pachi
>and tailed off as the game progresses, i.e. the goal is high to start
>with and lowers as the game progresses.
>
>When playing as White in the opening, the goal is to survive, maintain
>sente where possible, and leave options open. In this sense one plays
>moves with complex, indeterminate, high-risk outcomes, which is
>perhaps similar to playing with a high komi.
>
>Similarly, Black probably plays overly territorial moves, trying to
>secure territory and thereby reduce risk.
>
>We can't hypothesize what a better player might play, can we?
I believe we can. A natural way (and one I'm thinking about) is to
make a more precise model of the opponent in the simulations. For
White, i.e., when the opponent is assumed to be weaker, adding some
randomness to the move selection might be effective, for example,
though AFAIK no one has evaluated this yet. For Black, which should be
more important, we have only the dynamic komi technique, which has
been empirically proven effective. Safer and/or sounder approaches,
however, based on some _logical and/or deeper thought_, could be
possible. This is the reason I've asked.

#I implemented dynamic komi years ago, with the intuition that MCTS
performs best at even positions.

Best, Hideki

>regards
>
>Jonathan
>
>On 4 Jul 2011, at 17:58, Hideki Kato wrote:
>
>> Interesting thoughts, and I have a question.
>>
>> How about handicap games? The opponent used in the simulations is
>> self in most (all?) MCTS programs. This is obviously wrong in
>> handicap games, and the evaluation function returns wrong
>> estimations of scores and winning rates. So, the question is how to
>> maximize winning chances in such games.
>>
>> Hideki
>>
>> Ben Shoemaker: <[email protected]>:
>>>> From: terry mcintyre <[email protected]>
>>>> "The major one is that the MCTS scoring function is imperfect;
>>>> historically, programs have snatched defeat from the jaws of
>>>> victory by letting points be nibbled away in yose."
>>>
>>> (Apologies to those who understand go and computer go better than
>>> me--these are just my thoughts on the discussion.)
>>>
>>> There are several elements within this debate of "play to maximize
>>> wins" versus "play to maximize points":
>>> 1) What strategy is perfect play?
>>> 2) What strategy is strongest with MCTS?
>>> 3) What strategy is closest to human play?
>>> 4) Would a combination of strategies be stronger than either alone?
>>>
>>> Let's examine these elements further:
>>>
>>> 1) According to the rules of Go, the winner is the player with the
>>> higher score, but a win is equivalent to any other win--winning by
>>> 0.5 points is enough. So perfect play would maximize wins but not
>>> necessarily points.
>>>
>>> However, the winner is determined by points, so an accurate count
>>> of points (evaluation) is necessary to determine the winner. At the
>>> end of the game, this is trivial; earlier in the game it is harder.
>>> A perfect evaluation function would lead to perfect play--only
>>> winning moves would be played. Most current go programs seem to use
>>> the "play to maximize wins" strategy, but so far none can play
>>> perfectly, so we can say that their evaluation functions are not
>>> perfect. With a perfect evaluation function, the "play to maximize
>>> points" strategy should also lead to perfect play.
>>>
>>> 2) Many go program authors have stated that "play to maximize wins"
>>> is stronger than "play to maximize points". I think this is because
>>> their evaluation functions are imperfectly optimistic--the program
>>> counts points that future play does not deliver. Depending on the
>>> margin of error in the score estimation, this can turn a win into a
>>> loss. By focusing on wins rather than points, current programs
>>> minimize the effect of the "optimistic score estimation" problem.
>>>
>>> 3) Humans seem to play with a combination of the two strategies--
>>> and every human might use a different combination. Seeing all the
>>> way through a game to the end score is difficult from the beginning
>>> of the game, so we analyze "local" situations for their point
>>> values and combine the local situations to approximate the global
>>> situation. As the game progresses, the score estimation becomes
>>> more accurate, and human players adjust their strategy according to
>>> the margin of error.
>>> If they are way behind, they play very aggressively or resign. If
>>> they are slightly behind, they play slightly aggressively to catch
>>> up. If they are slightly ahead, they play safely to secure the win.
>>> If they are way ahead, they play very safely or pass to prompt
>>> their opponent to resign. While "playing human-like moves" is a
>>> separate goal from "playing to maximize wins", that does not mean
>>> that anything other than pure "playing to maximize wins" WILL make
>>> any given program weaker and only serves the goal of "playing
>>> human-like moves". Even if no one has yet found such an
>>> improvement, it certainly could exist in theory.
>>>
>>> 4) Until a perfect evaluation function is implemented, programmers
>>> will wonder (and experimentally test) whether "play to maximize
>>> wins" is optimal for their imperfect evaluation function. So far,
>>> it seems to be the strongest strategy, but current programs do have
>>> known deficiencies, and there is no proof that a combination of
>>> strategies would always be weaker--especially since that might
>>> differ for each individual evaluation function.
>>>
>>> The obvious way to improve the strength of a go program is to
>>> improve the evaluation function (easier said than done). Classical
>>> programs used hard-coded go knowledge, and it was surprising when
>>> MCTS programs surpassed them with very little go knowledge and
>>> clearly imperfect evaluation. As program authors have found ways to
>>> balance the speed and accuracy of "heavy" playouts, the MCTS
>>> programs have improved further. Besides improving the evaluation
>>> function, there may be improvements in strategy that would help an
>>> imperfect program play stronger.
>>>
>>> Ben Shoemaker
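[Editor's sketch] Hideki's dynamic komi technique can be illustrated
with a minimal sketch: grant White extra komi proportional to the
handicap and decay it toward zero as the game progresses, so the MCTS
evaluation sees a roughly even position throughout. All function names
and parameter values below are hypothetical illustrations, not the
scheme of any particular program.

```python
def dynamic_komi(handicap_stones, move_number, total_moves=200,
                 points_per_stone=7.0):
    """Extra komi granted to White in a handicap game, decayed
    linearly toward zero as the game progresses (parameter values
    are illustrative only)."""
    initial = handicap_stones * points_per_stone
    remaining = max(0.0, 1.0 - move_number / total_moves)
    return initial * remaining

def playout_win_for_white(score_for_white, komi, extra_komi):
    """Convert a raw playout score margin into the 0/1 win signal
    that MCTS backs up, with the dynamic komi applied."""
    return 1.0 if score_for_white + komi + extra_komi > 0 else 0.0
```

For example, with 4 handicap stones the extra komi starts at 28 points
and falls to 14 at mid-game and 0 at the end, so early playouts where
White trails by a few points are still scored as wins, keeping the
tree search near the 50% win rate where, per Hideki's intuition, MCTS
discriminates best between moves.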
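[Editor's sketch] Ben's distinction between "play to maximize wins" and
"play to maximize points" can be made concrete: given the same playout
results for each candidate move, the two strategies only differ in how
they aggregate them. The toy data below are invented for illustration.

```python
def maximize_wins(candidates):
    """Pick the move with the highest win rate over its playouts.
    `candidates` maps each move to a list of final score margins
    (from the mover's perspective) observed in Monte Carlo playouts."""
    return max(candidates,
               key=lambda m: sum(s > 0 for s in candidates[m])
                             / len(candidates[m]))

def maximize_points(candidates):
    """Pick the move with the highest average score margin."""
    return max(candidates,
               key=lambda m: sum(candidates[m]) / len(candidates[m]))

# Toy data: move "a" wins narrowly but reliably; move "b" wins big
# half the time and loses the other half.
playouts = {
    "a": [0.5, 0.5, 0.5, 0.5],      # 100% wins, mean margin +0.5
    "b": [20.5, 20.5, -1.5, -1.5],  # 50% wins,  mean margin +9.5
}
```

Here `maximize_wins` chooses "a" while `maximize_points` chooses "b",
which is exactly the trade-off under discussion: the win-rate
criterion is indifferent to margin and so shrugs off the "optimistic
score estimation" problem, while the point criterion chases a larger
but riskier expected score.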
>>> _______________________________________________
>>> Computer-go mailing list
>>> [email protected]
>>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

--
Hideki Kato <mailto:[email protected]>
