Re: [computer-go] Goal-directedness of Monte-Carlo

jonas . kahn Tue, 09 Sep 2008 01:43:44 -0700

Part of the problem stems from the fact that playouts are weak, and more
specifically that they are notably weaker than the program itself.


To begin with, one consequence is that most areas of the board look less
clear to playouts than they should. This entails, I think, a preference
for probable points over sure points. To fix ideas, suppose you can really
define a probability that a point is yours at the end, and that playouts
estimate that probability as (true probability - 0.5) * 0.8 + 0.5. Then
sure points suffer more. This might explain why go bots play cosmic style:
there are many not-so-sure points in the center.
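
To make the arithmetic concrete, here is a minimal Python sketch of that
shrinkage. The function name playout_estimate and the sample probabilities
are only illustrative assumptions; the 0.8 factor is the one used in the
formula above.

```python
def playout_estimate(p, shrink=0.8):
    """Ownership probability as seen by playouts, for a true probability p.

    Hypothetical model from the text: playouts pull every probability
    toward 0.5 by the factor `shrink`.
    """
    return (p - 0.5) * shrink + 0.5


if __name__ == "__main__":
    # Compare a sure point with less certain ones.
    for p in (1.0, 0.9, 0.6):
        est = playout_estimate(p)
        print(f"true {p:.2f} -> playout {est:.2f} (lost {p - est:.2f})")
```

Under this model a sure point loses 0.10 of its value while a 60% point
loses only 0.02, which is the bias toward probable points described above.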

This might be solved by local analysis integrated into MCTS, but there is
always the necessity of being random in ``a right way''.

About handicap games, another problem is the symmetry of playouts. If we
view them as simulating the players, this is wrong: White is stronger. It
is logical to weaken Black's playouts so that the initial position gives
about 50% (or 60%) victory, as someone suggested. We then get ``the right''
shift in evaluation all game long. This is tampering with the evaluation
function, though, and probably weakening it. A possible (far-fetched)
benefit of the approach would be adapting style to the opponent: if we
assume a weaker human does not know where to play globally, we could, for
example, suppress Black's proximity heuristics when giving stones.

Similarly, we could start with a false komi and reduce it linearly to
zero by move 200 (the idea being that the stronger player slowly gains
more points). That would break down when huge groups are at stake. This
might be used as an upper bound for dynamic komi, though.
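
A rough Python sketch of such a linearly decaying false komi, assuming a
hypothetical starting value chosen by the programmer (the name false_komi
and the value 30 are not from any existing program):

```python
def false_komi(move_number, initial_komi, end_move=200):
    """Komi offset decaying linearly from initial_komi (move 0) to 0 (end_move).

    After end_move the offset stays at zero, i.e. the real komi is used.
    """
    remaining = max(0.0, 1.0 - move_number / end_move)
    return initial_komi * remaining


if __name__ == "__main__":
    # Example: a hypothetical 30-point false komi at the start of the game.
    for move in (0, 50, 100, 200, 250):
        print(move, false_komi(move, 30.0))
```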

For dynamic komi, from a non-programmer's point of view, the easiest seems
to be to keep a whole increasing array at each node, with the percentage
of wins recorded for each komi between, say, -50 and 50 (in steps of 5,
maybe), and back-propagate as usual. Then, from simulation 1000 or so,
play at the real komi if the win probability is between, say, 40 and 60
(or 55; no need to be symmetric). If it is more than 60 at the real komi,
play at the biggest komi such that the win probability is still more than
60; if it is less than 40 at the real komi, play at the smallest komi such
that it is still less than 40. This keeps the program in the right ``I am
better or I am worse'' area. It does not waste any simulation. Maybe it is
hard on memory?
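
As a rough illustration of this bookkeeping, here is a Python sketch. It
rests on several assumptions of mine: komi values are treated as offsets
relative to the real komi (extra points given to the opponent, so 0 means
the real komi), the playout result is a score margin from the program's
side, and the names NodeStats, update and choose_offset are made up for the
example. The alternation of perspective between tree levels is ignored.

```python
KOMI_OFFSETS = list(range(-50, 51, 5))  # assumed grid; 0 = real komi


class NodeStats:
    """Per-node win counters, one for each komi offset (hypothetical layout)."""

    def __init__(self):
        self.visits = 0
        self.wins = [0] * len(KOMI_OFFSETS)

    def update(self, margin):
        """Record one playout; margin is our final lead at the real komi."""
        self.visits += 1
        for i, offset in enumerate(KOMI_OFFSETS):
            if margin > offset:  # still a win after giving `offset` points
                self.wins[i] += 1

    def win_rate(self, offset):
        return self.wins[KOMI_OFFSETS.index(offset)] / self.visits

    def choose_offset(self, low=0.40, high=0.60, min_visits=1000):
        """Pick the komi offset to evaluate with, following the rule above."""
        if self.visits < min_visits:
            return 0
        real = self.win_rate(0)
        if real > high:
            # Biggest offset whose win rate is still above `high`.
            return max(o for o in KOMI_OFFSETS if self.win_rate(o) > high)
        if real < low:
            # Smallest offset whose win rate is still below `low`.
            return min(o for o in KOMI_OFFSETS if self.win_rate(o) < low)
        return 0


def backpropagate(path, margin):
    """Update every node on the path from the leaf to the root."""
    for node in path:
        node.update(margin)
```

With this grid each node carries 21 counters instead of one, which gives an
idea of the memory cost. Every playout updates all komi values at once, so
no simulation is thrown away when the chosen komi changes.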

In any case, these adjustments may make sense during the middlegame or
so, but I am almost sure that for strength, the computer must play with
the real komi in the endgame, even though the strange moves tend to come
in the endgame. Notably, the 40 and 60 above should shift to wider bounds
as the game progresses. (Notice that this also corresponds to a better
evaluation function: there is less variability and a wider spectrum when
we are in yose than when we are in fuseki.)

Jonas

On Mon, 8 Sep 2008, Jason House wrote:

Actually your summary of what people do sounds exactly like what MC programs do, except for one point...
MC programs don't differentiate moves by point value. They only look at winning rate. It's extremely tough to differentiate the one move sequence with 99.1% win rate when all other moves have a 99% win rate.
Without any other heuristics or local search to guide MC programs, their play seems reckless...
Sent from my iPhone

On Sep 8, 2008, at 5:45 PM, terry mcintyre <[EMAIL PROTECTED]> wrote:
Interesting analysis, Don.
Human players sometimes adhere to a simple policy: "rich men don't pick fights."
When one is objectively far ahead, one picks up the easy profits, and otherwise takes no risks. If moves A, B, and C are comparable risk-wise, one would prefer the more profitable of the lot.
On the other hand, when one is far behind, one takes risks.
Such a strategy appears to maximize wins, especially when one is uncertain about the status.
Can that strategy be effectively translated to MC terms?
To approach the problem from another angle, strong amateur and professional players have a consensus that some moves return maximal value, others are unsatisfying, and still others are risky. They seem to have a high level of agreement about the value of low-risk moves; disputes arise for high-risk plays where the outcome is less certain.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

