Don Dailey wrote:

Hideki Kato wrote:
I'd like to give here an example to make things clear.

The conditions are:
1) The program uses a digitizing scheme that maps the real score to [0,1] (or [-1,1]), so that it cannot distinguish winning or losing by 0.5 pt from winning or losing by 10.5 pt at all.
2) Playouts include some foolish moves (usually with low but nonzero probability), for example not connecting large groups that are in atari, in order to preserve their randomness.
3) The position is in the early endgame, where no move gains more than 2 pt in perfect play.
4) Black is behind by 0.5 pt.
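
Condition 1 can be sketched in a few lines. This is only an illustration of a win/loss digitizing scheme, not code from any particular program; the function name `digitize` and the convention that the margin is Black's score minus White's (komi included) are assumptions made for the example.

```python
def digitize(score_margin):
    """Map a final score margin to a win/loss value in [0, 1].

    score_margin is assumed to be Black's points minus White's, komi
    included, so it is never exactly zero with a half-point komi.
    """
    return 1.0 if score_margin > 0 else 0.0


# A narrow result and a wide result collapse to the same playout value.
for margin in (0.5, 10.5, -0.5, -10.5):
    print(margin, digitize(margin))
```

Once the margin has passed through such a mapping, the search has no way to prefer a 10.5 pt win over a 0.5 pt win, or a 0.5 pt loss over a 10.5 pt loss.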

Under the above conditions, the playouts may return a winning but gambling move (perhaps with low probability), especially when the number of playouts is small, which is usually the case on 19x19, and UCT will choose it.
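
As a rough guide to how much a small number of playouts matters, the sketch below computes the standard error of a win-rate estimate. It assumes the playouts behave like independent trials with a fixed win probability, which is a simplification (UCT playouts are neither independent nor identically distributed); the function name is made up for the example.

```python
import math


def win_rate_std_error(p, n):
    """Standard error of a win rate estimated from n playouts.

    Assumes a simple binomial model: n independent playouts, each won
    with probability p. Real tree-search playouts only approximate this.
    """
    return math.sqrt(p * (1.0 - p) / n)


# The error shrinks only with the square root of the playout count.
for n in (100, 1000, 10000):
    print(n, round(win_rate_std_error(0.5, n), 4))
```

Under this model, 100 playouts at a true win rate of 0.5 give a standard error of 0.05, so a move whose estimate is a few percent higher than another's may simply be noise.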

The question is: which is better, to stay 0.5 pt behind, or to play gambling moves (by which I mean moves where B will lose many points if W answers correctly) in the hope of W's (stupid) mistakes?

The assumption is that you suddenly cannot trust MC to do what it does
best even though you did for the entire game up until this point. MC
of course will choose the "gambling" move. The whole concept of MC
is to do what is most likely to produce a win.

Not entirely, no. The concept of MC is to do what has the most lines
leading to a win, which is slightly different. There's obviously a strong
correlation, or MC wouldn't work at all, but I think it's dangerous to
assume that MC by definition plays the best move. For one thing, it makes
it very hard to argue about how to improve MC programs; it creates lots
of noise along the lines of "don't do that, it will only make your
program weaker".

We should think twice before asking it to choose the moves that produce
the surer loss. We are the ones that have a bias about this, not
the MC programs.

In addition to the above, there is one more issue to consider. If the playouts have a systematic error, with nakade for example, it is not good to stay just 0.5 pt ahead. Having more margin is clearly better.

I believe nakade is a strawman. There are lots of things MC does
better and lots of things it does less well. You can always find
positions that are hard or easy for your program to solve, but it isn't
intrinsic to this issue. I don't think you should weaken this
concept of playing for the best winning chances for the very few
positions where MC programs take longer to resolve the endgame and there
is a slight chance that it will win if the margin just happens to be
enough to cover the exact situation. Because this is no solution - it is
at best a patch and would only work in some cases.

I don't think patching one thing at a time is such a bad way to write a
go program. Small steps, one at a time, and you suddenly have a much
stronger program. And again you're making the assumption that to deviate
from accurate MC means fewer winning chances. It might mean fewer winning
LINES, but the probability of a loss or win is entirely dependent on how
the opponent plays, which is (hopefully) never random. And this does not
mean you're doing opponent modeling, or - if you define opponent
modeling very loosely so it includes this - that what you're doing is bad.
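
The distinction between winning lines and winning probability can be made concrete with a toy position. Everything in this sketch is invented for illustration: the two candidate moves, the four White replies to each, and the reply distribution of the hypothetical non-random opponent. It is not a claim about how any real program or opponent behaves.

```python
# Toy position, all numbers invented for illustration.
# For each Black move: outcome per White reply (1 = Black wins, 0 = loses).
OUTCOMES = {
    "safe":   [0, 0, 0, 1],  # stays 0.5 pt behind; wins only if White slips
    "gamble": [1, 1, 1, 0],  # wins unless White finds the one refutation
}

# Hypothetical non-random opponent: usually finds the best reply.
STRONG_REPLIES = {
    "safe":   [0.30, 0.30, 0.30, 0.10],
    "gamble": [0.02, 0.02, 0.01, 0.95],
}


def win_probability(outcomes, reply_probs):
    """Black's chance of winning given how likely each White reply is."""
    return sum(o * p for o, p in zip(outcomes, reply_probs))


def uniform(n):
    """Reply distribution of a purely random playout over n replies."""
    return [1.0 / n] * n


for move, outcomes in OUTCOMES.items():
    lines = win_probability(outcomes, uniform(len(outcomes)))
    actual = win_probability(outcomes, STRONG_REPLIES[move])
    print(f"{move}: winning lines {lines:.2f}, vs this opponent {actual:.2f}")
```

With these made-up numbers the gambling move has more winning lines, yet the safe move has the better chance against the opponent who usually finds the refutation. Change the reply distribution and the ranking can flip back, which is the sense in which the result depends on how the opponent plays.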

If I could do something that didn't hurt the program in other ways, but
might help certain positions once in a while, I would go for it.

I don't think you'll find ANY improvement to ANY non-trivial program
that doesn't, in some cases, make it play worse. What matters is how it
does in the average case.

I've been in game programming a long time - if you have a problem with
certain types of positions you really want a pointed solution that has
little or no impact on other positions. You don't want to be going back
and forth fixing things up but you want to solve the problem as correctly
as possible the first time. I'll call this principle "every solution
has a side effect", but this is a pretty bad side effect. (I can't
tell you how many times I "fixed" something in my chess program with
some evaluation change only to find that I broke many other things at
the same time.)

A good understanding of _why_ your program works can help with this a
lot, ensuring that you know how to fix a problem without causing bigger
problems elsewhere. I think part of the problem here is that few (if
any) people know _why_ MC works. Why does the cumulative result of
random play-outs correlate so strongly with the strength of the
position? In what ways does it NOT correlate, and can those be fixed?


_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
