Raymond Wold wrote:
> Don Dailey wrote:
>>
>> Hideki Kato wrote:
>>> I'd like to give here an example to make things clear.
>>>
>>> The conditions are:
>>> 1) Using a digitizing scheme that maps the real score to [0,1] (or [-1,1])
>>> so that the program cannot distinguish losing/winning by 0.5 or 10.5
>>> pt at all.
>>> 2) Playouts include some foolish moves (usually with low but not
>>> zero probability), such as failing to connect a large group in
>>> atari, in order to preserve their randomness.
>>> 3) The position is in the early endgame, where no move gains more
>>> than 2 pt, for example, in perfect play.
>>> 4) Black is behind by 0.5 pt.
>>>
>>> Under the above conditions the playouts may rate a gambling move as
>>> winning (perhaps with low probability), especially when the number
>>> of playouts is small, which is usually true on 19x19, and UCT will
>>> choose it.
>>>
>>> The question is: which is better, to stay 0.5 pt behind or to play
>>> gambling moves (by which I mean moves where B loses many points if
>>> W answers correctly), hoping for W's (stupid) mistakes?
>>>
>>>   
>> The assumption is that you suddenly cannot trust MC to do what it does
>> best even though you did for the entire game up until this point.   MC
>> of course will choose the "gambling" move.      The whole concept of MC
>> is to do what is most likely to produce a win.      
>
> Not entirely, no. The concept of MC is to do what has most lines leading
> to a win, which is slightly different. There's obviously a strong
> correlation, or MC wouldn't work at all, but I think it's dangerous to
> assume that MC by definition plays the best move. For one thing it makes
> it very hard to argue about how to improve MC programs, it creates lots
> of noise of "don't do that, it will only make your program weaker".
I'm not assuming that MC plays the best move.   The problem isn't the
assumptions I am making, but the assumptions others are making,  that
it's NOT playing the best move.    You want to apply a fix to all
positions without really knowing which positions are a problem.   Now if
you want to talk about noise,  you have just added more noise if you do
that.   
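
Hideki's scenario can be made concrete with a toy calculation (all numbers
invented for illustration): a win-rate maximizer and an expected-score
maximizer disagree about the gambling move when Black is 0.5 pt behind.

```python
# Toy illustration with hypothetical playout results, as final score
# margins for Black, who is 0.5 pt behind under perfect play:
#   - "solid": always ends the game losing by 0.5 pt.
#   - "gamble": wins by 10 pt in 15% of playouts, loses by 15 pt otherwise.
solid_outcomes = [-0.5] * 100
gamble_outcomes = [10.0] * 15 + [-15.0] * 85

def win_rate(outcomes):
    """Fraction of playouts Black wins (score digitized to win/loss)."""
    return sum(1 for s in outcomes if s > 0) / len(outcomes)

def expected_score(outcomes):
    """Average final margin, i.e. what a score maximizer would optimize."""
    return sum(outcomes) / len(outcomes)

# A win-rate maximizer (standard MC/UCT) prefers the gamble:
assert win_rate(gamble_outcomes) > win_rate(solid_outcomes)
# An expected-score maximizer prefers the sure, small loss:
assert expected_score(solid_outcomes) > expected_score(gamble_outcomes)
```

The point of the sketch is only that the two objectives genuinely diverge
here; which one wins more actual games is the question being argued.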

I can give you an example of why you must be extremely careful about
making unfounded assumptions:

My naive simple MC program would often fail to make a needed but small
capture move because of noise in the play-outs.    The move would be
second or third choice but not quite make it to the top - instead it
would play a speculative attack or something I considered rather
stupid.    

My "fix" was to assume that this was a general problem,  that it didn't
know what it was doing - after all,  I could see with my own eyes that
this was a problem.      So I applied a general incentive to encourage
it to make capture moves.   

Guess what?   That was a mistake.   The program was actually pretty good
at not being greedy about captures - most of the time the capture can
wait and you essentially lose a stone every time you play a greedy
capture.   But now when it needed to do something useful it preferred to
waste time making a capture that could wait indefinitely (the group was
dead, the capture could be made later when there were no important moves
left.)   
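
The failure mode can be sketched abstractly (move names and numbers are
entirely hypothetical): a flat bonus applied to every capture move can
outvote the playout statistics exactly when they were right.

```python
# Hypothetical win-rate estimates from the playouts themselves:
win_rates = {
    "urgent_defense": 0.55,       # the move that actually matters now
    "capture_dead_group": 0.50,   # a capture that can wait indefinitely
}
CAPTURE_BONUS = 0.08  # a blanket incentive added to all capture moves

def biased_score(move):
    """Playout win rate plus the general capture incentive."""
    bonus = CAPTURE_BONUS if move.startswith("capture") else 0.0
    return win_rates[move] + bonus

# Without the bonus the program picks the urgent move; with it,
# the blanket incentive overrides the (correct) playout statistics:
assert max(win_rates, key=win_rates.get) == "urgent_defense"
assert max(win_rates, key=biased_score) == "capture_dead_group"
```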

I backed the change out in a hurry and realized that I would never get
it perfect unless I applied a very specific fix.    I already knew that
from computer chess.   In one early program I had a bonus for rook on
the 7th rank, a good general principle.    We played one of the top US
grandmasters in a casual game once and the computer put the rook on the
7th rank in a pawn-less endgame where the kings were already
centralized.    The rook incentive got in the way of it finding the
right plan.   Although the rook on the 7th heuristic made the program
stronger in general,  it really wasn't implemented as well as it could
be.     Every good general principle in chess, and I'm sure in Go too,
if taken too seriously has side effects and will cause problems; the
solution is to address things as specifically as you can, within
reason.

In my opinion,  your proposed "fix" for this imagined problem is
equivalent to if I had decided to reverse the rook to 7th bonus and turn
it into a penalty because I saw it go wrong once.  

In this case, my "working assumption" would be that monte carlo programs
know what they are doing (much more often than not) when they play risky
moves in losing positions.     In other words, are you trying to fix
something that is not broken?  

I think there is a disconnect too between what appeals to the eye and
what actually works.   Most reasonable go players won't take risks when
the score is "close" because they think the game is about equal when it
probably isn't.    Near the end of the game with Chinese scoring,  the
chances are rarely close to even - you just assume they are because the
territory is "about even", so it seems the game could go either way.    
When I look at the logs of Lazarus,  the games are
virtually over well before the game actually ends.    When the silly
moves start, the game was over long ago but observers, even fairly
strong ones,  seem to believe there is still a lot of play left.    I'm
a lousy go player and even I understand this concept.

You can see this really clearly if you have a strong program and play on
KGS with people observing the games.    When the score is close, even
the pretty good players are chatting back and forth, analyzing lines of
play,  making  predictions,  when the program is showing that there is
nothing interesting left.   But very mundane play just naturally leads
to the correct results in these cases.


>
>> We should think twice before asking it to choose the moves that produces
>> the more sure loss.    We are the ones that have a bias about this, not
>> the MC programs. 
>>
>>> In addition to above, there is one more issue to consider. If the
>>> playout has a systematic error, nakade for example, it's not good to
>>> keep 0.5 pt ahead.  Having more margin is clearly better.
>>>   
>> I believe nakade is a strawman.    There are lots of things MC does
>> better and lots of things it does less well.    You can always find
>> positions that are hard or easy for your program to solve, but it isn't
>> intrinsic to this issue.     I don't think you should weaken this
>> concept of playing for the best winning chances for the very few
>> positions where MC programs take longer to resolve the endgame and there
>> is a slight chance that it will win if it just happens to be enough to
>> cover the exact situation.   Because this is no solution - it is at best
>> a patch and would only work in some cases.
>
> I don't think patching one thing at a time is such a bad way to write a
> go program. Small steps, one at a time, and you suddenly have a much
> stronger program. 
I definitely believe in incremental improvements,   but ad-hoc quick
fixes are what I'm talking about.    If they are not well thought out, 
you end up with a program that will never get very far.

This is true of testing too.   I can't say enough about testing with
hundreds of games,  thousands if you are patient enough.    I just
cringe whenever someone claims an improvement based on a test match of
20 games.    Or even worse, they played a couple of games and they just
see that it plays better.    If you think this way, you are making
random changes to your program and about half of them are helping and
the other half are hurting.     It's not easy to test this thoroughly, 
but some common sense can help - some kinds of changes can be judged as
beneficial without as much testing, but in general it's a hard problem.
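
The arithmetic behind this is plain binomial statistics. Assuming the
standard logistic Elo model, a 20 Elo edge corresponds to roughly a 52.9%
win rate, and you can check how many games it takes before that signal
rises above the measurement noise:

```python
import math

def elo_to_winrate(elo):
    """Expected win rate under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def stderr(p, n):
    """Standard error of an observed win rate over n games."""
    return math.sqrt(p * (1.0 - p) / n)

p = elo_to_winrate(20)  # about 0.529, i.e. a ~2.9-point edge over 50%

# Over 20 games the noise (about 11 points of win rate) swamps the
# ~2.9-point signal; over 1000 games the noise is ~1.6 points and the
# improvement becomes detectable:
assert stderr(p, 20) > abs(p - 0.5)
assert stderr(p, 1000) < abs(p - 0.5)
```

In other words, a 20-game match cannot even reliably distinguish a 20 Elo
improvement from no change at all, which is why Don cringes at it.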

> And again you're making the assumption that to deviate
> from accurate MC means less winning chances. 
But who is making the assumptions here?   You are the one making the
assumption that MC scoring is making the wrong choices more often than
not and must be fixed.  

> It might mean less winning
> LINES, but the probability of a loss or win is entirely dependent of how
> the opponent plays, which is (hopefully) never random. And this does not
> mean you're doing opponent modeling, or - if you define opponent
> modeling very loosely so it includes this - that what you're doing is
> bad.
>
>> If I could do something
>> that didn't hurt the program in other ways, but might help certain
>> positions once in a while,  I would go for it. 
>
> I don't think you'll find ANY improvement to ANY non-trivial program
> that doesn't, in some cases, make it play worse. What matters is how it
> does in the average case.
In general I agree, but improvements are not cumulative.  If I make ten
20 ELO point improvements it rarely adds up to 200 ELO.    One change
that is not properly thought out will cancel out a future improvement
and do more damage than good, even though at the time it's implemented
it might give you an improvement.

Here is an example:   Let's assume, for the sake of argument,  that 
your "fix" would bring a slight improvement  due to some problem that
should be fixed a different way (for example nakade, which I don't
believe, but it serves as an example.)    In such a case, fixing nakade
might improve the program a little, but backing out the original fix in
addition may improve the program a lot.   I hate that word "synergy", 
but it definitely applies here!


>
>> I've been in game
>> programming a long time - if you have a problem with certain types of
>> positions you really want a pointed solution that has little or no
>> impact on other positions.   You don't want to be going back and forth
>> fixing things up but you want to solve the problem as correctly as
>> possible the first time.       I'll call this principle, "every solution
>> has a side effect" but this is a pretty bad side effect.    (I can't
>> tell you how many times I "fixed" something in my chess program with
>> some evaluation change only to find that I broke many other things at
>> the same time.)
>
> A good understanding of _why_ your program works can help this a lot,
> ensuring that you know how to fix a problem without causing bigger
> problems elsewhere. I think part of the problem here is that few (if
> any) people know _why_ MC works. Why does the cumulative result of
> random play-outs correlate so strongly with the strength of the
> position? In what ways does it NOT correlate, that can be fixed?
Agreed.   The mogo team calls the play-outs a  "black art" because you
cannot predict what will work and what will make the program weaker.

- Don



>
>
> _______________________________________________
> computer-go mailing list
> [email protected]
> http://www.computer-go.org/mailman/listinfo/computer-go/
>