Mark Boon wrote:
> I'm fairly new on the subject of Monte Carlo and am in the process of
> catching up on reading, so I hope you guys have some patience with me
> while I get into this and ask a lot of questions. I got side-tracked
> away from computer-Go programming for quite a while for various
> reasons but have been at it again full-time recently.
Good!     You are one of the best.
>
> Let me start by saying I haven't followed the whole Monte-Carlo / UCT
> story all that well. Although it was clear it did well on 9x9 I wasn't
> all that convinced it would scale up to 19x19. That it did well on 9x9
> didn't surprise me much, as I've always felt 9x9 lends itself to
> strong play by some extreme brute-force method using today's
> computing power.
The surprise is that random play-outs without tree search actually play
a reasonable (certainly not strong) game.  At least much better than
the toy programs.  I think this single fact was just enough to keep
people interested in trying more sophisticated things like the ones you
mention next.
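(For anyone following along, a rough sketch of what "random play-outs
without tree search" means in practice - a flat Monte-Carlo move
chooser.  The Board methods and the PASS constant are made-up names for
illustration, not anybody's actual API:)

    // Flat Monte-Carlo: no tree.  Play N random games after each candidate
    // move and keep the move with the best win rate.
    // Board, legalMoves(), copy(), play(), playRandomGame(), colorToMove()
    // and PASS are assumed/illustrative names.
    int selectMove(Board board, int playoutsPerMove) {
        int bestMove = PASS;
        double bestWinRate = -1.0;
        for (int move : board.legalMoves()) {
            int wins = 0;
            for (int i = 0; i < playoutsPerMove; i++) {
                Board copy = board.copy();
                copy.play(move);
                // uniformly random legal moves to the end, then score
                if (copy.playRandomGame() == board.colorToMove()) {
                    wins++;
                }
            }
            double winRate = wins / (double) playoutsPerMove;
            if (winRate > bestWinRate) {
                bestWinRate = winRate;
                bestMove = move;
            }
        }
        return bestMove;
    }

Nothing more than that, and it still beats the toy programs - that was
the eye-opener.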
>
> Combining Monte-Carlo methods with what's generally called "heavy
> playouts" changes things a lot however and this is where it piqued my
> interest. I often see reference to Monte-Carlo programs as opposed to
> "traditional" Go programs. But what I'm seeing with the heavy playouts
> is "deja-vu all over again". It seems to me the evolution of the heavy
> playouts is following a course similar to that of the first rule-based
> Go programs in the early 1980's. To use another famous quote "history
> doesn't repeat itself but it rhymes."
I see it the same way.  I have noticed this in other fields: at some
point the different methods start to converge, all borrowing ideas from
things that are known to work.

>
> So I wouldn't be surprised at all if at some point you'll see a
> marriage of the best ideas of traditional Go programs and Monte-Carlo
> / UCT. In fact, this is most likely already happening as these
> Monte-Carlo programs use algorithms / ideas from the traditional
> programs for tactics, pattern-matching and possibly others. And I go
> out on a limb to predict that at some point "heavy playouts" will use
> information like territory, eye-space and group-strength to guide
> their playouts just like the "traditional" programs do. And that using
> fixed 3x3 patterns will be a passing fad.
I believe there is a knowledge bottleneck that cannot easily be passed
through.  Even with patterns there is only so far you can go - with 3x3
patterns, for instance.  I agree with you that play-outs will probably
continue to get heavier and heavier until diminishing returns make it
counter-productive.  There comes a point where it takes an incredible
amount of effort to add a little bit of raw knowledge that will
actually translate into a stronger program.
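(Just so we are talking about the same thing: by 3x3 patterns I mean
the usual trick of hashing the eight neighbours of a candidate point
into a small table index and looking up whether the move is locally
promising.  A sketch, with an assumed board representation:)

    // Encode the 8 points around 'point' (2 bits each: empty/black/white/edge)
    // into a 16-bit index, then consult a precomputed 65536-entry table.
    // neighbors8(), colorAt() and PATTERN_TABLE are assumed names.
    int pattern3x3(Board board, int point) {
        int index = 0;
        for (int n : board.neighbors8(point)) {
            index = (index << 2) | board.colorAt(n);  // 0=empty,1=black,2=white,3=edge
        }
        return index;
    }

    boolean looksLikeGoodPlayoutMove(Board board, int point) {
        return PATTERN_TABLE[pattern3x3(board, point)];
    }

Once that table is as good as it can be, there is nothing left to add
at that radius - which is the bottleneck I mean.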

Of course this goes back to the argument of search vs knowledge.  One
can always be bought with the other.  Knowledge is so useful that a
piece of it can save enormous search effort.  But the converse is
equally true - some things are so difficult to discover statically that
you need to use imagination, which is what I consider the closest A.I.
analogy to search - i.e. search IS imagination.  We do it when we plan
our day, considering what we will do and how long it will take, who we
might talk to and what we will say.  Go programs do not have this if
they don't use imagination (seeing in their "minds" what will happen
and imagining how they will respond, having a dialogue with themselves,
so to speak, via search).

I think what has made tree search combined with Monte Carlo so
successful is that it is so dynamic.  Traditional programs have little
coping power - they either have the answer (in a pattern or a routine)
or they don't.  Now of course it turns out that you can still help them
along significantly by letting them have imagination, but also by
forcing them to be realistic with some knowledge so they don't go out
of control!
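(Concretely, the "imagination kept realistic" I'm describing is roughly
the UCT selection rule: at every tree node, pick the child that
maximizes the observed win rate plus an exploration bonus.  A sketch,
where Node is an assumed tree-node class with wins, visits and children
fields:)

    // UCB1/UCT selection: exploit the observed win rate but keep exploring
    // under-visited children.  'c' is the exploration constant.
    Node selectChild(Node parent, double c) {
        Node best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Node child : parent.children) {
            if (child.visits == 0) {
                return child;                // try unvisited children first
            }
            double winRate = child.wins / (double) child.visits;
            double ucb = winRate
                + c * Math.sqrt(Math.log(parent.visits) / child.visits);
            if (ucb > bestValue) {
                bestValue = ucb;
                best = child;
            }
        }
        return best;
    }

The win-rate term is the realism, the square-root term is what keeps it
imagining lines it hasn't settled yet.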


>
> Anyway, as it is I have made a few initial forays into Monte-Carlo
> algorithms. I hereby thank Peter Drake for his work on Orego, which is
> a fine package to learn about Monte-Carlo. I used it to make my own
> implementation. Not that there was much wrong with Orego but the best
> way to understand and internalize a concept is to actually implement
> it. Moreover I felt I needed a framework that is a bit more
> generalized than Orego. Lastly I was convinced of course that I'd be
> able to make it faster. That last part didn't turn out to be true
> though. On my iMac, which is a 2Ghz Core Duo, I get a little under 30K
> playouts on 9x9. A little over 50K when using both cores. Orego is in
> the same ball-park. At least making a more generalized implementation
> didn't hurt performance so I'm OK with this first attempt. I think
> there's still room for optimizations, if I thought it was important at
> this stage.
>
> This brings me to my first point. Optimizing early in a project is like
> listening to the devil. 
I recently read an interesting blog on this, where it was claimed that
early optimization SHOULD be done when performance is actually a
consideration (and sometimes it isn't).  The idea is that if you ignore
performance considerations early, you basically box yourself into a
corner and suddenly it's very painful to fix a performance problem.
The blog seemed fairly balanced to me: the author didn't claim that you
should go to herculean efforts to obfuscate your code with
optimizations, but that you should always be thinking about
performance, especially early on when you can do something about it.
So I understand the classical wisdom about early optimization, but I
believe there needs to be some balance and I'm open-minded on the
subject.

> It eats up a lot of time; the visible progress is gratifying, but in
> the grand scheme of things it's not all that important to do early. I
> implemented it using pseudo-liberties because... uh, well, because
> that's what everyone seemed to be doing to get high numbers of
> playouts per second. But I'm already starting to doubt that
> pseudo-liberties are all that useful. Is anybody still using
> pure-random playouts for anything? As soon as you start to do anything
> more, pseudo-liberties are pretty useless. Yes, using real liberties
> slows things down a lot. But the speed of the random playout becomes
> less and less important with heavy playouts.
I agree.  I also implemented pseudo liberties and I regret that
decision.  It added state and code to my program and it's no longer as
clean as it used to be.  Pseudo liberties are definitely an advantage
for light play-outs, but my best program doesn't do light play-outs and
has little to show for them.
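(To spell out the trade-off for anyone who hasn't implemented it:
pseudo liberties count each empty neighbour once per adjacent stone of
the chain, so a shared liberty is counted more than once.  The only
property you can rely on is that zero pseudo liberties implies zero
real liberties, which is all a light play-out needs.  A sketch of the
incremental update, with an assumed Chain structure; merging friendly
chains and restoring liberties after a capture are left out:)

    // Placing a stone: every neighbouring chain loses one pseudo liberty
    // (the point just filled), and the new stone contributes one pseudo
    // liberty per empty neighbour.  Double counting of shared liberties is
    // deliberate - it is what keeps the update cheap.
    void placeStone(Board board, int point, Chain newChain) {
        for (int n : board.neighbors4(point)) {
            if (board.isEmpty(n)) {
                newChain.pseudoLiberties++;
            } else {
                Chain c = board.chainAt(n);
                c.pseudoLiberties--;           // 'point' is no longer empty for c
                if (c.color != newChain.color && c.pseudoLiberties == 0) {
                    board.capture(c);          // zero pseudo == zero real liberties
                }
            }
        }
    }

The moment you want real liberty counts for tactical reading in a
heavier play-out, none of this helps - which is exactly the regret I
was describing.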

I am a believer in re-writes.   There is no way to know how a program
will turn out when you are using it for research and development,   but
at some point you get a pretty good sense of what you need and my
solution is to re-write the program from scratch!    You are forced to
revisit and reconsider what you have done.    You should be able to make
a clone of that program that runs much faster and if you can't it means
you don't really understand your program and you NEED to take it apart
and put it back together.  I have found all kinds of bugs and
misconceptions in my programs using that process.  With a simple Go
program this is not a daunting task, but I can imagine that rewriting a
program that has been under development for years and that is
knowledge-intensive would be pretty time-consuming.  Still, I would
advocate doing it.


>
> Next I was thinking about another subject that got some attention on
> this list, the mercy rule. It seems to save about 10% in the number of
> moves per game (on 19x19) and result in about 20% gain in performance.
> This discrepancy is most likely due to the fact that the
> administration, whether using pseudo-liberties or real, is much slower
> towards the end of the game because you have more moves that merge
> chains. And those 10% of moves it saves are of course always at the end.
> So is it relevant? I don't know whether heavy playouts will be slower
> towards the end of the game or not. Possibly yes, as more moves made
> will have a small number of liberties that will need tactical
> analysis. I'd say that generally reducing the move-count is a good
> thing whichever method one uses. Possibly at a later stage more
> sophisticated methods can be developed to abort a game early.
I think it should be a gain - but I want to revisit this subject.   It
ought to be helpful but it's very difficult to accurately test playing
strength changes.    If I gain 25% speed,  but it hurts the quality of
the search somewhat,   how do I measure if it was worth it?     The only
feasible solution is to play several thousand games if the expected
improvement is fairly small.    If the improvement is really a free 25%
then you can probably discover this with reasonable confidence in a few
hundred games.
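To put rough numbers on "several thousand games": the one-sigma error
of a measured win rate over N games is about sqrt(p*(1-p)/N), which
near 50% is roughly 0.5/sqrt(N):

    N =   400 games  ->  about +/- 2.5%
    N =  2500 games  ->  about +/- 1.0%
    N = 10000 games  ->  about +/- 0.5%

So a change worth only a percent or two of win rate really does need
thousands of games before the signal rises above the noise.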

A method I sometimes use is to self-test a performance improvement at a
fixed node level.  If the speedup is "free" and doesn't cost playing
strength, you should score very close to 50% when you keep the fixed
node level the same (not considering time).  If it isn't free, you can
extrapolate what the extra speed buys you, but that is subject to error.
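And since the mercy rule is what started this: the rule itself is
trivial to add inside the play-out loop - something like the sketch
below, where the threshold value and the Board methods are only
illustrative:

    // Mercy rule: if one color has far more stones on the board than the
    // other during a play-out, stop the play-out and score it for the
    // leading side.  MERCY_THRESHOLD, BLACK_WIN, WHITE_WIN and the Board
    // methods are assumed names.
    static final int MERCY_THRESHOLD = 30;     // stones; would need tuning

    int playoutResult(Board board) {
        while (!board.twoConsecutivePasses()) {
            board.playRandomMove();
            int diff = board.blackStoneCount() - board.whiteStoneCount();
            if (diff >  MERCY_THRESHOLD) return BLACK_WIN;
            if (diff < -MERCY_THRESHOLD) return WHITE_WIN;
        }
        return board.score();                  // normal scoring otherwise
    }

Whether the games it cuts short are decided correctly often enough is
exactly what the fixed-node self-test should tell you.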


>
> Lastly (for this post anyway) I looked at the multiple-suicide
> question. Whether allowing it or not seems to have negligible effect
> on performance. What I did first was to tune the komi until the
> win-ratio for Black and White is as close to 50% as possible using
> random playouts. The number ended up being 2. Next I allowed suicide
> only for White. This pushed Black's winning percentage up to 65%. I
> think this pretty much settles the matter and it's clear allowing
> suicide is very detrimental to quality of play in a Monte-Carlo program.
>
> That's all for today :)
>
> Mark
>