I'm fairly new on the subject of Monte Carlo and am in the process of
catching up on reading, so I hope you guys have some patience with me
while I get into this and ask a lot of questions. I got side-tracked
away from computer-Go programming for quite a while for various
reasons but have been at it again full-time recently.
Let me start by saying I haven't followed the whole Monte-Carlo / UCT
story all that well. Although it was clear it did well on 9x9 I
wasn't all that convinced it would scale up to 19x19. That it did
well on 9x9 didn't surprise me much as I've always felt 9x9 is
susceptible to see strong play using some extreme brute-force method
using today's computing power.
Combining Monte-Carlo methods with what's generally called "heavy
playouts" changes things a lot however and this is where it piqued my
interest. I often see reference to Monte-Carlo programs as opposed to
"traditional" Go programs. But what I'm seeing with the heavy
playouts is "deja-vu all over again". It seems to me the evolution of
the heavy playouts is following a similar course of that of the first
rule-based Go programs in the early 1980's. To use another famous
quote "history doesn't repeat itself but it rhymes."
So I wouldn't be surprised at all if at some point you'll see a
marriage of the best ideas of traditional Go programs and Monte-
Carlo / UCT. In fact, this is most likely already happening as these
Monte-Carlo programs use algorithms / ideas from the traditional
programs for tactics, pattern-matching and possibly others.. And I go
out on a limb to predict that at some point "heavy playouts" will use
information like territory, eye-space and group-strength to guide
their playouts just like the "traditional" programs do. And that
using fixed 3x3 patterns will be a passing fad.
Anyway, as it is I have made a few initial forrays into Monte-Carlo
algorithms. I hereby thank Peter Drake on his work on Orego, which is
a fine package to learn about Monte-Carlo. I used it to make my own
implementation. Not that there was much wrong with Orego but the best
way to understand and internalize a concept is to actually implement
it. Moreover I felt I needed a framework that is a bit more
generalized than Orego. Lastly I was convinced of course that I'd be
able to make it faster. That last part didn't turn out to be true
though. On my iMac, which is a 2Ghz Core Duo, I get a little under
30K playouts on 9x9. A little over 50K when using both cores. Orego
is in the same ball-park. At least making a more generalized
implementation didn't hurt performance so I'm OK with this first
attempt. I think there's still room for optimizations, if I thought
it was important at this stage.
This comes to my first point. Optimizing early in a project is like
listening to the devil. It eats up a lot of time, the visible
progress is gratifying but in the grand scheme of things it's not all
that important to do early. I implemented it using pseudo-liberties
because... uh, well, because that's what everyone seemed to be doing
to get high numbers of playouts per second. But I already start to
doubt using pseudo-liberties are all that useful. Is anybody still
using pure-random playouts for anything? As soon as you start to do
anything more, pseudo-liberties are pretty useless. Yes, using real
liberties slows things down a lot. But the speed of the random
playout becoms less and less important with heavy playouts.
Next I was thinking about another subject that got some attention on
this list, the mercy rule. It seems to save about 10% in the number
of moves per game (on 19x19) and result in about 20% gain in
performance. This discrepancy is most likely due to the fact that the
administration, whether using pseudo-liberties or real, is much
slower towards the end of the game because you have more moves that
merge chains. And those 10% moves it saves are of course always at
the end. So is it relevant? I don't know whether heavy playouts will
be slower towards the end of the game or not. Possibly yes, as more
moves made will have a small number of liberties that will need
tactical analysis. I'd say that generally reducing the move-count is
a good thing whichever method one uses. Possibly at a later stage
more sophisticated methods can be developed to abort a game early.
Lastly (for this post anyway) I looked at the multiple-suicide
question. Whether allowing it or not seems to have negligable effect
on performance. What I did first was to tune the komi until the win-
ratio for Black and White is as close to 50% as possible using random
playouts. The number ended up being 2. Next I allowed suicide only
for White. This pushed Black's winning percentage up to 65%. I think
this pretty much settles the matter and it's clear allowing suicide
is very detrimental to quality of play in a Monte-Carlo program.
That's all for today :)
Mark
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/