I'm fairly new on the subject of Monte Carlo and am in the process of catching up on reading, so I hope you guys have some patience with me while I get into this and ask a lot of questions. I got side-tracked away from computer-Go programming for quite a while for various reasons but have been at it again full-time recently.

Let me start by saying I haven't followed the whole Monte-Carlo / UCT story all that well. Although it was clear it did well on 9x9 I wasn't all that convinced it would scale up to 19x19. That it did well on 9x9 didn't surprise me much as I've always felt 9x9 is susceptible to see strong play using some extreme brute-force method using today's computing power.

Combining Monte-Carlo methods with what's generally called "heavy playouts" changes things a lot however and this is where it piqued my interest. I often see reference to Monte-Carlo programs as opposed to "traditional" Go programs. But what I'm seeing with the heavy playouts is "deja-vu all over again". It seems to me the evolution of the heavy playouts is following a similar course of that of the first rule-based Go programs in the early 1980's. To use another famous quote "history doesn't repeat itself but it rhymes."

So I wouldn't be surprised at all if at some point you'll see a marriage of the best ideas of traditional Go programs and Monte- Carlo / UCT. In fact, this is most likely already happening as these Monte-Carlo programs use algorithms / ideas from the traditional programs for tactics, pattern-matching and possibly others.. And I go out on a limb to predict that at some point "heavy playouts" will use information like territory, eye-space and group-strength to guide their playouts just like the "traditional" programs do. And that using fixed 3x3 patterns will be a passing fad.

Anyway, as it is I have made a few initial forrays into Monte-Carlo algorithms. I hereby thank Peter Drake on his work on Orego, which is a fine package to learn about Monte-Carlo. I used it to make my own implementation. Not that there was much wrong with Orego but the best way to understand and internalize a concept is to actually implement it. Moreover I felt I needed a framework that is a bit more generalized than Orego. Lastly I was convinced of course that I'd be able to make it faster. That last part didn't turn out to be true though. On my iMac, which is a 2Ghz Core Duo, I get a little under 30K playouts on 9x9. A little over 50K when using both cores. Orego is in the same ball-park. At least making a more generalized implementation didn't hurt performance so I'm OK with this first attempt. I think there's still room for optimizations, if I thought it was important at this stage.

This comes to my first point. Optimizing early in a project is like listening to the devil. It eats up a lot of time, the visible progress is gratifying but in the grand scheme of things it's not all that important to do early. I implemented it using pseudo-liberties because... uh, well, because that's what everyone seemed to be doing to get high numbers of playouts per second. But I already start to doubt using pseudo-liberties are all that useful. Is anybody still using pure-random playouts for anything? As soon as you start to do anything more, pseudo-liberties are pretty useless. Yes, using real liberties slows things down a lot. But the speed of the random playout becoms less and less important with heavy playouts.

Next I was thinking about another subject that got some attention on this list, the mercy rule. It seems to save about 10% in the number of moves per game (on 19x19) and result in about 20% gain in performance. This discrepancy is most likely due to the fact that the administration, whether using pseudo-liberties or real, is much slower towards the end of the game because you have more moves that merge chains. And those 10% moves it saves are of course always at the end. So is it relevant? I don't know whether heavy playouts will be slower towards the end of the game or not. Possibly yes, as more moves made will have a small number of liberties that will need tactical analysis. I'd say that generally reducing the move-count is a good thing whichever method one uses. Possibly at a later stage more sophisticated methods can be developed to abort a game early.

Lastly (for this post anyway) I looked at the multiple-suicide question. Whether allowing it or not seems to have negligable effect on performance. What I did first was to tune the komi until the win- ratio for Black and White is as close to 50% as possible using random playouts. The number ended up being 2. Next I allowed suicide only for White. This pushed Black's winning percentage up to 65%. I think this pretty much settles the matter and it's clear allowing suicide is very detrimental to quality of play in a Monte-Carlo program.

That's all for today :)

Mark

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to