[computer-go] More UCT / Monte-Carlo questions

Mark Boon Tue, 05 Feb 2008 07:54:41 -0800

Although most of my time has been eaten up by implementing/improving some general framework parts, I did get a chance to play a bit with a simple UCT search. Some things that I found puzzled me a bit and I hoped someone had an explanation or similar experiences.

I implemented a very basic UCT / MC program first using pseudo-liberties. I figured this should be the base-line against which I can test some ideas. To test if the program actually worked properly I first let it play against Orego. The speed of my playouts is similar to Orego's, so I figured the level of play should be similar. (I switched off pondering and multi-threading in Orego to get an apples-to-apples comparison.)
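
For readers unfamiliar with the term: pseudo-liberties count an empty point once for every stone of the chain that touches it, so a point shared by two stones counts twice. The count is cheap to maintain incrementally and is zero exactly when the true liberty count is zero. A minimal Python sketch, assuming a board stored as a dict from (row, col) to a colour with empty points absent; all names are illustrative and not taken from either program:

```python
def neighbours(point, size):
    """Yield the on-board orthogonal neighbours of a point."""
    row, col = point
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + dr, col + dc
        if 0 <= r < size and 0 <= c < size:
            yield (r, c)


def chain_of(board, start, size):
    """Collect the connected same-coloured stones containing `start`."""
    colour = board[start]
    chain, frontier = {start}, [start]
    while frontier:
        point = frontier.pop()
        for nb in neighbours(point, size):
            if board.get(nb) == colour and nb not in chain:
                chain.add(nb)
                frontier.append(nb)
    return chain


def pseudo_liberties(board, start, size):
    """Count empty neighbours once per adjacent stone (duplicates included)."""
    return sum(1 for stone in chain_of(board, start, size)
               for nb in neighbours(stone, size) if nb not in board)


def true_liberties(board, start, size):
    """Count distinct empty points adjacent to the chain."""
    return len({nb for stone in chain_of(board, start, size)
                for nb in neighbours(stone, size) if nb not in board})
```

For an L-shaped chain at (1,1), (1,2), (2,1) on a 5x5 board, the empty point (2,2) touches two stones, so the pseudo-liberty count is one higher than the true count.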

To my surprise my program seemed to be winning the majority of the games (after a few dozen games). When looking at Orego's output I couldn't help noticing that at the start of the game it prints much smaller numbers of 'runs' than my program, whereas by the end of the game the numbers are similar. This may be the reason for my program performing better. When I looked at the code of Orego I noticed there are two main differences:

- It computes the UCT value in a completely different way. A comment in the code refers to a paper called "Modification of UCT with Patterns in Monte-Carlo Go". I haven't studied this yet, but whatever it does it apparently doesn't do wonders over the standard C * sqrt( (2*ln(N)) / (10*n) ) that I use.

- It only initialises the list of untried moves in the tree after a node has had a minimum run-count of 81 (on 9x9). For the life of me I couldn't figure out what the effect of this was or what it actually does. I was wondering if this has an effect on what is counted as a 'run' but I'm not sure.
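
The exploration term quoted above can be written out as a small Python sketch. It assumes N is the run-count of the parent, n the run-count of the move, and that the term is added to the move's win ratio as in ordinary UCB1-style selection; the post only gives the square-root term, so the combination, the handling of n = 0 and the default C = 1.0 are assumptions:

```python
import math


def uct_value(wins, n, N, C=1.0):
    """Win ratio plus the exploration term C * sqrt((2*ln(N)) / (10*n))."""
    if n == 0:
        # Assumption: a move with no runs yet is always tried first.
        return float("inf")
    return wins / n + C * math.sqrt((2.0 * math.log(N)) / (10.0 * n))


def select_child(children, C=1.0):
    """children: list of (move, wins, runs). Return the move with the highest value."""
    # Assumption: the parent's run-count is the sum of its children's runs.
    N = max(sum(runs for _, _, runs in children), 1)
    return max(children, key=lambda ch: uct_value(ch[1], ch[2], N, C))[0]
```

With two candidates at 6/10 and 1/2, the less-visited move is selected because its exploration term is larger.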

Then I found a paragraph (4.2) in Remi Coulom's paper about Elo ratings of patterns. It briefly describes it as "As soon as a number of simulations is equal to the number of points on the board, this node is promoted to internal node, and pruning is applied." I can't help feeling that the implementation in Orego is doing this. But I can't figure out where it does any pruning or applying patterns of any kind. Is there supposedly a general benefit to this even without pruning or patterns? As stated before, at least it doesn't seem to provide any benefit over my more primitive implementation. Maybe Peter Drake or someone else familiar with Orego knows more about this?
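
One possible reading of the quoted sentence, without the pruning step, is sketched below in Python: a node only collects statistics until its run-count reaches a threshold (81 on 9x9), and only then is its list of untried moves initialised. The class and its fields are hypothetical and are not Orego's or Crazy Stone's code:

```python
class Node:
    """Tree node that is promoted to an internal node after `threshold` runs."""

    def __init__(self, legal_moves, threshold):
        self.legal_moves = list(legal_moves)
        self.threshold = threshold   # e.g. 81, the number of points on a 9x9 board
        self.runs = 0
        self.wins = 0
        self.untried = None          # stays None while the node is still a leaf
        self.children = {}

    def is_internal(self):
        return self.untried is not None

    def record_run(self, won):
        """Count one playout through this node; promote it once the threshold is hit."""
        self.runs += 1
        self.wins += int(won)
        if not self.is_internal() and self.runs >= self.threshold:
            # Promotion: from now on the search may create children here.
            # A pattern-based pruning step would go at this point; it is omitted.
            self.untried = list(self.legal_moves)
```

Under this reading, playouts below a leaf that has not been promoted are plain random playouts and the tree grows more slowly.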

Anyway, reading the same paragraph mentioned above again I was struck by another detail I thought surprising: after doing the required number of runs, the candidates are pruned to a certain number 'n' based on patterns. Does that mean from then on the win-ratio is ignored? What if the by far most successful move so far does not match any pattern? Am I misunderstanding something here? The paragraph is very brief and does not go into much detail.

On to my next step: I introduced some very basic tactics to save stones with one liberty, capture the opponent's stones with one liberty and capture the opponent's stones in a ladder. There are many possible choices here: doing this just near the last move and/or over the whole board, and doing this in the simulation and/or during the selection.

Just doing this near the last move during simulation caused a slow-down of a factor 4 or 5 but improves play considerably. Also doing this near the last move during selection doesn't affect speed much but deteriorated play! Doing this first near the last move and then looking for tactics over the whole board as a next step affected results negatively even more. The number of playouts is still in the same ball-park.

Thinking it over, since I don't use this to prune the selection but just to order the candidates, I could see that after many runs the ordering suggested by the tactics gets overridden by the UCT selection. So I could see the effect of using this for selection reducing steadily with the number of runs through a node. But still I didn't expect a considerable reduction in strength. So what could be happening here?


- I could have a bug.
- I didn't run enough games (about 50).
- Using knowledge to order the initial selection is counter-productive when not accompanied by pruning.

The last one I find very hard to believe. Did anyone else run into something like this?
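
The "order but don't prune" scheme discussed above might look like the following Python sketch. It assumes the tactical knowledge only decides the order in which untried moves receive their first run, after which selection falls back to the UCT value; the function and the `tactical_score` mapping are hypothetical:

```python
import math


def pick_move(stats, tactical_score, N):
    """stats: move -> (wins, runs). tactical_score: move -> priority (default 0)."""
    untried = [m for m, (_, runs) in stats.items() if runs == 0]
    if untried:
        # Knowledge only orders the first visits; nothing is pruned.
        return max(untried, key=lambda m: tactical_score.get(m, 0))

    def uct(m):
        wins, runs = stats[m]
        return wins / runs + math.sqrt((2.0 * math.log(N)) / (10.0 * runs))

    # Once every move has at least one run, the tactical score plays no role.
    return max(stats, key=uct)
```

In this form the tactical ordering affects only the first pass over the candidates, which is why its influence would be expected to fade rather than to hurt.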

Finally, I also looked a bit at using more threads to make use of more than one processor. I figure this can wait and it's better to keep things simple at this early stage but still it's something I want to keep in mind. When looking at what I need to do to enable multiple threads during search it seems to me I'll be required to lock substantial parts of the UCT-tree. This means traversing the tree when looking for the best node to expand is going to be the main bottle-neck. Maybe not with just two to four processors, but I foresee substantial diminishing returns after that. Is this correct? Is there experience with many processors? Maybe a different expansion algorithm will be required?
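
The locking concern can be illustrated with a deliberately simple Python sketch: one lock guards a toy one-level "tree", selection and back-up happen under the lock, and the playout itself runs outside it. The selection rule (least-visited move) and the coin-flip playout are placeholders, and all names are hypothetical. In CPython the threads would not actually run playouts in parallel, so this only shows where the lock sits:

```python
import random
import threading


class SharedTree:
    """Toy one-level tree: per-move [wins, runs] guarded by a single lock."""

    def __init__(self, moves):
        self.lock = threading.Lock()
        self.stats = {m: [0, 0] for m in moves}

    def select(self):
        with self.lock:   # traversal is serialised across threads
            return min(self.stats, key=lambda m: self.stats[m][1])

    def update(self, move, won):
        with self.lock:   # so is the back-up of the result
            self.stats[move][0] += int(won)
            self.stats[move][1] += 1


def worker(tree, playouts, seed):
    rng = random.Random(seed)
    for _ in range(playouts):
        move = tree.select()
        won = rng.random() < 0.5   # placeholder for a real playout, run unlocked
        tree.update(move, won)


def run_search(n_threads=4, playouts_per_thread=1000):
    tree = SharedTree(range(9))
    threads = [threading.Thread(target=worker, args=(tree, playouts_per_thread, i))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tree
```

The fraction of time spent inside `select` and `update` relative to the playout is what would limit scaling as more threads are added.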


        Mark

