Re: [computer-go] Tweak to MCTS selection criterion

Christian Nentwich Sun, 07 Jun 2009 03:41:30 -0700

This is kind of interesting. Is anybody measuring their playoutperformance in real-time at the moment and performing this sort ofcomputation, to check if overtaking the leading move is mathematicallyimpossible?

It's interesting with UCT because of the interplay between the timemanagement algorithm and the exploration parameter. Suppose you areearly in the game, and your time management algorithm says you should bespending 10 seconds on a move. After six seconds, because your parameteris skewed towards exploitation, you already have 90% more trials on theleading move than anything else, calculate that it cannot be overtakenand abort.


Some things come to mind:

- If this behaviour is happening consistently, i.e. you end up spendingtoo little time on all your moves, is your exploitation parametercorrect? There is a reason you use a time management algorithm toallocate a lot of time in the beginning. You may be doing pointlesssearches.- Would you rather exploit less in that case, thus spending yourallotted time to do more exploration, or would you instead keepsearching instead of aborting and reuse the tree for pondering and/oryour follow-up move?

Given that people spend a lot of time experimenting on goodexploitation/exploration parameters, I suspect that the last option(obey the time management, continue searching, reuse the tree) is thebetter?


Christian


Don Dailey wrote:

2009/6/6 <dhillism...@netscape.net <mailto:dhillism...@netscape.net>>

    I think this is one of those design decisions that nobody takes on
    faith. We all wind up testing it both ways and in various
    combinations.

    An additional advantage of using the number of visits is that
    branches at the root become mathematically eliminated and can be
    pruned away. It often also allows the search to be stopped early.
    It can save a lot of time for forced moves.
Let me see if I understand what you are saying here.If you have set a goal time for thinking of 10 seconds, and let's say6 seconds have progressed, then it might be possible to stop thesearch early if you do the math and see that it's not possible for anyother move to have more visits given an additional 4 seconds?So when there is a position that has only a single clearly best move,perhaps a capture that cannot wait or a necessary atari defense, thenyou can probably save as much (or close to) as half the thinking timeon that move. Is this correct?So if this understanding is correct, then it still makes sense to dothe "pebbles" test at this point and check to see if another move hasa higher score before stopping the search. If the move really isforced and necessary, then the answer will be no and you can stopearly. If there is a move that currently appears better but with a"too small" sample, then it seems foolish to stop early. If themove is a result of just a few lucky playouts, then it will quickly berevealed and you can still stop early.
- Don
    - Dave Hillis



    -----Original Message-----
    From: Michael Williams <michaelwilliam...@gmail.com
    <mailto:michaelwilliam...@gmail.com>>
    To: computer-go <computer-go@computer-go.org
    <mailto:computer-go@computer-go.org>>
    Sent: Sat, 6 Jun 2009 5:07 pm
    Subject: Re: [computer-go] Tweak to MCTS selection criterion

    Another strategy to be considered is to not allow the thinking to
    cease until the maximum win rate and the maximum visit count agree
    on the same move. Obviously this requires some extra code to make
sure you don't lose on time, etc.Brian Sheppard wrote:> When a UCT search is completed, the usual selection criterion is> "choose the move that has the most trials." This is more stable> than choosing the move that has the highest percentage of wins,> since it is possible to have an unreliably high percentage if the> number of trials is small.> > I have a small tweak to that criterion. Pebbles uses "choose the> move that has the most wins." This rule selects the same move as> the conventional criterion in almost every case. The reason why> Pebbles' rule is superior is revealed in the case where the moves> differ.> > When Pebbles chooses a different move than the conventionalcriterion,> it is because Pebbles move has more wins in fewer trials. When that> happens, Pebbles move would inevitably become the move with themost> trials if searching were to continue. So there is actually nodownside.> Of course, the upside is minor, too.> > For validation, Pebbles has been using both strategies on CGOSgames.> At present, the conventional selection strategy has won 341/498= 68.47%.> Pebbles strategy has won 415/583 = 71.18%. This isn't statistically> conclusive or anything (0.7 standard deviations; we would need 4to 8> times as many trials for strong statistical evidence). But Pebbles'> strategy should be better by a small amount, and it has been, so I> present it to you with confidence.> > Best,> Brian> > _______________________________________________> computer-go mailing list> computer-go@computer-go.org <mailto:computer-go@computer-go.org>> http://www.computer-go.org/mailman/listinfo/computer-go/>_______________________________________________computer-go mailing listcomputer-go@computer-go.org <mailto:computer-go@computer-go.org>http://www.computer-go.org/mailman/listinfo/computer-go/
    ------------------------------------------------------------------------
    Wanna slim down for summer? Go to America Takes it Off
    
<http://www.aolhealth.com/diet/weight-loss-program/?ncid=emlcntusheal00000001>
    to learn how.

    _______________________________________________
    computer-go mailing list
    computer-go@computer-go.org <mailto:computer-go@computer-go.org>
    http://www.computer-go.org/mailman/listinfo/computer-go/


------------------------------------------------------------------------

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Tweak to MCTS selection criterion

Reply via email to