This helps very much, thank you for taking the time to answer!

You might be looking for "Combining Online and Offline Knowledge in UCT" [1] by Gelly and Silver. Silver and Tesauro reference it in "Monte-Carlo Simulation Balancing" [2] with "Unfortunately, a stronger simulation policy can actually lead to a weaker Monte-Carlo search (Gelly & Silver, 2007), a paradox that we explore further in this paper."
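
A minimal sketch of how a more precise but biased playout policy can end up ranking moves worse, in case it helps anyone reading along. Everything in it is an assumption made for illustration: the two move values, the Gaussian noise model, and the LIGHT and HEAVY policy parameters are toy numbers, not taken from either paper or from any real engine.

```python
import random

# Toy numbers, not taken from either paper: move "a" is truly better than "b".
TRUE_VALUE = {"a": 0.55, "b": 0.50}

# Hypothetical policies. LIGHT is noisy but unbiased; HEAVY is less noisy
# (higher precision) but overrates move "b" by 0.10 (an incorrect mean).
LIGHT = {"noise_sd": 0.5, "bias": {"a": 0.0, "b": 0.0}}
HEAVY = {"noise_sd": 0.2, "bias": {"a": 0.0, "b": 0.10}}


def estimate(move, policy, n_playouts, rng):
    """Average n_playouts toy playout results for one move."""
    mean = TRUE_VALUE[move] + policy["bias"][move]
    total = sum(rng.gauss(mean, policy["noise_sd"]) for _ in range(n_playouts))
    return total / n_playouts


def pick_rate(policy, n_playouts, trials, rng):
    """Fraction of trials in which the truly better move "a" is ranked first."""
    correct = 0
    for _ in range(trials):
        a = estimate("a", policy, n_playouts, rng)
        b = estimate("b", policy, n_playouts, rng)
        if a > b:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        print(n, pick_rate(LIGHT, n, 200, rng), pick_rate(HEAVY, n, 200, rng))
```

In this toy model more playouts help the unbiased policy converge on the better move, while the biased policy converges ever more confidently on the wrong one. Real playouts return win/loss results and real biases depend on the position, so this only illustrates the direction of the effect.
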

I'll make it a priority to read both papers in detail, thank you!

If you meant another paper, or someone else knows one, I'm happy to see more references.

Thanks!
Tobi

[1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf
[2] http://www.machinelearning.org/archive/icml2009/papers/500.pdf

On 03.11.2015 21:03, [email protected] wrote:
> You have to be careful what heuristics you apply. This was a
> surprising result: using a playout policy which in itself is a
> stronger go player can actually make MCTS/AMAF weaker. The reason is
> that MCTS depends entirely on accurate estimations of the value of
> each position in the tree. Any playout policy which introduces a bias
> therefore weakens MCTS. It may increase precision (lower standard
> deviation) but gives a less accurate assessment of the value (an
> incorrect mean). Most playouts at the moment (at least published ones)
> are based on Remi's Mogo playout policy, which increases precision
> without sacrificing accuracy.
>
> There's a really nice diagram in one of David Silver's papers
> illustrating the effect that bias can have on playouts. As soon as you
> see it you understand the problem. Unfortunately I don't have it to
> hand and have unfortunately run out of time looking for it, otherwise
> I'd reference it. Hopefully somebody else can give the reference. I
> suspect David probably co-authored the paper in which case apologies
> to the other author for not crediting them here!
>
> I hope this helps
>
> Regards
>
> Raffles
>
> On 03-Nov-15 19:38, Tobias Pfeiffer wrote:
>> Hi everyone,
>>
>> I haven't yet caught up on most recent go papers. If what I ask is
>> answered in one of these, please point there.
>>
>> It seems everyone is using quite heavy playouts these days (nxn
>> patterns, atari escapes, opening libraries, lots of stuff that I don't
>> know yet, ...) - my question is how does that mix with AMAF/RAVE? I
>> remember from the early papers, that they said it'd be dangerous to do
>> it with non random playouts and that they shouldn't have too much logic.
>>
>> Which, well, makes sense (to me) because the argument is that we play
>> random moves so they are order independent. With patterns that doesn't
>> hold true anymore.
>>
>> What's the experience out there? Does it just still work? Does it not
>> matter because you just "warm up" the tree? Or do you need to be careful
>> with what heuristics you apply not to break RAVE/AMAF?
>>
>> Thank you!
>> Tobi

--
www.pragtob.info
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
