Many Faces of Go doesn’t use Remi’s playout policy and I don’t think Zen does 
either.  I don’t think Remi’s and Mogo’s are similar either, since they were in 
some ways competing developments.  The bias issue is very real, so as you add 
knowledge to the playouts you have to be careful to add (for example) both 
attack and defense moves in a situation.

 

David

 

From: Computer-go [mailto:[email protected]] On Behalf Of 
Tobias Pfeiffer
Sent: Tuesday, November 03, 2015 12:39 PM
To: [email protected]; [email protected]
Subject: Re: [Computer-go] AMAF/RAVE + heavy playouts - is it save?

 

This helps very much, thank you for taking the time to answer!

You might be looking for "Combining Online and Offline Knowledge in UCT" 
[1] by Gelly and Silver. Silver and Tesauro reference it in "Monte-Carlo 
Simulation Balancing" [2] with "Unfortunately, a stronger simulation policy can 
actually lead to a weaker Monte-Carlo search (Gelly & Silver, 2007), a paradox 
that we explore further in this paper."

I'll make it a priority to read both papers in detail, thank you! If you meant 
another paper, or someone else knows one, I'm happy to see more references.

Thanks!
Tobi


[1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf
[2] http://www.machinelearning.org/archive/icml2009/papers/500.pdf



On 03.11.2015 21:03, [email protected] wrote:

You have to be careful what heuristics you apply. This was a surprising result: 
using a playout policy which in itself is a stronger go player can actually 
make MCTS/AMAF weaker. The reason is that MCTS depends entirely on accurate 
estimations of the value of each position in the tree. Any playout policy which 
introduces a bias therefore weakens MCTS. It may increase precision (lower 
standard deviation) but gives a less accurate assessment of the value (an 
incorrect mean). Most playouts at the moment (at least published ones) are 
based on Remi's Mogo playout policy, which increases precision without 
sacrificing accuracy.
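
To make the accuracy point concrete, here is a minimal toy sketch in Python. It is not taken from any of the programs mentioned. It assumes two candidate moves with made-up true win rates of 0.55 and 0.50, and models a biased playout policy simply as a fixed shift in the win probability it reports for a move. The names `estimate` and `pick_rate` and all the numbers are illustrative assumptions, and the sketch only models the incorrect mean, not the lower standard deviation.

```python
import random


def estimate(true_value, bias, n_playouts, rng):
    """Average of n_playouts win/loss results drawn with probability true_value + bias."""
    p = min(1.0, max(0.0, true_value + bias))
    wins = sum(rng.random() < p for _ in range(n_playouts))
    return wins / n_playouts


def pick_rate(bias_a, bias_b, n_playouts, trials=200, seed=0):
    """Fraction of trials in which move A (truly better) gets the higher estimate."""
    rng = random.Random(seed)
    true_a, true_b = 0.55, 0.50  # hypothetical true values, A is the better move
    correct = 0
    for _ in range(trials):
        a = estimate(true_a, bias_a, n_playouts, rng)
        b = estimate(true_b, bias_b, n_playouts, rng)
        if a > b:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        unbiased = pick_rate(0.0, 0.0, n)
        biased = pick_rate(-0.10, 0.0, n)  # policy undervalues A by 0.10
        print(f"playouts={n:5d}  unbiased={unbiased:.2f}  biased={biased:.2f}")
```

With the unbiased policy, more playouts should make the better move win the comparison more often. With the shifted policy, more playouts only make the wrong choice more certain, because the error is in the mean and does not average out.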

There's a really nice diagram in one of David Silver's papers illustrating the 
effect that bias can have on playouts. As soon as you see it you understand the 
problem. Unfortunately I don't have it to hand and have run out of time 
looking for it, otherwise I'd reference it. Hopefully somebody else can 
give the reference. I suspect David probably co-authored the paper in which 
case apologies to the other author for not crediting them here!

I hope this helps

Regards

Raffles

On 03-Nov-15 19:38, Tobias Pfeiffer wrote:

Hi everyone,
 
I haven't yet caught up on most recent go papers. If what I ask is
answered in one of these, please point there.
 
It seems everyone is using quite heavy playouts these days (nxn
patterns, atari escapes, opening libraries, lots of stuff that I don't
know yet, ...) - my question is how does that mix with AMAF/RAVE? I
remember from the early papers, that they said it'd be dangerous to do
it with non random playouts and that they shouldn't have too much logic.
 
Which, well, makes sense (to me) because the argument is that we play
random moves so they are order independent. With patterns that doesn't
hold true anymore.
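
For reference, the AMAF idea in question can be sketched as follows. This is a simplified illustration, not the update used by any particular program: the function name `amaf_update`, the `(color, point)` move representation and the `stats` layout are assumptions made for the example. Every point whose first occupation in the playout is by the side to move gets credited as if it had been played first, and that is the step which treats move order as irrelevant.

```python
from collections import defaultdict


def amaf_update(stats, playout_moves, winner, to_move):
    """Credit AMAF statistics for one finished playout.

    stats:         dict mapping point -> [wins, visits]
    playout_moves: list of (color, point) in the order they were played
    winner:        color that won the playout
    to_move:       color to move at the node being updated
    """
    seen = set()
    for color, point in playout_moves:
        if point in seen:
            continue  # only the first play on a point counts
        seen.add(point)
        if color != to_move:
            continue  # opponent took this point first, so no credit
        stats[point][1] += 1
        if winner == to_move:
            stats[point][0] += 1


if __name__ == "__main__":
    stats = defaultdict(lambda: [0, 0])
    playout = [("B", "D4"), ("W", "Q16"), ("B", "C3"), ("W", "D4")]
    amaf_update(stats, playout, winner="B", to_move="B")
    print(dict(stats))
```

Note that C3 is credited exactly as D4 is, even though it was played two moves later. With uniformly random playouts that is a reasonable approximation; with pattern-driven playouts, C3 may only have been chosen because of what was already on the board.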
 
What's the experience out there? Does it just still work? Does it not
matter because you just "warm up" the tree? Or do you need to be careful
with what heuristics you apply so as not to break RAVE/AMAF?
 
Thank you!
Tobi
 






_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go


















-- 
www.pragtob.info