[computer-go] Citation on zero exploration?
Many of us have concluded that, with RAVE, there is no need for a UCT exploration term: http://computer-go.org/pipermail/computer-go/2009-June/018773.html Is there a published source on this result that I could cite? Thanks, Peter Drake http://www.lclark.edu/~drake/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [SPAM] [computer-go] Citation on zero exploration?
http://computer-go.org/pipermail/computer-go/2009-June/018773.html As I've often said something related to that (e.g. in http://hal.inria.fr/inria-00369786/fr/ ) I'd like to be more precise. What follows is for binary deterministic games, and I precise at the beginning that this is not intended to make programs much stronger. UCT (with exploration) is consistent, in the sense that asymptotically it will find an optimal move, if any winning move exists. With no exploration, this is no more true, even with rave: there are cases in which rave will only focus on one move and will simulate only this one, in spite of a poor winning rate. However, a simple trick recovers consistency: use (nbWins+K)/(nbSimulations+2K) for example, or other regularization tricks, if the weight of Rave decreases to 0 when the number of simulations goes to infinity. However, if you have patterns with negative weights and the score is biased by these patterns (possibly until a negative value), then you loose consistency whenever the weight of the patterns decrease to 0 when the number of simulations of the corresponding move goes to 0. Then, we might feel that we need more than consistency: we want consistency, and we do not want to simulate all the tree infinitely often. This can be done with some specific rules, and in mogo we have added a simple patch which, roughly, consists in simulating randomly the children nodes when the success rate of a node is very low (below 30%). This ensures that the following rules are enough for ensuring consistency: score(move) - winRate(move) -- 0 as nbSims(move) -- infinity as we optimize automatically patterns and parameters, the advantage is that we can do it without any risk of loosing consistency. More precisely, http://www.lri.fr/~teytaud/consistency.pdf (submitted; not a lot of numerical experiments, it's for theory mainly - yet, for 500 000 simulations per move, there is a significant difference, and for a specific position, it works well - there was absolutely no tuning for this). I don't claim this will make a big change in Monte-Carlo Tree Search - just a bit of theoretical understanding, and the pleasure of asymptotic consistency - maybe it will make a real difference for very long thinking time for which we can't use tuning due to the computational cost :-) and it's the first time I have more than 50% for a modification derived by theoretical ideas :-) Best regards, Olivier ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Citation on zero exploration?
By result, do you mean this observation or a quest for an explanation? If you merely wish to say that many/most current UCT programs have no need for an exploration term, then that is a context-specific (e.g. not for the E-E in Go paper) heuristic or experimental statement, not a formal one. A source for such a statement has to be more than a paper that simply notices a similar effect for their own application. One would have to reference a larger body of experimentalists or a general consensus. Just my humble opinion, Cenny Wenner On 11/9/09, Peter Drake dr...@lclark.edu wrote: Many of us have concluded that, with RAVE, there is no need for a UCT exploration term: http://computer-go.org/pipermail/computer-go/2009-June/018773.html Is there a published source on this result that I could cite? Thanks, Peter Drake http://www.lclark.edu/~drake/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [SPAM] Re: [computer-go] Citation on zero exploration?
I'm actually looking for something weaker than what Olivier has offered: a published report of the empirical finding that (for some programs, at least) an exploration coefficient of zero works best. Peter Drake http://www.lclark.edu/~drake/ On Nov 9, 2009, at 10:15 AM, Olivier Teytaud wrote: Hi; I'd like to answer your post but I must admit I've not clearly understood. My PDF file is essentially a mathematical analysis, proving that we can have consistency with some rules, without having infinitely many visits of the whole tree. UCT has the first property (consistency), but not the second (UCT visits all the tree infinitely often). This is proved under clearly stated assumptions on the problem; including deterministic two-player zero-sum games, and therefore including Go. Best regards, Olivier By result, do you mean this observation or a quest for an explanation? If you merely wish to say that many/most current UCT programs have no need for an exploration term, then that is a context-specific (e.g. not for the E-E in Go paper) heuristic or experimental statement, not a formal one. A source for such a statement has to be more than a paper that simply notices a similar effect for their own application. One would have to reference a larger body of experimentalists or a general consensus. Just my humble opinion, Cenny Wenner On 11/9/09, Peter Drake dr...@lclark.edu wrote: Many of us have concluded that, with RAVE, there is no need for a UCT exploration term: http://computer-go.org/pipermail/computer-go/2009-June/018773.html Is there a published source on this result that I could cite? Thanks, Peter Drake http://www.lclark.edu/~drake/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ -- = Olivier Teytaud (TAO-inria) olivier.teyt...@inria.fr Tel (33)169154231 / Fax (33)169156586 Equipe TAO (Inria-Futurs), LRI, UMR 8623(CNRS - Universite Paris-Sud), bat 490 Universite Paris-Sud 91405 Orsay Cedex France (one of the 56.5 % of french who did not vote for Sarkozy in 2007) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [SPAM] Re: [SPAM] Re: [computer-go] Citation on zero exploration?
I don't know if a post in the computer-go mailing list is a report, but you can find numbers in this post: http://computer-go.org/pipermail/computer-go/2008-May/014854.html From the numbers I would say that it shows that all sufficiently small constants are equivalent - maybe more experiments would be interesting. Olivier I'm actually looking for something weaker than what Olivier has offered: a published report of the empirical finding that (for some programs, at least) an exploration coefficient of zero works best. Peter Drake http://www.lclark.edu/~drake/ http://www.lclark.edu/%7Edrake/ On Nov 9, 2009, at 10:15 AM, Olivier Teytaud wrote: Hi; I'd like to answer your post but I must admit I've not clearly understood. My PDF file is essentially a mathematical analysis, proving that we can have consistency with some rules, without having infinitely many visits of the whole tree. UCT has the first property (consistency), but not the second (UCT visits all the tree infinitely often). This is proved under clearly stated assumptions on the problem; including deterministic two-player zero-sum games, and therefore including Go. Best regards, Olivier By result, do you mean this observation or a quest for an explanation? If you merely wish to say that many/most current UCT programs have no need for an exploration term, then that is a context-specific (e.g. not for the E-E in Go paper) heuristic or experimental statement, not a formal one. A source for such a statement has to be more than a paper that simply notices a similar effect for their own application. One would have to reference a larger body of experimentalists or a general consensus. Just my humble opinion, Cenny Wenner On 11/9/09, Peter Drake dr...@lclark.edu wrote: Many of us have concluded that, with RAVE, there is no need for a UCT exploration term: http://computer-go.org/pipermail/computer-go/2009-June/018773.html Is there a published source on this result that I could cite? Thanks, Peter Drake http://www.lclark.edu/~drake/ http://www.lclark.edu/%7Edrake/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ -- = Olivier Teytaud (TAO-inria) olivier.teyt...@inria.fr Tel (33)169154231 / Fax (33)169156586 Equipe TAO (Inria-Futurs), LRI, UMR 8623(CNRS - Universite Paris-Sud), bat 490 Universite Paris-Sud 91405 Orsay Cedex France (one of the 56.5 % of french who did not vote for Sarkozy in 2007) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ -- = Olivier Teytaud (TAO-inria) olivier.teyt...@inria.fr Tel (33)169154231 / Fax (33)169156586 Equipe TAO (Inria-Futurs), LRI, UMR 8623(CNRS - Universite Paris-Sud), bat 490 Universite Paris-Sud 91405 Orsay Cedex France (one of the 56.5 % of french who did not vote for Sarkozy in 2007) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [SPAM] Re: [computer-go] Citation on zero exploration?
On Mon, Nov 09, 2009 at 10:18:25AM -0800, Peter Drake wrote: I'm actually looking for something weaker than what Olivier has offered: a published report of the empirical finding that (for some programs, at least) an exploration coefficient of zero works best. I think you could use Combining expert, offline, transient and online knowledge in Monte-Carlo exploration for that, since there is presented an AMAF equation without any exploration term, and the final equation has no exploration term either. -- Petr Pasky Baudis A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [SPAM] Re: [computer-go] Citation on zero exploration?
My mistake. The comment was directed to the original post and not yours. I was being too slow writing a reply. (yikes, i really dislike the formatting of the http://hal.inria.fr/inria-00369786/fr article) On 11/9/09, Olivier Teytaud olivier.teyt...@lri.fr wrote: Hi; I'd like to answer your post but I must admit I've not clearly understood. My PDF file is essentially a mathematical analysis, proving that we can have consistency with some rules, without having infinitely many visits of the whole tree. UCT has the first property (consistency), but not the second (UCT visits all the tree infinitely often). This is proved under clearly stated assumptions on the problem; including deterministic two-player zero-sum games, and therefore including Go. Best regards, Olivier By result, do you mean this observation or a quest for an explanation? If you merely wish to say that many/most current UCT programs have no need for an exploration term, then that is a context-specific (e.g. not for the E-E in Go paper) heuristic or experimental statement, not a formal one. A source for such a statement has to be more than a paper that simply notices a similar effect for their own application. One would have to reference a larger body of experimentalists or a general consensus. Just my humble opinion, Cenny Wenner On 11/9/09, Peter Drake dr...@lclark.edu wrote: Many of us have concluded that, with RAVE, there is no need for a UCT exploration term: http://computer-go.org/pipermail/computer-go/2009-June/018773.html Is there a published source on this result that I could cite? Thanks, Peter Drake http://www.lclark.edu/~drake/ http://www.lclark.edu/%7Edrake/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ -- = Olivier Teytaud (TAO-inria) olivier.teyt...@inria.fr Tel (33)169154231 / Fax (33)169156586 Equipe TAO (Inria-Futurs), LRI, UMR 8623(CNRS - Universite Paris-Sud), bat 490 Universite Paris-Sud 91405 Orsay Cedex France (one of the 56.5 % of french who did not vote for Sarkozy in 2007) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [SPAM] Re: [computer-go] Citation on zero exploration?
On Mon, Nov 09, 2009 at 07:42:46PM +0100, Cenny Wenner wrote: My mistake. The comment was directed to the original post and not yours. I was being too slow writing a reply. (yikes, i really dislike the formatting of the http://hal.inria.fr/inria-00369786/fr article) Which, incidentally, also is a publication mentioning c=0 favourableness. :-) -- Petr Pasky Baudis A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [SPAM] Re: [computer-go] Citation on zero exploration?
Perfect! The very similar paper (by most of the same authors) Adding expert knowledge and exploration in Monte-Carlo Tree Search contains the key passage: In MoGo, the constant in front of the exploration term was not null before the introduction of RAVE values in [10]; it is now 0. Peter Drake http://www.lclark.edu/~drake/ On Nov 9, 2009, at 10:31 AM, Petr Baudis wrote: On Mon, Nov 09, 2009 at 10:18:25AM -0800, Peter Drake wrote: I'm actually looking for something weaker than what Olivier has offered: a published report of the empirical finding that (for some programs, at least) an exploration coefficient of zero works best. I think you could use Combining expert, offline, transient and online knowledge in Monte-Carlo exploration for that, since there is presented an AMAF equation without any exploration term, and the final equation has no exploration term either. -- Petr Pasky Baudis A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/