[Computer-go] Multi-armed bandit problem theory
Hi! Does anyone have a good source for understanding the theory behind the multi-armed bandit problem, i.e. the proof behind the exponential arm play bounds etc.? My only source so far is Auer et al., 2002: Finite-time Analysis of the Multiarmed Bandit Problem - but I suspect its description of the original bound is incomplete and/or simplified with some implicit assumptions (i.e. in case of optimal arm, the bound would involve division by zero?). Everyone refers to Lai Robbins, 1985 and Agrawal, 1995, but I'm unable to find these papers anywhere (my university JTOR subscription somehow magically doesn't seem to cover Agrawal, 1995). I'm hoping that maybe I could grasp the details if I read those, does anyone have a copy? Thanks, -- Petr Pasky Baudis We live on an island surrounded by a sea of ignorance. As our island of knowledge grows, so does the shore of our ignorace. -- J. A. Wheeler ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Re: [Computer-go] Multi-armed bandit problem theory
Not a direct answer, but some bit of information: Bandit theory started in the early 1950' by Herbert Robbins (the same Robbins from the 1985 paper). However, he did not prove best possible bounds in the seminal paper. Ingo. Original-Nachricht Datum: Wed, 26 Oct 2011 11:23:54 +0200 Von: Petr Baudis pa...@ucw.cz An: computer...@computer-go.org Betreff: [Computer-go] Multi-armed bandit problem theory Hi! Does anyone have a good source for understanding the theory behind the multi-armed bandit problem, i.e. the proof behind the exponential arm play bounds etc.? My only source so far is Auer et al., 2002: Finite-time Analysis of the Multiarmed Bandit Problem - but I suspect its description of the original bound is incomplete and/or simplified with some implicit assumptions (i.e. in case of optimal arm, the bound would involve division by zero?). Everyone refers to Lai Robbins, 1985 and Agrawal, 1995, but I'm unable to find these papers anywhere (my university JTOR subscription somehow magically doesn't seem to cover Agrawal, 1995). I'm hoping that maybe I could grasp the details if I read those, does anyone have a copy? Thanks, -- Petr Pasky Baudis We live on an island surrounded by a sea of ignorance. As our island of knowledge grows, so does the shore of our ignorace. -- J. A. Wheeler ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go -- Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
[Computer-go] MCTS playouts per second
Just want to check what the expected playout performance is of well tuned monte-carlo engines? My MCTS engine is averaging apx 3,500 lightweight playouts per second on a single i5 32 bit cpu. Any suggestions on very efficient source code examples for fast monte-carlo playouts? I've spent a lot of time comparing recursive group formation vs non-recursive but it doesn't seem to make a big difference. It seems that updating the list of likely moves after every play with something similar to the mogo probability rules is the most time consuming part as I currently recalculate the probabilities of moves at every empty point on the board each turn. It seems necessary if one doesn't want to handle all the exceptions to keeping the previous turn's play probabilities. Also any thoughts on combining pattern scoring and other conventional techniques together with a UCT tree? If two branches have very similar simulated win ratios could one use other factors to choose the best branch? It seems if there is a very wide branching such as at the beginning of the game, there is a lot of room to add other heuristics to choosing the best move when monte-carlo scores are within range of expected error. ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Re: [Computer-go] 2012 International Go Conference at the US Congress
Hello All, ** *CALL FOR PAPERS/PARTICIPANTS* *at* *THE 2012 INTERNATIONAL GO SYMPOSIUM* ** An International Go Symposium will take place August 3rd and 4th, 2012 at the beginning of the U.S. Go Congress in Black Mountain, North Carolina. Presentations can include educational, cultural, historical, literary, artistic and scientific aspects of the game. Depending on the number and nature of the talks, suggested timing is a half-hour presentation with a 15 minute question and answer period. Additional opportunities for questions and answers afterwards will be available. Translators can be provided. We are exploring the idea of expanding the usual opportunities to present papers at go conferences to include talks via Skype with a large screen and audience participation along with oral or texting possibilities for extended questions and answers via the Internet. And to alleviate problems with the differences in times, we are also thinking about including talks on DVDs or with other pre-recorded means. Much of this will depend on the number and quality of papers and the amount of sponsorship, so please indicate whether you could attend with or without partial travel expenses or if you would be interested in giving a talk via Skype or pre-recordings. For those who wish to publish, papers can be included in an e-publication connected with the American Go Association web site and e-Journal ( www.usgo.org). Another change from the usual proceedings of a conference is that (again for those who wish to) papers will be put up before the Symposium which will enable better audience participation and the ability to write more than is possible to include in the talk. Please contact: Peter Shotwell pshotw...@gmail.com For reference, the beta website for the Symposium is at http://www.gosymposium.org/ and the records of a similar ICOB Conference in 2003 is at http://www.usgo.org/yearbook/2003ICOB.html, while a 2008 Symposium in Sweden is at http://egc2008.eu/en/events/symposium.php. We are currently reviewing our opportunities and options for sponsorship and would appreciate any suggestions. *If you wrote me last spring, please do so again to let me know that you are still interestedand that there were no miscommunications.* ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Re: [Computer-go] Multi-armed bandit problem theory
I tried to find those papers once, too, and I also failed. It seems to predate Internet publishing. Some key results were proved in the 1970's. I think the following ideas sketch a proof: - sample each arm infinitely often - such that the fraction of effort going to sub-optimal arms decreases to zero Now look at the form of Hoeffding's inequality: http://en.wikipedia.org/wiki/Hoeffding's_inequality. The UCB term is like the functional inverse of the right-hand side. That is why it works for bandit problems. To go from bandit theory to UCT, you have to show that recursive application of the same process converges, which is the proof that you see in the Auer paper. Note that there are solutions to the bandit problem that would not be solutions for MCTS, because they do not converge recursively. E.g., you can make an epsilon-greedy policy that works for a bandit problem (works == zero asymptotic regret), but would not converge to optimal in an MCTS setting. Hope this helps. Brian -Original Message- From: computer-go-boun...@dvandva.org [mailto:computer-go-boun...@dvandva.org] On Behalf Of Petr Baudis Sent: Wednesday, October 26, 2011 8:51 AM To: computer-go@dvandva.org Subject: Re: [Computer-go] Multi-armed bandit problem theory On Wed, Oct 26, 2011 at 01:56:03PM +0200, Ingo Althöfer wrote: Not a direct answer, but some bit of information: Bandit theory started in the early 1950' by Herbert Robbins (the same Robbins from the 1985 paper). However, he did not prove best possible bounds in the seminal paper. Yes, I actually have a copy of that paper but it doesn't seem that it could help me better understand the later results. Petr Pasky Baudis ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Re: [Computer-go] MCTS playouts per second
Zen has very slow (but very smart) heavy playout, about 1 kpo/s on a 3 GHz Intel Core2 core. One important tenuki for speed-up is to check the legality of moves _after_ selecting a move, i.e., don't check the legality of all moves before choosing a move. Hideki Scott Christensen: CAHuTwaxAW=44AXc6hhrtbYMxfJ079u=jxVjp=yrwcrb6j-v...@mail.gmail.com: Just want to check what the expected playout performance is of well tuned monte-carlo engines? My MCTS engine is averaging apx 3,500 lightweight playouts per second on a single i5 32 bit cpu. Any suggestions on very efficient source code examples for fast monte-carlo playouts? I've spent a lot of time comparing recursive group formation vs non-recursive but it doesn't seem to make a big difference. It seems that updating the list of likely moves after every play with something similar to the mogo probability rules is the most time consuming part as I currently recalculate the probabilities of moves at every empty point on the board each turn. It seems necessary if one doesn't want to handle all the exceptions to keeping the previous turn's play probabilities. Also any thoughts on combining pattern scoring and other conventional techniques together with a UCT tree? If two branches have very similar simulated win ratios could one use other factors to choose the best branch? It seems if there is a very wide branching such as at the beginning of the game, there is a lot of room to add other heuristics to choosing the best move when monte-carlo scores are within range of expected error. ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go -- Hideki Kato mailto:hideki_ka...@ybb.ne.jp ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Re: [Computer-go] MCTS playouts per second
On 19x19, Erica's speed is around 5,500 lightweight playouts per second on a single i7 cpu. As far as I know, Lukasz Lew's libego, which is open source, is the fastest implementation of MCTS and can reach around 6,000-7,000 lightweight playouts per second in the same cpu. Aja -原始郵件- From: Scott Christensen Sent: Wednesday, October 26, 2011 6:48 AM To: computer-go@dvandva.org Subject: [Computer-go] MCTS playouts per second Just want to check what the expected playout performance is of well tuned monte-carlo engines? My MCTS engine is averaging apx 3,500 lightweight playouts per second on a single i5 32 bit cpu. Any suggestions on very efficient source code examples for fast monte-carlo playouts? I've spent a lot of time comparing recursive group formation vs non-recursive but it doesn't seem to make a big difference. It seems that updating the list of likely moves after every play with something similar to the mogo probability rules is the most time consuming part as I currently recalculate the probabilities of moves at every empty point on the board each turn. It seems necessary if one doesn't want to handle all the exceptions to keeping the previous turn's play probabilities. Also any thoughts on combining pattern scoring and other conventional techniques together with a UCT tree? If two branches have very similar simulated win ratios could one use other factors to choose the best branch? It seems if there is a very wide branching such as at the beginning of the game, there is a lot of room to add other heuristics to choosing the best move when monte-carlo scores are within range of expected error. ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Re: [Computer-go] MCTS playouts per second
Clarification: the numbers that I quoted on my previous post were on 9x9. -Original Message- From: computer-go-boun...@dvandva.org [mailto:computer-go-boun...@dvandva.org] On Behalf Of Aja Huang Sent: Wednesday, October 26, 2011 10:23 PM To: computer-go@dvandva.org Subject: Re: [Computer-go] MCTS playouts per second On 19x19, Erica's speed is around 5,500 lightweight playouts per second on a single i7 cpu. As far as I know, Lukasz Lew's libego, which is open source, is the fastest implementation of MCTS and can reach around 6,000-7,000 lightweight playouts per second in the same cpu. Aja -- From: Scott Christensen Sent: Wednesday, October 26, 2011 6:48 AM To: computer-go@dvandva.org Subject: [Computer-go] MCTS playouts per second Just want to check what the expected playout performance is of well tuned monte-carlo engines? My MCTS engine is averaging apx 3,500 lightweight playouts per second on a single i5 32 bit cpu. Any suggestions on very efficient source code examples for fast monte-carlo playouts? I've spent a lot of time comparing recursive group formation vs non-recursive but it doesn't seem to make a big difference. It seems that updating the list of likely moves after every play with something similar to the mogo probability rules is the most time consuming part as I currently recalculate the probabilities of moves at every empty point on the board each turn. It seems necessary if one doesn't want to handle all the exceptions to keeping the previous turn's play probabilities. Also any thoughts on combining pattern scoring and other conventional techniques together with a UCT tree? If two branches have very similar simulated win ratios could one use other factors to choose the best branch? It seems if there is a very wide branching such as at the beginning of the game, there is a lot of room to add other heuristics to choosing the best move when monte-carlo scores are within range of expected error. ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
Re: [Computer-go] Computer-go Digest, Vol 21, Issue 16
Pachi C++ source code (http://pachi.or.cz/) does appears quite straightforward, Fuego libego are a bit more challenging. I've started by using ideas from Wally (available through: http://www.usgo.org/resources/computer.html) which is probably the most compact and easy to read code I've seen but is not very efficient. Thanks Hideki, I am indeed doing full liberty checks on every move in the list, I will change to only doing the checks when selected. Exact probability of moves after the first 6-7 moves in the tree doesn't seem that important really. I'm only getting the 3.5k playouts/sec on 9x9, It would seem with apx 5k playouts on 19x19, for 9x9 we should around 50-100k playouts. ___ Computer-go mailing list Computer-go@dvandva.org http://dvandva.org/cgi-bin/mailman/listinfo/computer-go