[Computer-go] Multi-armed bandit problem theory

2011-10-26 Thread Petr Baudis
  Hi!

  Does anyone have a good source for understanding the theory behind
the multi-armed bandit problem, i.e. the proof behind the exponential
arm play bounds etc.? My only source so far is Auer et al., 2002:
Finite-time Analysis of the Multiarmed Bandit Problem - but I suspect
its description of the original bound is incomplete and/or simplified
with some implicit assumptions (i.e. in case of optimal arm, the bound
would involve division by zero?).

  Everyone refers to Lai  Robbins, 1985 and Agrawal, 1995, but I'm
unable to find these papers anywhere (my university JTOR subscription
somehow magically doesn't seem to cover Agrawal, 1995). I'm hoping
that maybe I could grasp the details if I read those, does anyone have
a copy?

  Thanks,

-- 
Petr Pasky Baudis
We live on an island surrounded by a sea of ignorance. As our island
of knowledge grows, so does the shore of our ignorace. -- J. A. Wheeler
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go


Re: [Computer-go] Multi-armed bandit problem theory

2011-10-26 Thread Ingo Althöfer
Not a direct answer, but some bit of information:
Bandit theory started in the early 1950' by Herbert Robbins
(the same Robbins from the 1985 paper). However, he did
not prove best possible bounds in the seminal paper.

Ingo.




 Original-Nachricht 
 Datum: Wed, 26 Oct 2011 11:23:54 +0200
 Von: Petr Baudis pa...@ucw.cz
 An: computer...@computer-go.org
 Betreff: [Computer-go] Multi-armed bandit problem theory

   Hi!
 
   Does anyone have a good source for understanding the theory behind
 the multi-armed bandit problem, i.e. the proof behind the exponential
 arm play bounds etc.? My only source so far is Auer et al., 2002:
 Finite-time Analysis of the Multiarmed Bandit Problem - but I suspect
 its description of the original bound is incomplete and/or simplified
 with some implicit assumptions (i.e. in case of optimal arm, the bound
 would involve division by zero?).
 
   Everyone refers to Lai  Robbins, 1985 and Agrawal, 1995, but I'm
 unable to find these papers anywhere (my university JTOR subscription
 somehow magically doesn't seem to cover Agrawal, 1995). I'm hoping
 that maybe I could grasp the details if I read those, does anyone have
 a copy?
 
   Thanks,
 
 -- 
   Petr Pasky Baudis
 We live on an island surrounded by a sea of ignorance. As our island
 of knowledge grows, so does the shore of our ignorace. -- J. A. Wheeler
 ___
 Computer-go mailing list
 Computer-go@dvandva.org
 http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

-- 
Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir
belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go


[Computer-go] MCTS playouts per second

2011-10-26 Thread Scott Christensen
Just want to check what the expected playout performance is of well
tuned monte-carlo engines?  My MCTS engine is averaging apx 3,500
lightweight playouts per second on a single i5 32 bit cpu.  Any
suggestions on very efficient source code examples for fast
monte-carlo playouts?

I've spent a lot of time comparing recursive group formation vs
non-recursive but it doesn't seem to make a big difference.  It seems
that updating the list of likely moves after every play with something
similar to the mogo probability rules is the most time consuming part
as I currently recalculate the probabilities of moves at every empty
point on the board each turn. It seems necessary if one doesn't want
to handle all the exceptions to keeping the previous turn's play
probabilities.

Also any thoughts on combining pattern scoring and other conventional
techniques together with a UCT tree?   If two branches have very
similar simulated win ratios could one use other factors to choose the
best branch?  It seems if there is a very wide branching such as at
the beginning of the game, there is a lot of room to add other
heuristics to choosing the best move when monte-carlo scores are
within range of expected error.
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go


Re: [Computer-go] 2012 International Go Conference at the US Congress

2011-10-26 Thread P Shotwell
Hello All,

**

*CALL FOR PAPERS/PARTICIPANTS*

*at*

*THE 2012 INTERNATIONAL GO SYMPOSIUM*

**

An International Go Symposium will take place August 3rd and 4th, 2012 at
the beginning of the U.S. Go Congress in Black Mountain, North Carolina.
Presentations can include educational, cultural, historical, literary,
artistic and scientific aspects of the game. Depending on the number and
nature of the talks, suggested timing is a half-hour presentation with a 15
minute question and answer period.  Additional opportunities for questions
and answers afterwards will be available. Translators can be provided.

We are exploring the idea of expanding the usual opportunities to present
papers at go conferences to include talks via Skype with a large screen and
audience participation along with oral or texting possibilities for extended
questions and answers via the Internet. And to alleviate problems with the
differences in times, we are also thinking about including talks on DVDs or
with other pre-recorded means. Much of this will depend on the number and
quality of papers and the amount of sponsorship, so please indicate whether
you could attend with or without partial travel expenses or if you would be
interested in giving a talk via Skype or pre-recordings.

 For those who wish to publish, papers can be included in an e-publication
connected with the American Go Association web site and e-Journal (
www.usgo.org). Another change from the usual proceedings of a conference is
that (again for those who wish to) papers will be put up before the
Symposium which will enable better audience participation and the ability to
write more than is possible to include in the talk.

 Please contact: Peter Shotwell  pshotw...@gmail.com

 For reference, the beta website for the Symposium is at
http://www.gosymposium.org/  and the records of a similar ICOB Conference in
2003 is at http://www.usgo.org/yearbook/2003ICOB.html,  while a 2008
Symposium in Sweden is at http://egc2008.eu/en/events/symposium.php.

 We are currently reviewing our opportunities and options for sponsorship
and would appreciate any suggestions.

 *If you wrote me last spring, please do so again to let me know that you
are still interestedand that there were no miscommunications.*
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Re: [Computer-go] Multi-armed bandit problem theory

2011-10-26 Thread Brian Sheppard
I tried to find those papers once, too, and I also failed. It seems to
predate Internet publishing. Some key results were proved in the 1970's.

I think the following ideas sketch a proof:

- sample each arm infinitely often
- such that the fraction of effort going to sub-optimal arms
decreases to zero

Now look at the form of Hoeffding's inequality:
http://en.wikipedia.org/wiki/Hoeffding's_inequality. The UCB term is like
the functional inverse of the right-hand side. That is why it works for
bandit problems.

To go from bandit theory to UCT, you have to show that recursive application
of the same process converges, which is the proof that you see in the Auer
paper.

Note that there are solutions to the bandit problem that would not be
solutions for MCTS, because they do not converge recursively. E.g., you can
make an epsilon-greedy policy that works for a bandit problem (works ==
zero asymptotic regret), but would not converge to optimal in an MCTS
setting.

Hope this helps.

Brian

-Original Message-
From: computer-go-boun...@dvandva.org
[mailto:computer-go-boun...@dvandva.org] On Behalf Of Petr Baudis
Sent: Wednesday, October 26, 2011 8:51 AM
To: computer-go@dvandva.org
Subject: Re: [Computer-go] Multi-armed bandit problem theory

On Wed, Oct 26, 2011 at 01:56:03PM +0200, Ingo Althöfer wrote:
 Not a direct answer, but some bit of information:
 Bandit theory started in the early 1950' by Herbert Robbins
 (the same Robbins from the 1985 paper). However, he did
 not prove best possible bounds in the seminal paper.

Yes, I actually have a copy of that paper but it doesn't seem that it
could help me better understand the later results.

Petr Pasky Baudis
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go


Re: [Computer-go] MCTS playouts per second

2011-10-26 Thread Hideki Kato
Zen has very slow (but very smart) heavy playout, about 1 kpo/s on a 3 
GHz Intel Core2 core.  One important tenuki for speed-up is to check 
the legality of moves _after_ selecting a move, i.e., don't check the 
legality of all moves before choosing a move.

Hideki

Scott Christensen: 
CAHuTwaxAW=44AXc6hhrtbYMxfJ079u=jxVjp=yrwcrb6j-v...@mail.gmail.com:
Just want to check what the expected playout performance is of well
tuned monte-carlo engines?  My MCTS engine is averaging apx 3,500
lightweight playouts per second on a single i5 32 bit cpu.  Any
suggestions on very efficient source code examples for fast
monte-carlo playouts?

I've spent a lot of time comparing recursive group formation vs
non-recursive but it doesn't seem to make a big difference.  It seems
that updating the list of likely moves after every play with something
similar to the mogo probability rules is the most time consuming part
as I currently recalculate the probabilities of moves at every empty
point on the board each turn. It seems necessary if one doesn't want
to handle all the exceptions to keeping the previous turn's play
probabilities.

Also any thoughts on combining pattern scoring and other conventional
techniques together with a UCT tree?   If two branches have very
similar simulated win ratios could one use other factors to choose the
best branch?  It seems if there is a very wide branching such as at
the beginning of the game, there is a lot of room to add other
heuristics to choosing the best move when monte-carlo scores are
within range of expected error.
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
-- 
Hideki Kato mailto:hideki_ka...@ybb.ne.jp
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go


Re: [Computer-go] MCTS playouts per second

2011-10-26 Thread Aja Huang
On 19x19, Erica's speed is around 5,500 lightweight playouts per second on a 
single i7 cpu. As far as I know, Lukasz Lew's libego, which is open source, 
is the fastest implementation of MCTS and can reach around 6,000-7,000 
lightweight playouts per second in the same cpu.


Aja

-原始郵件- 
From: Scott Christensen

Sent: Wednesday, October 26, 2011 6:48 AM
To: computer-go@dvandva.org
Subject: [Computer-go] MCTS playouts per second

Just want to check what the expected playout performance is of well
tuned monte-carlo engines?  My MCTS engine is averaging apx 3,500
lightweight playouts per second on a single i5 32 bit cpu.  Any
suggestions on very efficient source code examples for fast
monte-carlo playouts?

I've spent a lot of time comparing recursive group formation vs
non-recursive but it doesn't seem to make a big difference.  It seems
that updating the list of likely moves after every play with something
similar to the mogo probability rules is the most time consuming part
as I currently recalculate the probabilities of moves at every empty
point on the board each turn. It seems necessary if one doesn't want
to handle all the exceptions to keeping the previous turn's play
probabilities.

Also any thoughts on combining pattern scoring and other conventional
techniques together with a UCT tree?   If two branches have very
similar simulated win ratios could one use other factors to choose the
best branch?  It seems if there is a very wide branching such as at
the beginning of the game, there is a lot of room to add other
heuristics to choosing the best move when monte-carlo scores are
within range of expected error.
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go 


___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Re: [Computer-go] MCTS playouts per second

2011-10-26 Thread Brian Sheppard
Clarification: the numbers that I quoted on my previous post were on 9x9. 


-Original Message-
From: computer-go-boun...@dvandva.org
[mailto:computer-go-boun...@dvandva.org] On Behalf Of Aja Huang
Sent: Wednesday, October 26, 2011 10:23 PM
To: computer-go@dvandva.org
Subject: Re: [Computer-go] MCTS playouts per second

On 19x19, Erica's speed is around 5,500 lightweight playouts per second on a
single i7 cpu. As far as I know, Lukasz Lew's libego, which is open source,
is the fastest implementation of MCTS and can reach around 6,000-7,000
lightweight playouts per second in the same cpu.

Aja

--
From: Scott Christensen
Sent: Wednesday, October 26, 2011 6:48 AM
To: computer-go@dvandva.org
Subject: [Computer-go] MCTS playouts per second

Just want to check what the expected playout performance is of well tuned
monte-carlo engines?  My MCTS engine is averaging apx 3,500 lightweight
playouts per second on a single i5 32 bit cpu.  Any suggestions on very
efficient source code examples for fast monte-carlo playouts?

I've spent a lot of time comparing recursive group formation vs
non-recursive but it doesn't seem to make a big difference.  It seems that
updating the list of likely moves after every play with something similar to
the mogo probability rules is the most time consuming part as I currently
recalculate the probabilities of moves at every empty point on the board
each turn. It seems necessary if one doesn't want to handle all the
exceptions to keeping the previous turn's play probabilities.

Also any thoughts on combining pattern scoring and other conventional
techniques together with a UCT tree?   If two branches have very
similar simulated win ratios could one use other factors to choose the best
branch?  It seems if there is a very wide branching such as at the beginning
of the game, there is a lot of room to add other heuristics to choosing the
best move when monte-carlo scores are within range of expected error.
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go 

___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go


Re: [Computer-go] Computer-go Digest, Vol 21, Issue 16

2011-10-26 Thread Scott Christensen
Pachi C++ source code (http://pachi.or.cz/) does appears quite
straightforward, Fuego  libego are a bit more challenging.   I've
started by using ideas from Wally  (available through:
http://www.usgo.org/resources/computer.html) which is probably the
most compact and easy to read code I've seen but is not very
efficient.

Thanks Hideki, I am indeed doing full liberty checks on every move in
the list, I will change to only doing the checks when selected.
Exact probability of moves after the first 6-7 moves in the tree
doesn't seem that important really.

I'm only getting the 3.5k playouts/sec on 9x9, It would seem with apx
5k playouts on 19x19, for 9x9 we should around 50-100k playouts.
___
Computer-go mailing list
Computer-go@dvandva.org
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go