[computer-go] Citation on zero exploration?

2009-11-09 Thread Peter Drake
Many of us have concluded that, with RAVE, there is no need for a UCT  
exploration term:


http://computer-go.org/pipermail/computer-go/2009-June/018773.html

Is there a published source on this result that I could cite?

Thanks,

Peter Drake
http://www.lclark.edu/~drake/



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [SPAM] [computer-go] Citation on zero exploration?

2009-11-09 Thread Olivier Teytaud

 http://computer-go.org/pipermail/computer-go/2009-June/018773.html


As I've often said something related to that (e.g. in
http://hal.inria.fr/inria-00369786/fr/ ) I'd like to be more precise. What
follows is for binary deterministic games, and I precise at the beginning
that this is not intended to make programs much stronger.

UCT (with exploration) is consistent, in the sense that asymptotically it
will find
an optimal move, if any winning move exists.

With no exploration, this is no more true, even with rave: there are cases
in which rave will only focus on one move and will simulate only this one,
in spite of a poor winning rate. However, a simple trick recovers
consistency:
use (nbWins+K)/(nbSimulations+2K) for example, or other regularization
tricks, if the weight of Rave decreases to 0 when the number of simulations
goes to infinity. However, if you have patterns with negative weights and
the score is biased by these patterns (possibly until a negative value),
then you loose consistency whenever the weight of the patterns decrease to 0
when the number of simulations of the corresponding move goes to 0.

Then, we might feel that we need more than consistency: we want consistency,
and we do not want to simulate all the tree infinitely often. This can be
done
with some specific rules, and in mogo we have added a simple patch which,
roughly, consists in simulating randomly the children nodes when the success
rate of a node is very low (below 30%). This ensures that the following
rules are enough for ensuring consistency:

   score(move) - winRate(move) -- 0 as nbSims(move) --
infinity

as we optimize automatically patterns and parameters, the advantage is that
we can do it without any risk of loosing consistency.

More precisely, http://www.lri.fr/~teytaud/consistency.pdf (submitted; not a
lot of numerical experiments, it's for theory mainly - yet, for 500 000
simulations per move, there is a significant difference, and for a specific
position, it works well - there was absolutely no tuning for this).

I don't claim this will make a big change in Monte-Carlo Tree Search - just
a bit of theoretical understanding, and the pleasure of asymptotic
consistency - maybe it will make a real difference for very long thinking
time for which we can't use tuning due to the computational cost  :-)
and it's the first time I have more than 50% for a modification derived by
theoretical ideas :-)

Best regards,
Olivier
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Citation on zero exploration?

2009-11-09 Thread Cenny Wenner
By result, do you mean this observation or a quest for an explanation?
If you merely wish to say that many/most current UCT programs have no
need for an exploration term, then that is a context-specific (e.g.
not for the E-E in Go paper) heuristic or experimental statement,
not a formal one. A source for such a statement has to be more than a
paper that simply notices a similar effect for their own application.
One would have to reference a larger body of experimentalists or a
general consensus.

Just my humble opinion,
Cenny Wenner


On 11/9/09, Peter Drake dr...@lclark.edu wrote:
 Many of us have concluded that, with RAVE, there is no need for a UCT
 exploration term:

 http://computer-go.org/pipermail/computer-go/2009-June/018773.html

 Is there a published source on this result that I could cite?

 Thanks,

 Peter Drake
 http://www.lclark.edu/~drake/



 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [SPAM] Re: [computer-go] Citation on zero exploration?

2009-11-09 Thread Peter Drake
I'm actually looking for something weaker than what Olivier has  
offered: a published report of the empirical finding that (for some  
programs, at least) an exploration coefficient of zero works best.


Peter Drake
http://www.lclark.edu/~drake/



On Nov 9, 2009, at 10:15 AM, Olivier Teytaud wrote:


Hi; I'd like to answer your post but I must admit I've
not clearly understood.

My PDF file is essentially a mathematical analysis, proving that we  
can
have consistency with some rules, without having infinitely many  
visits
of the whole tree. UCT has the first property (consistency), but not  
the second
(UCT visits all the tree infinitely often). This is proved under  
clearly stated assumptions on the problem; including deterministic  
two-player zero-sum games,

and therefore including Go.

Best regards,
Olivier


By result, do you mean this observation or a quest for an explanation?
If you merely wish to say that many/most current UCT programs have no
need for an exploration term, then that is a context-specific (e.g.
not for the E-E in Go paper) heuristic or experimental statement,
not a formal one. A source for such a statement has to be more than a
paper that simply notices a similar effect for their own application.
One would have to reference a larger body of experimentalists or a
general consensus.

Just my humble opinion,
Cenny Wenner


On 11/9/09, Peter Drake dr...@lclark.edu wrote:
 Many of us have concluded that, with RAVE, there is no need for a  
UCT

 exploration term:

 http://computer-go.org/pipermail/computer-go/2009-June/018773.html

 Is there a published source on this result that I could cite?

 Thanks,

 Peter Drake
 http://www.lclark.edu/~drake/



 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/



--
=
Olivier Teytaud (TAO-inria) olivier.teyt...@inria.fr
Tel (33)169154231 / Fax (33)169156586
Equipe TAO (Inria-Futurs), LRI, UMR 8623(CNRS - Universite Paris-Sud),
bat 490 Universite Paris-Sud 91405 Orsay Cedex France
(one of the 56.5 % of french who did not vote for Sarkozy in 2007)


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [SPAM] Re: [SPAM] Re: [computer-go] Citation on zero exploration?

2009-11-09 Thread Olivier Teytaud
I don't know if a post in the computer-go mailing list is a report, but you
can find numbers in this post:
http://computer-go.org/pipermail/computer-go/2008-May/014854.html

From the numbers I would say that it shows that all sufficiently small
constants
are equivalent - maybe more experiments would be interesting.
Olivier

I'm actually looking for something weaker than what Olivier has offered: a
 published report of the empirical finding that (for some programs, at least)
 an exploration coefficient of zero works best.

 Peter Drake
 http://www.lclark.edu/~drake/ http://www.lclark.edu/%7Edrake/



 On Nov 9, 2009, at 10:15 AM, Olivier Teytaud wrote:

 Hi; I'd like to answer your post but I must admit I've
 not clearly understood.

 My PDF file is essentially a mathematical analysis, proving that we can
 have consistency with some rules, without having infinitely many visits
 of the whole tree. UCT has the first property (consistency), but not the
 second
 (UCT visits all the tree infinitely often). This is proved under clearly
 stated assumptions on the problem; including deterministic two-player
 zero-sum games,
 and therefore including Go.

 Best regards,
 Olivier


 By result, do you mean this observation or a quest for an explanation?
 If you merely wish to say that many/most current UCT programs have no
 need for an exploration term, then that is a context-specific (e.g.
 not for the E-E in Go paper) heuristic or experimental statement,
 not a formal one. A source for such a statement has to be more than a
 paper that simply notices a similar effect for their own application.
 One would have to reference a larger body of experimentalists or a
 general consensus.

 Just my humble opinion,
 Cenny Wenner


 On 11/9/09, Peter Drake dr...@lclark.edu wrote:
  Many of us have concluded that, with RAVE, there is no need for a UCT
  exploration term:
 
  http://computer-go.org/pipermail/computer-go/2009-June/018773.html
 
  Is there a published source on this result that I could cite?
 
  Thanks,
 
  Peter Drake
  http://www.lclark.edu/~drake/ http://www.lclark.edu/%7Edrake/
 
 
 
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/




 --
 =
 Olivier Teytaud (TAO-inria) olivier.teyt...@inria.fr
 Tel (33)169154231 / Fax (33)169156586
 Equipe TAO (Inria-Futurs), LRI, UMR 8623(CNRS - Universite Paris-Sud),
 bat 490 Universite Paris-Sud 91405 Orsay Cedex France
 (one of the 56.5 % of french who did not vote for Sarkozy in 2007)


 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/



 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/




-- 
=
Olivier Teytaud (TAO-inria) olivier.teyt...@inria.fr
Tel (33)169154231 / Fax (33)169156586
Equipe TAO (Inria-Futurs), LRI, UMR 8623(CNRS - Universite Paris-Sud),
bat 490 Universite Paris-Sud 91405 Orsay Cedex France
(one of the 56.5 % of french who did not vote for Sarkozy in 2007)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [SPAM] Re: [computer-go] Citation on zero exploration?

2009-11-09 Thread Petr Baudis
On Mon, Nov 09, 2009 at 10:18:25AM -0800, Peter Drake wrote:
 I'm actually looking for something weaker than what Olivier has
 offered: a published report of the empirical finding that (for some
 programs, at least) an exploration coefficient of zero works best.

I think you could use Combining expert, offline, transient and online
knowledge in Monte-Carlo exploration for that, since there is presented
an AMAF equation without any exploration term, and the final equation
has no exploration term either.

-- 
Petr Pasky Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [SPAM] Re: [computer-go] Citation on zero exploration?

2009-11-09 Thread Cenny Wenner
My mistake. The comment was directed to the original post and not
yours. I was being too slow writing a reply.

(yikes, i really dislike the formatting of the
http://hal.inria.fr/inria-00369786/fr article)

On 11/9/09, Olivier Teytaud olivier.teyt...@lri.fr wrote:
 Hi; I'd like to answer your post but I must admit I've
 not clearly understood.

 My PDF file is essentially a mathematical analysis, proving that we can
 have consistency with some rules, without having infinitely many visits
 of the whole tree. UCT has the first property (consistency), but not the
 second
 (UCT visits all the tree infinitely often). This is proved under clearly
 stated assumptions on the problem; including deterministic two-player
 zero-sum games,
 and therefore including Go.

 Best regards,
 Olivier


 By result, do you mean this observation or a quest for an explanation?
 If you merely wish to say that many/most current UCT programs have no
 need for an exploration term, then that is a context-specific (e.g.
 not for the E-E in Go paper) heuristic or experimental statement,
 not a formal one. A source for such a statement has to be more than a
 paper that simply notices a similar effect for their own application.
 One would have to reference a larger body of experimentalists or a
 general consensus.

 Just my humble opinion,
 Cenny Wenner


 On 11/9/09, Peter Drake dr...@lclark.edu wrote:
  Many of us have concluded that, with RAVE, there is no need for a UCT
  exploration term:
 
  http://computer-go.org/pipermail/computer-go/2009-June/018773.html
 
  Is there a published source on this result that I could cite?
 
  Thanks,
 
  Peter Drake
  http://www.lclark.edu/~drake/ http://www.lclark.edu/%7Edrake/
 
 
 
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/




 --
 =
 Olivier Teytaud (TAO-inria) olivier.teyt...@inria.fr
 Tel (33)169154231 / Fax (33)169156586
 Equipe TAO (Inria-Futurs), LRI, UMR 8623(CNRS - Universite Paris-Sud),
 bat 490 Universite Paris-Sud 91405 Orsay Cedex France
 (one of the 56.5 % of french who did not vote for Sarkozy in 2007)

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [SPAM] Re: [computer-go] Citation on zero exploration?

2009-11-09 Thread Petr Baudis
On Mon, Nov 09, 2009 at 07:42:46PM +0100, Cenny Wenner wrote:
 My mistake. The comment was directed to the original post and not
 yours. I was being too slow writing a reply.
 
 (yikes, i really dislike the formatting of the
 http://hal.inria.fr/inria-00369786/fr article)

Which, incidentally, also is a publication mentioning c=0 favourableness. :-)

-- 
Petr Pasky Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [SPAM] Re: [computer-go] Citation on zero exploration?

2009-11-09 Thread Peter Drake

Perfect!

The very similar paper (by most of the same authors) Adding expert  
knowledge and exploration in Monte-Carlo Tree Search contains the key  
passage:


In MoGo, the constant in front of the exploration term was not null  
before the introduction of RAVE values in [10]; it is now 0.


Peter Drake
http://www.lclark.edu/~drake/



On Nov 9, 2009, at 10:31 AM, Petr Baudis wrote:


On Mon, Nov 09, 2009 at 10:18:25AM -0800, Peter Drake wrote:

I'm actually looking for something weaker than what Olivier has
offered: a published report of the empirical finding that (for some
programs, at least) an exploration coefficient of zero works best.


I think you could use Combining expert, offline, transient and online
knowledge in Monte-Carlo exploration for that, since there is  
presented

an AMAF equation without any exploration term, and the final equation
has no exploration term either.

--
Petr Pasky Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/