subject:"\[computer\-go\] How to properly implement RAVE\?"

Re: [computer-go] How to properly implement RAVE?

2009-02-08 Thread Isaac Deutsch

No hurry, I will be away for a week next week. :-)


 Isaac,
 
 I haven't looked at this stuff for a while. I'm not at home so I can't
 look at it now.
 
 From the error I understand that tesuji/games/general/MoveIterator is
 missing. It is there in the Subversion source-tree however. So I don't
 know what could be wrong. Maybe it's somehow missing in the
 GoEngineGTP.jar
 
 If you can wait until Wednesday I'll fix it for you then.
 
 Mark

-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger01
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-08 Thread Petr Baudis

On Sat, Jan 17, 2009 at 08:29:32PM +0100, Sylvain Gelly wrote:
 A small point: in PlayoutOutTree, just after if
 (!played.AlreadyPlayed(move)) {, there should have a played.Play(move).
 I believe it does not change the final result (as the check is also done in
 the backup, and the move played in the backup), but I simply forgot that
 line (that should make moves_played_out_tree smaller).
 
 To avoid confusion, I repost the pseudo code with that correction (and
 hoping the indentation is not broken by the email editor once again).

Thank you so much for this! I have switched my RAVE implementation to
this formula and the bot has gotten noticeably stronger, though I
apparently still have some bugs to chase, since it seems to have trouble
considering strongest opponent's responses and frequently focuses on
unreasonable opponent's replies instead of the obvious (e.g. keeping a
group of stones in atari). Maybe I need better prior hinting...

I have few questions. Of course, please feel free to skip questions
about particular constants if you feel that's giving away too much. :-)

 ChooseMove(node, board) {
   bias = 0.015  // I put a random number here, to be tuned
   b = bias * bias / 0.25

Maybe it would be cleaner to define b = 1 / rave_equiv, where rave_equiv
is the number of playouts RAVE is thought to be equivalent of? Or is the
meaning of this constant actually different?

What value works best for people? I did not do much tuning yet, but I
use b=1/3000. I see Fuego uses b=1/5000. (This example b=1/.)

   best_value = -1
   best_move = PASSMOVE
   for (move in board.allmoves) {
 c = node.child(move).counts
 w = node.child(move).wins
 rc = node.rave_counts[move]
 rw = node.rave_wins[move]
 coefficient = 1 - rc / (rc + c + rc * c * b)
 value = w / c * coef + rw / rc * (1 - coef)  // please here take care of
 the c==0 and rc == 0 cases
 if (value  best_value) {
   best_value = value
   best_move = move
 }
   }
   return best_move
 }

I have two questions here:

* Is the FPU concept abandoned? Or what values are reasonable? It seems
  to me 1.0, which is usually recommended, is obviously too big here
  since that's the upper bound of the value already. So far I have tried
  0.6 and 0.7 but both just make my bot slightly weaker.

* How to accomodate prior knowledge? (I'm using grand-parent heuristics,
  atari liberties, and few patterns.) Do you use it to fill normal
  counts, RAVE values or both? What count values work best for you?
  I have settled on 50 playouts.

-- 
Petr Pasky Baudis
The average, healthy, well-adjusted adult gets up at seven-thirty
in the morning feeling just terrible. -- Jean Kerr
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-08 Thread Mark Boon

I had a little spare time to look at it. It seems indeed I forgot to  
update the GoEngineGTP.jar file last time I made some changes. This  
was easy to fix even from here and I think it should work now.


Just as a note, if you want to change the number of playouts to 50K,  
you need to change the minimumNrNodes from 4,000 to 50,000 in the  
ReferenceMonteCarloTreeSearch definition in the TesujiRefBot.xml. But  
you may also want to increase the 'nrSimulationsBeforeExpansion' value  
to something higher like 8 or even 32. It depends on how much memory  
you have available. You may want to set it to the same value you use  
for your own program to get a good comparison anyway. Use the  
MCTSRefBot to play, which means you'll be passing the following  
command to two-gtp: java -xmx512M -jar GoEngineGTP.jar MCTSRefBot


Adjust the -xmx???M to whatever you think is suitable on your  
computer. I think if you set nrSimulationsBeforeExpansion=8 then 512M  
should be enough but you may have to experiment a little to find the  
optimal settings.


Mark


On Feb 8, 2009, at 7:06 AM, Isaac Deutsch wrote:


No hurry, I will be away for a week next week. :-)



Isaac,

I haven't looked at this stuff for a while. I'm not at home so I  
can't

look at it now.

From the error I understand that tesuji/games/general/ 
MoveIterator is
missing. It is there in the Subversion source-tree however. So I  
don't

know what could be wrong. Maybe it's somehow missing in the
GoEngineGTP.jar

If you can wait until Wednesday I'll fix it for you then.

Mark


--
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit  
allen: http://www.gmx.net/de/go/multimessenger01

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-08 Thread Isaac Deutsch

Hi,

Can you explain what minimumNrNodes and nrSimulations do? In my program I
play 50k games regardless of the number of nodes, so I would like to adjust
this accordingly.

Otherwise, it works now. :)

-ibd


 Original-Nachricht 
 Datum: Sun, 8 Feb 2009 14:47:55 -0200
 Von: Mark Boon tesujisoftw...@gmail.com
 An: computer-go computer-go@computer-go.org
 Betreff: Re: [computer-go] How to properly implement RAVE?

 I had a little spare time to look at it. It seems indeed I forgot to  
 update the GoEngineGTP.jar file last time I made some changes. This  
 was easy to fix even from here and I think it should work now.
 
 Just as a note, if you want to change the number of playouts to 50K,  
 you need to change the minimumNrNodes from 4,000 to 50,000 in the  
 ReferenceMonteCarloTreeSearch definition in the TesujiRefBot.xml. But  
 you may also want to increase the 'nrSimulationsBeforeExpansion' value  
 to something higher like 8 or even 32. It depends on how much memory  
 you have available. You may want to set it to the same value you use  
 for your own program to get a good comparison anyway. Use the  
 MCTSRefBot to play, which means you'll be passing the following  
 command to two-gtp: java -xmx512M -jar GoEngineGTP.jar MCTSRefBot
 
 Adjust the -xmx???M to whatever you think is suitable on your  
 computer. I think if you set nrSimulationsBeforeExpansion=8 then 512M  
 should be enough but you may have to experiment a little to find the  
 optimal settings.
 
 Mark
 
 
 On Feb 8, 2009, at 7:06 AM, Isaac Deutsch wrote:
 
  No hurry, I will be away for a week next week. :-)
 
 
  Isaac,
 
  I haven't looked at this stuff for a while. I'm not at home so I  
  can't
  look at it now.
 
  From the error I understand that tesuji/games/general/ 
  MoveIterator is
  missing. It is there in the Subversion source-tree however. So I  
  don't
  know what could be wrong. Maybe it's somehow missing in the
  GoEngineGTP.jar
 
  If you can wait until Wednesday I'll fix it for you then.
 
  Mark
 
  -- 
  Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit  
  allen: http://www.gmx.net/de/go/multimessenger01
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 

-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger01
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-08 Thread Jason House

On Fri, 2009-02-06 at 18:55 +0100, Isaac Deutsch wrote:
 The rating of the bot still seems to be drifting upwards, but I think I can
 conclude my UCT implementation is OK afterall. Many thanks to the bots
 provided. Does someone have a bot that does 50k light playouts + RAVE? I
 would be most grateful if you could put them online for a few days of
 testing. :-)
 
 By the way, I've seen 2 games when checking my bot's status where one of the
 myCtest bots lost because of an illegal ko move. Maybe there's a bug in
 handling superko?
 
 Regards,
 Isaac

hb-0.7-RAVE-50k is now online. It should be identical to hb-797-RAVE-50k
(Please disregard hb-0.7-50k-RAVE since it had a serious bug)

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-08 Thread Mark Boon

On Sun, Feb 8, 2009 at 3:02 PM, Isaac Deutsch i...@gmx.ch wrote:
 Hi,

 Can you explain what minimumNrNodes and nrSimulations do? In my program I
 play 50k games regardless of the number of nodes, so I would like to adjust
 this accordingly.


minimumNrNodes is the number of games played out. Originally I used to
always create a new node when a playout happened. Maybe this should be
renamed to minimumNrPlayouts. nrSimulationsBeforeExpansion is the
minimum number of visits that have to be made to a node before the
tree gets expanded any deeper. As an example, when the search begins,
the root-node is expanded with all possible legal moves. But those
children nodes are not expanded themselves until a certain number of
simulations (playouts) have been done starting from the root-node.
Until that time the children nodes are only used to store the AMAF
values. I think this may be slightly different from how most people do
it, but I figured it would be a waste to throw away the AMAF values
for the first 'n' simulations in a node.

 Otherwise, it works now. :)

Good.

Mark
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-07 Thread Isaac Deutsch


 I'm currently tied up but you can get my MCTS implementation, which
 includes RAVE, and set it up to play 50K playouts. It's only a matter
 of setting the right number in the configuration file.
 
 You can also use it to play through two-gtp, that way you can test an
 awful lot faster.
 
 Mark

Hi Mark,

This would be awesome. Can you send the source code to this eMail address?

Thanks, ibd
-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger01
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-07 Thread Isaac Deutsch

I put all files in a folder like so:

http://img21.imageshack.us/img21/5272/bild4ya1.png

But I get various errors when I try to run, for example, GoEngineGTP.jar
with java -jar ..., also I'm not able to select the RefBot as player.
I'm not sure what the others do, but one seems to connect to the 19x19 CGOS
server. I don't have any experience with java. Any ideas?

Isaac

The beginning of the error:
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'TesujiRefBot' defined in file [/Users/ibd/Desktop/computer 
go/Rango/CGOS/TesujiRefBot.xml]: Cannot resolve reference to bean 
'referenceLibertyAdministration' while setting bean property 
'monteCarloAdministration'; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'referenceLibertyAdministration' defined in file 
[/Users/ibd/Desktop/computer go/Rango/CGOS/TesujiRefBot.xml]: Instantiation of 
bean failed; nested exception is java.lang.NoClassDefFoundError: 
tesuji/games/general/MoveIterator


 Original-Nachricht 
 Datum: Sat, 7 Feb 2009 09:30:53 -0200
 Von: Mark Boon tesujisoftw...@gmail.com
 An: computer-go computer-go@computer-go.org
 Betreff: Re: [computer-go] How to properly implement RAVE?

 You can get everything here:
 
 http://plug-and-go.dev.java.net
 
 The MCTS program description is under 'Derived Projects'.
 
 You don't really need the source-code. You can just get the 'binaries
  scripts' and then copy the files 'TesujiRefBot.jar' and
 'TesujiRefBot.xml' to the directory where you put the binaries and
 everything works automagically. It's all plug and go. You may want to
 change the number of nodes to read in the XML file to 50K (it's called
 minimumNrNodes).
 
 Of course if you prefer to get the source you are free to do so.
 
 I do remember making a significant fix a little while ago that I don't
 think I submitted yet. But I won't be able to commit that for a few
 more days as I'm not at home. But if you implemented RAVE correctly
 your bot should do at least as well as this one.
 
 Hopefully it's any help.
 
 Mark
 
 
 On Sat, Feb 7, 2009 at 7:18 AM, Isaac Deutsch i...@gmx.ch wrote:
 
  I'm currently tied up but you can get my MCTS implementation, which
  includes RAVE, and set it up to play 50K playouts. It's only a matter
  of setting the right number in the configuration file.
 
  You can also use it to play through two-gtp, that way you can test an
  awful lot faster.
 
  Mark
 
  Hi Mark,
 
  This would be awesome. Can you send the source code to this eMail
 address?
 
  Thanks, ibd
  --
  Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit
 allen: http://www.gmx.net/de/go/multimessenger01
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger01
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-07 Thread Mark Boon

Isaac,

I haven't looked at this stuff for a while. I'm not at home so I can't
look at it now.

From the error I understand that tesuji/games/general/MoveIterator is
missing. It is there in the Subversion source-tree however. So I don't
know what could be wrong. Maybe it's somehow missing in the
GoEngineGTP.jar

If you can wait until Wednesday I'll fix it for you then.

Mark


On Sat, Feb 7, 2009 at 4:29 PM, Isaac Deutsch i...@gmx.ch wrote:
 I put all files in a folder like so:

 http://img21.imageshack.us/img21/5272/bild4ya1.png

 But I get various errors when I try to run, for example, GoEngineGTP.jar
 with java -jar ..., also I'm not able to select the RefBot as player.
 I'm not sure what the others do, but one seems to connect to the 19x19 CGOS
 server. I don't have any experience with java. Any ideas?

 Isaac

 The beginning of the error:
 org.springframework.beans.factory.BeanCreationException: Error creating bean 
 with name 'TesujiRefBot' defined in file [/Users/ibd/Desktop/computer 
 go/Rango/CGOS/TesujiRefBot.xml]: Cannot resolve reference to bean 
 'referenceLibertyAdministration' while setting bean property 
 'monteCarloAdministration'; nested exception is 
 org.springframework.beans.factory.BeanCreationException: Error creating bean 
 with name 'referenceLibertyAdministration' defined in file 
 [/Users/ibd/Desktop/computer go/Rango/CGOS/TesujiRefBot.xml]: Instantiation 
 of bean failed; nested exception is java.lang.NoClassDefFoundError: 
 tesuji/games/general/MoveIterator


  Original-Nachricht 
 Datum: Sat, 7 Feb 2009 09:30:53 -0200
 Von: Mark Boon tesujisoftw...@gmail.com
 An: computer-go computer-go@computer-go.org
 Betreff: Re: [computer-go] How to properly implement RAVE?

 You can get everything here:

 http://plug-and-go.dev.java.net

 The MCTS program description is under 'Derived Projects'.

 You don't really need the source-code. You can just get the 'binaries
  scripts' and then copy the files 'TesujiRefBot.jar' and
 'TesujiRefBot.xml' to the directory where you put the binaries and
 everything works automagically. It's all plug and go. You may want to
 change the number of nodes to read in the XML file to 50K (it's called
 minimumNrNodes).

 Of course if you prefer to get the source you are free to do so.

 I do remember making a significant fix a little while ago that I don't
 think I submitted yet. But I won't be able to commit that for a few
 more days as I'm not at home. But if you implemented RAVE correctly
 your bot should do at least as well as this one.

 Hopefully it's any help.

 Mark


 On Sat, Feb 7, 2009 at 7:18 AM, Isaac Deutsch i...@gmx.ch wrote:
 
  I'm currently tied up but you can get my MCTS implementation, which
  includes RAVE, and set it up to play 50K playouts. It's only a matter
  of setting the right number in the configuration file.
 
  You can also use it to play through two-gtp, that way you can test an
  awful lot faster.
 
  Mark
 
  Hi Mark,
 
  This would be awesome. Can you send the source code to this eMail
 address?
 
  Thanks, ibd
  --
  Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit
 allen: http://www.gmx.net/de/go/multimessenger01
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

 --
 Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
 http://www.gmx.net/de/go/multimessenger01
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-06 Thread Isaac Deutsch

The rating of the bot still seems to be drifting upwards, but I think I can
conclude my UCT implementation is OK afterall. Many thanks to the bots
provided. Does someone have a bot that does 50k light playouts + RAVE? I
would be most grateful if you could put them online for a few days of
testing. :-)

By the way, I've seen 2 games when checking my bot's status where one of the
myCtest bots lost because of an illegal ko move. Maybe there's a bug in
handling superko?

Regards,
Isaac
-- 
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL 
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-06 Thread Christoph Birk



On Feb 6, 2009, at 9:55 AM, Isaac Deutsch wrote:
By the way, I've seen 2 games when checking my bot's status where  
one of the
myCtest bots lost because of an illegal ko move. Maybe there's a  
bug in

handling superko?


Not a bug, I never implemented it :-(

Christoph

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-06 Thread Jason House


On Feb 6, 2009, at 12:55 PM, Isaac Deutsch i...@gmx.ch wrote:

The rating of the bot still seems to be drifting upwards, but I  
think I can

conclude my UCT implementation is OK afterall. Many thanks to the bots
provided. Does someone have a bot that does 50k light playouts +  
RAVE? I

would be most grateful if you could put them online for a few days of
testing. :-)


I do, and I will. I'm just slow... I reimaged my machine a while back...



By the way, I've seen 2 games when checking my bot's status where  
one of the
myCtest bots lost because of an illegal ko move. Maybe there's a  
bug in

handling superko?

Regards,
Isaac
--
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T45 
69a

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-06 Thread Mark Boon

I'm currently tied up but you can get my MCTS implementation, which
includes RAVE, and set it up to play 50K playouts. It's only a matter
of setting the right number in the configuration file.

You can also use it to play through two-gtp, that way you can test an
awful lot faster.

Mark


On Fri, Feb 6, 2009 at 3:55 PM, Isaac Deutsch i...@gmx.ch wrote:
 The rating of the bot still seems to be drifting upwards, but I think I can
 conclude my UCT implementation is OK afterall. Many thanks to the bots
 provided. Does someone have a bot that does 50k light playouts + RAVE? I
 would be most grateful if you could put them online for a few days of
 testing. :-)

 By the way, I've seen 2 games when checking my bot's status where one of the
 myCtest bots lost because of an illegal ko move. Maybe there's a bug in
 handling superko?

 Regards,
 Isaac
 --
 Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL
 für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-05 Thread Isaac Deutsch

Thanks Christoph.
I've changed my bot to play 50k games. I know the results aren't very
statistically meaningful yet, but the 2 bots look close to each other
already. ;-) I will let it run for some more days if I can.

http://img145.imageshack.us/img145/3586/bild3bk3.png

-ibd

 
 I have started the following versions of 'myCtest':
 
   myCtest-50k-UCT  (Bayes-ELO= 1618+-14)
   myCtest-10k-AMAF (Bayes-ELO= 1481+-14)
   myCtest-10k-UCT  (Bayes-ELO= 1334+-12)
 
 Christoph

-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger01
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-03 Thread Isaac Deutsch

I actually tried leaf parallelization first, but after reading the
mentioned paper I switched to an implementation of root parallelization (as
described). I'm not sure if I implemented it correctly (like in my
description), but after testing a 2-core-version against a single-
core-version with a few games, I can say the dual core version wins at
least 75% of the games, which I think would be an ELO difference of about
200. I have yet to switch colors but the results should be similar.

-ibd


 There were a couple of papers [2] at CG2008 on this subject. The
 consensus seemed to be that root parallelization [1] was best. In fact
 Guillaume Chaslot's version got a strength speedup of 3.0 using 2
 threads, and 6.5 using 4 threads (dropping to 14.9 with 16 threads).
 This is of course impossible, and implies the parallel version is
 somehow doing MCTS better than the single thread algorithm!
 
 Darren
 
 [1]: Build multiple MCTS trees in parallel, one per thread. No
 communication. At end add the trees together and use grand total to
 select move.
 
 [2]:
 http://www.cs.unimaas.nl/g.chaslot/papers/parallelMCTS.pdf
 and
 A Parallel Monte-Carlo Tree Search Algorithm by Tristan Cazenave (I
 couldn't seem to find a PDF link.)

-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger01
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-03 Thread Jason House

On Sun, Feb 1, 2009 at 4:46 PM, Isaac Deutsch i...@gmx.ch wrote:

 By the way, I got about 75 ELO points (1650-1720) with light playouts out
 of RAVE. Do you think this is in the expected range? It's not really similar
 to the 20%-60% win rate rise vs. GnuGo described in some papers...



My bot is about 1/4-1/2 of the speed of yours, but here are my strength
numbers (median of reported bayeselo numbers below):
HouseBot Rango_004
RAVE - 1778 ELO 1721 ELO
UCT- 1587 ELO 1642 ELO

Given the tremendous speed difference between our bots, I suspect you may
have some debugging to do :(  I'll try to put my bot up on CGOS again in the
next few days.  Maybe some head to head games will best answer how our
implementations compare to each other.

HouseBot parameters:
   One search thread
   Max 50k playouts per move (time control may reduce)
   No pondering
   Variants with RAVE in the name: UCT+RAVE
   Variants with UCT in the name: UCT without RAVE


hb-797-RAVE-50k 1826
hb-791-RAVE-50k 1778
hb-819-RAVE-50k 1715

hb-799-UCT-50k 1606
hb-797-UCT-50k 1605
hb-770-UCT-50k 1570
hb-791-UCT-50k 1556
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-03 Thread Jason House

On Tue, Feb 3, 2009 at 1:53 PM, Isaac Deutsch i...@gmx.ch wrote:

Hi Jason,

Thanks for your numbers. I might try to limit my bot to 50k playouts and 1
core, but I usually simulate as long as time permits.

That kind of setup should make it easier to compare. There have been a few
times in the past where multiple authors posted similar bots with the same
configurations and let them all duke it out on the server for a while. Once
upon a time, I was the owner of a bot that was performing far worse than
yours and trying to figure out why. I forget who owns myctest, but they
were very willing to run a bot to help me out.

Do you suspect my
pure UCT version has bugs, too, judging from its rating?

Maybe, but it's not all that likely. It could be that your RAVE
implementation isn't buggy either. IIRC, when I was running my bots, there
were very few bots in the 1600-2100 ELO range. That may have skewed my
results.

I find it hard to
find good tests for the correctness of a program depending on randomness,
and even harder to find bugs.

I created a bunch of unit tests to test very specific cases with my search.
Looking at
http://housebot.svn.sourceforge.net/viewvc/housebot/branches/0.7/search/uct.d?view=markupthose
tests are as follows:

- (line 743) Exploitation of perfect children - Two children with 100%
winning rate. Run 100 simulations and ensure that only one child was
explored.
- (line 767) Evaluation of foced sequences - Commented out. I never
added that functionality (Auto-expand when only one follow-up move is
available)
- (line 777) Recognition of a lost position - The end of the game is one
move away, and the game is lost. Ensure that each leaf is evaluated once
and that the search stops and declares the position as lost
- (line 793) Recognition of a won position - The end of the game is one
move away and the game is won. Ensure that only one leaf is evaluated and
that the search stops and declares the position as won
- (line 809) Two ply search of a won position
- (line 834) Searches with solved children (some won, some lost)
- (line 862) Searches where a winning subtree gets solved through a
transposed position - Multiple paths exist to the same subtree. Force the
evaluation through one path to complete and then force the evaluation
through another path and ensure the conclusion is picked up
- (line 894) Searches where a losing subtree gets solved through a
transposed position
- (line 924) Searches where a losing terminal gets solved through a
transposed position

I focused most of the tests on terminal positions that were easier to
understand how they should turn out. It also had the nice side effect of
helping me speed up my endgame handling in addition to helping me find
several bugs that had been plaguing my bot.

Maybe we can set up our bots to play under
similar circumstances on CGOS.

Yes, I'll set it up soon (but probably not today).

Regards,
Isaac

My bot is about 1/4-1/2 of the speed of yours, but here are my strength
numbers (median of reported bayeselo numbers below):
HouseBot Rango_004
RAVE - 1778 ELO 1721 ELO
UCT- 1587 ELO 1642 ELO

Given the tremendous speed difference between our bots, I suspect you may
have some debugging to do :( I'll try to put my bot up on CGOS again in
the
next few days. Maybe some head to head games will best answer how our
implementations compare to each other.

--
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-03 Thread Isaac Deutsch

Hi Jason,

Thanks for your numbers. I might try to limit my bot to 50k playouts and 1
core, but I usually simulate as long as time permits. Do you suspect my
pure UCT version has bugs, too, judging from its rating? I find it hard to
find good tests for the correctness of a program depending on randomness,
and even harder to find bugs. Maybe we can set up our bots to play under
similar circumstances on CGOS.

Regards,
Isaac

 My bot is about 1/4-1/2 of the speed of yours, but here are my strength
 numbers (median of reported bayeselo numbers below):
 HouseBot Rango_004
 RAVE - 1778 ELO 1721 ELO
 UCT- 1587 ELO 1642 ELO
 
 Given the tremendous speed difference between our bots, I suspect you may
 have some debugging to do :(  I'll try to put my bot up on CGOS again in
 the
 next few days.  Maybe some head to head games will best answer how our
 implementations compare to each other.


-- 
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL 
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Jason House


On Feb 2, 2009, at 6:57 AM, Isaac Deutsch i...@gmx.ch wrote:




Hi Issac,
You should be more in the range of +200-300 ELO, at least with  
pattern

based
playouts.

Sylvain


Isaac. They are not pattern based playouts, but as I said uniformly  
random.

I reckon the effect of RAVE is less with these?

How many playouts per second do you get with each version?

Actually, to make comparable tests with both versions, the version  
without
RAVE just sets the coefficient of RAVE in the UCT-RAVE value  
calculation to
zero. I get about 41k games/s on 9x9 (using 2 cores, about 20k/s/ 
core).
The playouts are fairly optimized but the tree search isn't at all  
yet, so

there is still some potential.


My first (braindead) multi-threaded UCT played weaker with two cores  
than one core. How do you combine search trees/results? How do you  
pick a move to play?


Single-threaded RAVE with no parameter tuning, an ancient RAVE  
formula, and 10k light playouts per second got me into the 1600's on  
CGOS. I used Don Dailey's AMAF methodology for RAVE (first color to  
play on a point, keep 7/8 of move list).


Also, I noticed your rank measurements were based on CGOS results  
after relatively few games. It can retain significant bias for quite a  
while.





--
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T45 
69a

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Jason House

On Feb 2, 2009, at 9:40 AM, Jason House jason.james.ho...@gmail.com  
wrote:



On Feb 2, 2009, at 6:57 AM, Isaac Deutsch i...@gmx.ch wrote:




Hi Issac,
You should be more in the range of +200-300 ELO, at least with  
pattern

based
playouts.

Sylvain


Isaac. They are not pattern based playouts, but as I said uniformly  
random.

I reckon the effect of RAVE is less with these?

How many playouts per second do you get with each version?

Actually, to make comparable tests with both versions, the version  
without
RAVE just sets the coefficient of RAVE in the UCT-RAVE value  
calculation to
zero. I get about 41k games/s on 9x9 (using 2 cores, about 20k/s/ 
core).
The playouts are fairly optimized but the tree search isn't at all  
yet, so

there is still some potential.


My first (braindead) multi-threaded UCT played weaker with two cores  
than one core. How do you combine search trees/results? How do you  
pick a move to play?


Single-threaded RAVE with no parameter tuning, an ancient RAVE  
formula, and 10k light playouts per second got me into the 1600's on  
CGOS. I used Don Dailey's AMAF methodology for RAVE (first color to  
play on a point, keep 7/8 of move list).


Actually, that rating was probably using my own home-grown RAVE  
formula. Sorry for the misinformation.




Also, I noticed your rank measurements were based on CGOS results  
after relatively few games. It can retain significant bias for quite  
a while.





--
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569 
a

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Isaac Deutsch

Wow, thanks for all the answers! You're being really helpful.

Do you use UCT with a too large exploration term?
That's a good idea. I actually use a rather big value for c=0.5. I might try
lowering it. Thanks! (Precisely, the formula is c*sqrt(log(p)/c).)

My first (braindead) multi-threaded UCT played weaker with two cores  
than one core. How do you combine search trees/results? How do you  
pick a move to play?

I have the threads write to a global array (of course one after another
that stores the values and the numbers of games played (each thread adds to
these values) of the first ply. I then pick the move with the most games
played through that node. The value of this best move (sum/numplayed)
determines if the bot should resign.

Yes, and you should go by the bayeselo page which is a better picture of
what is going on.

The ELOs posted here refer to these values.

I don't think many people realize that you have to play hundreds of
games just to be within 40 or 50 ELO with much certainty.   If you play
less than 100 games you could easily be off by over 100 ELO.

Maybe I'm a bit (a lot :) impatient, but I try to make a rough guess after
about 150-200 games, which is about 1-2 days of letting the program run on
the server. I think it's possible to say after 150 games that RAVE did not
give me a 300 ELO boost.

Thanks again,
Greetings,
Isaac
-- 
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL 
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Don Dailey

On Mon, 2009-02-02 at 18:09 +0100, Isaac Deutsch wrote:
 I don't think many people realize that you have to play hundreds of
 games just to be within 40 or 50 ELO with much certainty.   If you
 play
 less than 100 games you could easily be off by over 100 ELO.
 
 Maybe I'm a bit (a lot :) impatient, but I try to make a rough guess
 after
 about 150-200 games, which is about 1-2 days of letting the program
 run on
 the server. I think it's possible to say after 150 games that RAVE did
 not
 give me a 300 ELO boost.



You already reported 75 ELO improvement.  Gelly said you should get
200-300.   

So you are only about 125 off of Gelly's lower bound (which sounded like
a rough guess to me.)   So you have a lot of uncertainty here and only
150 games.   

Of course this is plenty enough to suspect a problem and investigate,
but it's not enough to conclude that you surely did something wrong.

How much did you test the version that doesn't have this change?   You
have 3 sources of slop here:

  1. Gelly's rough estimate of how much you should get. 
  2. The rating uncertainty of the unmodified program.
  3. The rating uncertainty of the modified program.
  


- Don

   



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Isaac Deutsch

I have about 200-300k games/move, so maybe the effect is even less. But,
maybe I still have a grave bug somewhere. I will check again.

Cheers, ibd
 
 On Feb 2, 2009, at 9:57 AM, Isaac Deutsch wrote:
 
  They are not pattern based playouts, but as I said uniformly random.
  I reckon the effect of RAVE is less with these?
 
 My MCTS implementation sees a gain of 200-400 ELO from RAVE using  
 uniformly random moves. 400 gain (90% win-rate) for 2K playouts,  
 becoming slowly less for larger numbers of playouts.
 
 Mark

-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger01
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Don Dailey

On Mon, 2009-02-02 at 09:40 -0500, Jason House wrote:
 Also, I noticed your rank measurements were based on CGOS results  
 after relatively few games. It can retain significant bias for quite
 a  
 while.

Yes, and you should go by the bayeselo page which is a better picture of
what is going on.

I don't think many people realize that you have to play hundreds of
games just to be within 40 or 50 ELO with much certainty.   If you play
less than 100 games you could easily be off by over 100 ELO.   

- Don


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Jason House


On Feb 2, 2009, at 12:09 PM, Isaac Deutsch i...@gmx.ch wrote:


Wow, thanks for all the answers! You're being really helpful.

Do you use UCT with a too large exploration term?
That's a good idea. I actually use a rather big value for c=0.5. I  
might try

lowering it. Thanks! (Precisely, the formula is c*sqrt(log(p)/c).)


That formula is wrong. Did you mean to have a denominator of c?




My first (braindead) multi-threaded UCT played weaker with two cores
than one core. How do you combine search trees/results? How do you
pick a move to play?

I have the threads write to a global array (of course one after  
another
that stores the values and the numbers of games played (each thread  
adds to
these values) of the first ply. I then pick the move with the most  
games

played through that node. The value of this best move (sum/numplayed)
determines if the bot should resign.


That sounds a lot like what I did (two independent searches that are  
combined when genmove is called). I believe that strategy had a 100+  
ELO loss in strength for me. I'm not sure why that was the case (but  
have some theories). You may want to experiment with one search thread  
to see if it hurts your performance too.



Yes, and you should go by the bayeselo page which is a better  
picture of

what is going on.

The ELOs posted here refer to these values.

I don't think many people realize that you have to play hundreds of
games just to be within 40 or 50 ELO with much certainty.   If you  
play

less than 100 games you could easily be off by over 100 ELO.

Maybe I'm a bit (a lot :) impatient, but I try to make a rough guess  
after
about 150-200 games, which is about 1-2 days of letting the program  
run on
the server. I think it's possible to say after 150 games that RAVE  
did not

give me a 300 ELO boost.


Ofline testing should be faster...





Thanks again,
Greetings,
Isaac
--
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T45 
69a

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Mark Boon



On Feb 2, 2009, at 9:57 AM, Isaac Deutsch wrote:


They are not pattern based playouts, but as I said uniformly random.
I reckon the effect of RAVE is less with these?


My MCTS implementation sees a gain of 200-400 ELO from RAVE using  
uniformly random moves. 400 gain (90% win-rate) for 2K playouts,  
becoming slowly less for larger numbers of playouts.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Isaac Deutsch


 Hi Issac,
 You should be more in the range of +200-300 ELO, at least with pattern
 based
 playouts.
 
 Sylvain

Isaac. They are not pattern based playouts, but as I said uniformly random.
I reckon the effect of RAVE is less with these?

How many playouts per second do you get with each version?

Actually, to make comparable tests with both versions, the version without
RAVE just sets the coefficient of RAVE in the UCT-RAVE value calculation to
zero. I get about 41k games/s on 9x9 (using 2 cores, about 20k/s/core).
The playouts are fairly optimized but the tree search isn't at all yet, so
there is still some potential.
-- 
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL 
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-02 Thread Darren Cook

 My first (braindead) multi-threaded UCT played weaker with two cores
 than one core. How do you combine search trees/results? How do you pick
 a move to play?

There were a couple of papers [2] at CG2008 on this subject. The
consensus seemed to be that root parallelization [1] was best. In fact
Guillaume Chaslot's version got a strength speedup of 3.0 using 2
threads, and 6.5 using 4 threads (dropping to 14.9 with 16 threads).
This is of course impossible, and implies the parallel version is
somehow doing MCTS better than the single thread algorithm!

Darren

[1]: Build multiple MCTS trees in parallel, one per thread. No
communication. At end add the trees together and use grand total to
select move.

[2]:
http://www.cs.unimaas.nl/g.chaslot/papers/parallelMCTS.pdf
and
A Parallel Monte-Carlo Tree Search Algorithm by Tristan Cazenave (I
couldn't seem to find a PDF link.)

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-01 Thread Isaac Deutsch

By the way, I got about 75 ELO points (1650-1720) with light playouts out of 
RAVE. Do you think this is in the expected range? It's not really similar to 
the 20%-60% win rate rise vs. GnuGo described in some papers...

 At the moment I'm also tuning the bias in the range 0.001-0.1. Given
 uniformly random (light) playouts, is the bias expected to be bigger than
 with
 heavy playouts, meaning the constant has to be bigger aswell?

Cheers, ibd
-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger01
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-02-01 Thread Jason House


How many playouts per second do you get with each version?

Sent from my iPhone

On Feb 1, 2009, at 4:46 PM, Isaac Deutsch i...@gmx.ch wrote:

By the way, I got about 75 ELO points (1650-1720) with light  
playouts out of RAVE. Do you think this is in the expected range?  
It's not really similar to the 20%-60% win rate rise vs. GnuGo  
described in some papers...



At the moment I'm also tuning the bias in the range 0.001-0.1. Given
uniformly random (light) playouts, is the bias expected to be  
bigger than

with
heavy playouts, meaning the constant has to be bigger aswell?


Cheers, ibd
--
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit al 
len: http://www.gmx.net/de/go/multimessenger01

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-30 Thread Magnus Persson


Quoting Sylvain Gelly sylvain.ge...@m4x.org:


On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch i...@gmx.ch wrote:


And a final question: You calculate the (beta) coefficient as
c = rc / (rc+c+rc*c*BIAS);
which looks similar to the formula proposed by David Silver (If I recall
his name correctly). However, in his formula, the last term looks like
rc*c*BIAS/(q_ur*(1-q_ur))
Is it correct that we could get q_ur from the current UCT-RAVE mean value,
and that it is used like that?



Yes the formula looks very similar (David proposed that formula to me in the
beginning of 2007). However my implementation did not contain
the (q_ur*(1-q_ur) factor, that I approximated by a constant, taking q=0.5
so the factor=0.25.
I did not try the other formula, maybe it works better in practice, while I
would expect it is similar in practice.


Valkyria uses an even more complicated version of what David Silver  
proposed (I really did not understand it so I came up with something  
that looked plausible to me that actually estimated the bias for each  
candidate move rather than assuming it constant).


When Sylvain proposed this simple version I tested that version  
against my own interpretation. On 9x9 my complicated version might  
have a win rate 3% better than the simple version for 3 data points  
(700 games each) near the maximum. The standard error according to  
twogtp is 1.9.


On 19x19 I got results where there no difference at all but with much  
higher uncertainty because there was not many games played.


But the term is important for sure, the constant BIAS used should be  
larger than 0 but you should be careful to not set it too high. For  
Valkyria the 0.015 value Sylvain posted here worked fine. But if it is  
higher for example 0.15 leads to bad performance on 9x9 and  
catastrophic performance on 19x19. Initially I thought this parameter  
should be something like 1 and that was completely wrong. I was just  
lucky to get it right, just by visual inspection of the search  
behavior when I played around with the parameter.


The reason is that the bias term of the denominator should be close to  
zero initially is to allow AMAF to have strong impact on the search  
but at some point (which is delayed by having a small BIAS constant)  
there is a quick shift towards using the real win rate instead.


-Magnus


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-30 Thread Sylvain Gelly

On Fri, Jan 30, 2009 at 9:29 AM, Magnus Persson magnus.pers...@phmp.sewrote:

 Quoting Sylvain Gelly sylvain.ge...@m4x.org:

  On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch i...@gmx.ch wrote:

  And a final question: You calculate the (beta) coefficient as
 c = rc / (rc+c+rc*c*BIAS);
 which looks similar to the formula proposed by David Silver (If I recall
 his name correctly). However, in his formula, the last term looks like
 rc*c*BIAS/(q_ur*(1-q_ur))
 Is it correct that we could get q_ur from the current UCT-RAVE mean
 value,
 and that it is used like that?



 Yes the formula looks very similar (David proposed that formula to me in
 the
 beginning of 2007). However my implementation did not contain
 the (q_ur*(1-q_ur) factor, that I approximated by a constant, taking q=0.5
 so the factor=0.25.
 I did not try the other formula, maybe it works better in practice, while
 I
 would expect it is similar in practice.


 Valkyria uses an even more complicated version of what David Silver
 proposed (I really did not understand it so I came up with something that
 looked plausible to me that actually estimated the bias for each candidate
 move rather than assuming it constant).

 When Sylvain proposed this simple version I tested that version against my
 own interpretation. On 9x9 my complicated version might have a win rate 3%
 better than the simple version for 3 data points (700 games each) near the
 maximum. The standard error according to twogtp is 1.9.

 On 19x19 I got results where there no difference at all but with much
 higher uncertainty because there was not many games played.


Did you try to tune the bias constant at all or just took the one I posted?
I wrote it from memory and I believe it is in the range of possibly good
values, but it is certainly not optimal, I can be off by a significant
factor in both directions.


 But the term is important for sure, the constant BIAS used should be larger
 than 0 but you should be careful to not set it too high. For Valkyria the
 0.015 value Sylvain posted here worked fine. But if it is higher for example
 0.15 leads to bad performance on 9x9 and catastrophic performance on 19x19.
 Initially I thought this parameter should be something like 1 and that was
 completely wrong. I was just lucky to get it right, just by visual
 inspection of the search behavior when I played around with the parameter.


The bias can't possibly be 1 because by definition it is the difference
between the expected value of the normal tree search and the expected value
of AMAF. As those two values are in [0,1] anyway, the bias is certainly not
1, and in most cases very small.



 The reason is that the bias term of the denominator should be close to zero
 initially is to allow AMAF to have strong impact on the search but at some
 point (which is delayed by having a small BIAS constant) there is a quick
 shift towards using the real win rate instead.

 -Magnus



 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-30 Thread Magnus Persson


Quoting Sylvain Gelly sylvain.ge...@m4x.org:

On Fri, Jan 30, 2009 at 9:29 AM, Magnus Persson   
magnus.pers...@phmp.sewrote:



Did you try to tune the bias constant at all or just took the one I posted?
I wrote it from memory and I believe it is in the range of possibly good
values, but it is certainly not optimal, I can be off by a significant
factor in both directions.


Yes I always tries to test at least 5 values. Three values centered  
around what I currently believe is the best value and two or more  
values to the extremes to get the big picture and make sure that I am  
at least the maximum.


ConstantWinrate Against Gnugo
0.00015 43.9
0.0015  50
0.015   50
0.1543.1
1   7.1
10  6.7


The bias can't possibly be 1 because by definition it is the difference
between the expected value of the normal tree search and the expected value
of AMAF. As those two values are in [0,1] anyway, the bias is certainly not
1, and in most cases very small.


I think I was confused about exactly what in the term that was the  
bias. I do not really remember what and why I thought these things.  
Anyway your explanation is very clear.


On a side note I think Valkyria for some moves have very large biases  
because the playouts are very heavy. I am thinking of an experiment to  
test this. One way is to simply play uniform playouts from a single  
position N times for all moves and thus get accurate win rates for all  
those moves as well as AMAF values. Then taking the difference in  
values for AMAF and real win rates should reveal the size of bias in  
general and if there are systematic difference for certain patterns or  
situations.


Alternatively one could play N playouts for a single move and see what  
AMAF values one get. Now can see to what extent the biases are stable  
when moves are played on the board. The real values will change for  
all moves and the AMAF as well, but will the bias be constant? Or will  
it be random as a function of the position.


There is no tree search in these two cases.

Selective search is often similar to the second situation, and my  
question is whether this can cause larger systematic biases for some  
moves, that thus is missed. And if one understands the nature of such  
biases maybe they can be neutralized somehow.


It could be that using patterns to start with strong biases in the  
prior AMAF values is doing exactly this.


Has anyone has done anything like this or have any interesting  
opinions I would be thankful.


-Magnus
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-30 Thread Isaac Deutsch

At the moment I'm also tuning this constant in the range 0.001-0.1. Given
uniformly random (light) playouts, is the bias expected to be bigger than with
heavy playouts, meaning the constant has to be bigger aswell?
-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger01
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-29 Thread Sylvain Gelly

On Wed, Jan 28, 2009 at 10:01 PM, Isaac Deutsch i...@gmx.ch wrote:

 Hi again ;)

 I found some time to actually implement this stuff. And, this has raised
 some small questions. In this part of the code:

  for (j = index; j  moves_played_in_tree.size(); j += 2) {
 //stuff
   }
  for (j = 0; j  moves_played_out_tree.size(); ++j) {
 //more stuff

// If it is either not the right color or the intersection is
// already played we ignore that move for that node
if (move  0 || already_played.AlreadyPlayed(move)) continue

already_played.Play(move)
 //stuff
  }

 1. Shouldn't the first loop start at j=index+1? Starting at j=index would
 mean that the RAVE value of the node is updated with the move of the node
 itself, wouldn't it? It makes more sense to me to actually start at the
 first child of the node that is being back-upped. Correct me if I'm wrong.

No, you need to update the RAVE value of the node for the first move (move
taken in the position of the node itself). So it is j=index, and that is
important to make the algorithm work.



 2. Shouldn't the order in the second loop be:
 -if (already played): continue;
 -update already played;
 -if (wrong color): continue;
 Otherwise, moves that are the wrong color don't get counted as already
 played (because they never get updated). I'm not sure if it makes a
 difference in this case because you check in the playouts, too, but maybe
 it does.


I think it is ok like that because indeed the check is already done in the
playout. The already_played.Play(move) is actually also unnecessary in the
second loop (not really rechecked as I speak, but I think so as far as I
remember).


 And a final question: You calculate the (beta) coefficient as
 c = rc / (rc+c+rc*c*BIAS);
 which looks similar to the formula proposed by David Silver (If I recall
 his name correctly). However, in his formula, the last term looks like
 rc*c*BIAS/(q_ur*(1-q_ur))
 Is it correct that we could get q_ur from the current UCT-RAVE mean value,
 and that it is used like that?


Yes the formula looks very similar (David proposed that formula to me in the
beginning of 2007). However my implementation did not contain
the (q_ur*(1-q_ur) factor, that I approximated by a constant, taking q=0.5
so the factor=0.25.
I did not try the other formula, maybe it works better in practice, while I
would expect it is similar in practice.

Sylvain



 Regards,
 Isaac Deutsch
 --
 Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen:
 http://www.gmx.net/de/go/multimessenger
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-28 Thread Isaac Deutsch

Hi again ;)

I found some time to actually implement this stuff. And, this has raised
some small questions. In this part of the code:

  for (j = index; j  moves_played_in_tree.size(); j += 2) {
//stuff
  }
  for (j = 0; j  moves_played_out_tree.size(); ++j) {
//more stuff

// If it is either not the right color or the intersection is
// already played we ignore that move for that node
if (move  0 || already_played.AlreadyPlayed(move)) continue

already_played.Play(move)
//stuff
  }

1. Shouldn't the first loop start at j=index+1? Starting at j=index would
mean that the RAVE value of the node is updated with the move of the node
itself, wouldn't it? It makes more sense to me to actually start at the
first child of the node that is being back-upped. Correct me if I'm wrong.
2. Shouldn't the order in the second loop be:
-if (already played): continue;
-update already played;
-if (wrong color): continue;
Otherwise, moves that are the wrong color don't get counted as already
played (because they never get updated). I'm not sure if it makes a
difference in this case because you check in the playouts, too, but maybe
it does.

And a final question: You calculate the (beta) coefficient as
c = rc / (rc+c+rc*c*BIAS);
which looks similar to the formula proposed by David Silver (If I recall
his name correctly). However, in his formula, the last term looks like
rc*c*BIAS/(q_ur*(1-q_ur))
Is it correct that we could get q_ur from the current UCT-RAVE mean value,
and that it is used like that?

Regards,
Isaac Deutsch
-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Thomas Lavergne

On Sat, Jan 17, 2009 at 08:29:32PM +0100, Sylvain Gelly wrote:
 ChooseMove(node, board) {
   bias = 0.015  // I put a random number here, to be tuned
   b = bias * bias / 0.25
   best_value = -1
   best_move = PASSMOVE
   for (move in board.allmoves) {
 c = node.child(move).counts
 w = node.child(move).wins
 rc = node.rave_counts[move]
 rw = node.rave_wins[move]
 coefficient = 1 - rc / (rc + c + rc * c * b)
 value = w / c * coef + rw / rc * (1 - coef)  // please here take care of
 the c==0 and rc == 0 cases
 if (value  best_value) {
   best_value = value
   best_move = move
 }
   }
   return best_move
 }

Hi,

it seems to me that, when you select play in the tree, you don't have an
exploration component. You use just a weighted average of score and RAVE
score.
So, if :
  - the best play is a good only if played immediatly and very bad if
played later in the game :
  - the first playout for this play resulted in a lost.
score and RAVE score will be very low and this play will never be
considered again until a very long time.

Is it simplified code and in reality you replace w/c and rw/rc by scores
with exploration component or did you realy use it as is ?

Tom

-- 
Thomas LavergneEntia non sunt multiplicanda praeter
 necessitatem. (Guillaume d'Ockham)
thomas.laver...@reveurs.orghttp://oniros.org
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Magnus Persson


Quoting Thomas Lavergne thomas.laver...@reveurs.org:


  - the best play is a good only if played immediatly and very bad if
played later in the game :
  - the first playout for this play resulted in a lost.
score and RAVE score will be very low and this play will never be
considered again until a very long time.



You raise an interesting concern.

The simple solution to your question is to add an exploration term  
using UCT for example. Then it becomes an empirical question what  
parameter for exploration gives the strongest play. My experience is  
that the best parameter is so small it can be set to zero.


I think the conditions you defined are very rarely completely  
fulfilled. What can be true often however is that a single bad move  
makes the best move very bad if played later in the game. If the bad  
move happen to be the second best move, it will be searched a lot  
lowering the AMAF score (rw/rc) for the best move.


This is likely to happen when there are several local moves that more  
or less solves the same problem. That is when one move is played the  
effect of the other move played later will overlap with the first.


-Magnus


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Mark Boon



On Jan 21, 2009, at 10:23 AM, Magnus Persson wrote:


Quoting Thomas Lavergne thomas.laver...@reveurs.org:


 - the best play is a good only if played immediatly and very bad if
   played later in the game :
 - the first playout for this play resulted in a lost.
score and RAVE score will be very low and this play will never be
considered again until a very long time.



You raise an interesting concern.

The simple solution to your question is to add an exploration term  
using UCT for example. Then it becomes an empirical question what  
parameter for exploration gives the strongest play. My experience is  
that the best parameter is so small it can be set to zero.


Well, empirically, when I set the exploration component to zero it  
starts to play a lot worse. Like I wrote: the winning percentage drops  
to 24% vs. the same program with the exploration component, which is a  
huge difference.


So if you have a different experience, you must have something else  
that overcomes this hurdle that's not part of a simple MCTS-RAVE  
implementation. I'd be very interested to learn what that is. Sylvain  
didn't take the bait ;-)


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Olivier Teytaud


 Well, empirically, when I set the exploration component to zero it starts
 to play a lot worse. Like I wrote: the winning percentage drops to 24% vs.
 the same program with the exploration component, which is a huge difference.

 So if you have a different experience, you must have something else that
 overcomes this hurdle that's not part of a simple MCTS-RAVE implementation.
 I'd be very interested to learn what that is. Sylvain didn't take the bait
 ;-)


Here, we have a non-zero initialization of the number of wins, of the
numbere of simulations, of the number of Rave-wins, of the number of
Rave-losses.
We have then a 0 constant for exploration, but also an exploratory term
which is very different, and for which I am not the main author - therefore
I let the main author
give an explanation if he wants to :-)

I point out that even before this exploratory term, the best UCB-like
exploration-constant was 0 - as soon as the initializations of numbers of
wins, of losses, of Rave-wins, of Rave-losses are heuristic values.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Mark Boon



On Jan 21, 2009, at 11:53 AM, Olivier Teytaud wrote:

Here, we have a non-zero initialization of the number of wins, of  
the numbere of simulations, of the number of Rave-wins, of the  
number of Rave-losses.
We have then a 0 constant for exploration, but also an exploratory  
term which is very different, and for which I am not the main author  
- therefore I let the main author

give an explanation if he wants to :-)

I point out that even before this exploratory term, the best UCB- 
like exploration-constant was 0 - as soon as the initializations of  
numbers of wins, of losses, of Rave-wins, of Rave-losses are  
heuristic values.


I'd like to make sure I understand what you mean exactly. You use some  
heuristics to intialize all the moves (or maybe some of the moves)  
with a certain win-loss and rave-win-loss ratios?


To a certain extent I suppose these could come from the reading of the  
previous move? I think I slowly start to make sense of things...


Mark



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Olivier Teytaud


 I'd like to make sure I understand what you mean exactly. You use some
 heuristics to intialize all the moves (or maybe some of the moves) with a
 certain win-loss and rave-win-loss ratios?


Not only ratios, but also numbers of simulations. Thanks to patterns, expert
rules.




 To a certain extent I suppose these could come from the reading of the
 previous move? I think I slowly start to make sense of things...


We have (almost) no warm start from the values of previous moves. This
should be improved, as very clearly there is some time lost here (e.g.
patterns
should not be checked from scratch).

A+
Olivier
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Sylvain Gelly

Hi,

I did not mention here the prior initialization that is done in each node.
When you create a node, you can look at all possible move and if a pattern
matches (the exact same as in the playout) you initialize rw and rc to 14.
If the move saves a capture (same as in the playout), same initialization,
rw and rc to 14. If it is a self atari, you initialize rw to 0 and rc to 14.
Else you initialize rw to 7 and rc to 14.
Of course you can do put much more clever prior if you are a player and know
the subtleties of the game.

You can put an exploration term, but the cases where it is needed are rare.
I did a lot of experiments on that, and even at long thinking time, no
exploration term was always better (statistically).

Sylvain

2009/1/21 Mark Boon tesujisoftw...@gmail.com


 On Jan 21, 2009, at 10:23 AM, Magnus Persson wrote:

  Quoting Thomas Lavergne thomas.laver...@reveurs.org:

   - the best play is a good only if played immediatly and very bad if
   played later in the game :
  - the first playout for this play resulted in a lost.
 score and RAVE score will be very low and this play will never be
 considered again until a very long time.



 You raise an interesting concern.

 The simple solution to your question is to add an exploration term using
 UCT for example. Then it becomes an empirical question what parameter for
 exploration gives the strongest play. My experience is that the best
 parameter is so small it can be set to zero.


 Well, empirically, when I set the exploration component to zero it starts
 to play a lot worse. Like I wrote: the winning percentage drops to 24% vs.
 the same program with the exploration component, which is a huge difference.

 So if you have a different experience, you must have something else that
 overcomes this hurdle that's not part of a simple MCTS-RAVE implementation.
 I'd be very interested to learn what that is. Sylvain didn't take the bait
 ;-)

 Mark


 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Olivier Teytaud

 Of course you can do put much more clever prior if you are a player and
 know the subtleties of the game.


E.g. patterns extracted from databases - but it's not enough, carefully tune
the coefficients for empty triangles (important!) and various other
importants patterns/rules (don't just keep the empirical probabilities from
datasets as coefficients). I'm afraid the coefficients are very
implementation-dependent unless the bandit and the random player are
specified with
a lot of details.

You can put an exploration term, but the cases where it is needed are rare.
 I did a lot of experiments on that, and even at long thinking time, no
 exploration term was always better (statistically).


Well, now mogo has an exploration term - but not at all UCB-like. As said
previously, I'm not the main author of this modification and I'll not
explain it myself.
Also, for the main node (the root), the bandit in mogo is different from
the bandit elsewhere in the tree. More diversification is useful. But it's
not a very important improvement, a few percents.

Best regards,
Olivier
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Sylvain Gelly

2009/1/21 Olivier Teytaud olivier.teyt...@lri.fr


 Of course you can do put much more clever prior if you are a player and
 know the subtleties of the game.


 E.g. patterns extracted from databases - but it's not enough, carefully
 tune the coefficients for empty triangles (important!) and various other
 importants patterns/rules (don't just keep the empirical probabilities from
 datasets as coefficients). I'm afraid the coefficients are very
 implementation-dependent unless the bandit and the random player are
 specified with
 a lot of details.

 You can put an exploration term, but the cases where it is needed are rare.
 I did a lot of experiments on that, and even at long thinking time, no
 exploration term was always better (statistically).


 Well, now mogo has an exploration term - but not at all UCB-like.


I was talking about times where I was still there ... ages ago :)

Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-21 Thread Olivier Teytaud

 Well, now mogo has an exploration term - but not at all UCB-like.


 I was talking about times where I was still there ... ages ago :)


Good old times :-)
you've been helpful several times even from far away :-)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] How to properly implement RAVE?

2009-01-18 Thread David Fotland

I think it is too expensive to read ladders during playouts.  I remember
that you have faster ladders search code so it might not cost you as much.
My playout code has no ability to undo a move or do any kind of lookahead.

David

 
 Some examples: David Fotland wrote he does light playouts with just a
 few patterns but no tactics. I find that using a moderate amount of
 tactics actually is the biggest contributor to playing strength (save
 one or more stones if can't be caught in ladder). However, augmenting
 patterns with tactical information I found doesn't help at all, even
 when disregarding the performance cost. Maybe David uses some patterns
 to compensate for part of the tactics and relies on the faster
 playouts to compensate for poorer playouts. I'm guessing here, but
 otherwise I can't imagine why he would forego what otherwise seems to
 be a big gain in strength.
 
 Mark
 
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-18 Thread Mark Boon



On Jan 18, 2009, at 4:11 PM, David Fotland wrote:

I think it is too expensive to read ladders during playouts.  I  
remember
that you have faster ladders search code so it might not cost you as  
much.
My playout code has no ability to undo a move or do any kind of  
lookahead.


Yes, my ladder code is fast. But it has been public for many years, I  
had figured others would have caught up on that.


My playout code also has no ability to do undo. If I remember well,  
adding a few simple tactical searches during playouts made the  
performance go from 20Kps to 15Kps. But it moved to a winning-rate of  
about 90%. Reading tactics has to be really slow for that not to be  
worth it. I did find a nice heuristic to cut in less than half the  
amount of tactical reading necessary without any noticeable loss in  
playing strength. I'll write it down one day, I think it may have  
applicability beyond tactical reading and be a general concept to be  
used during playouts.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-18 Thread Mark Boon



On Jan 18, 2009, at 5:38 PM, Magnus Persson wrote:

In Valkyria I solved this by playing out the ladder on the playout  
board, and store all changes on a stack. When the ladder undos moves  
it just pop changes from the stack. In this way I can also use the  
rich board representation of Valkyria to setup the ladder fast. The  
drawback is that the code is ugly and slightly buggy when stones are  
sacrificed during the search.


A todo thing is to write a more correct ladder reader that still  
uses the idea of sharing the array with board with the playout, so  
that no copying has to be done.




My ladder code re-uses the board array of the playout module. But  
copying this array before reading a ladder doesn't slow things down  
all that much. As has been discussed on this list elsewhere, copying  
an array is fast nowadays. Re-using the array is actually a way to  
make sure your ladder-reading is without bugs when doing undo. If  
something goes wrong you notice immediately...


Maybe what David alluded to is that the playout has no undo so he  
can't use it to play and undo moves during ladder search. In my case  
the ladder reader only needs an array with the position. For the rest  
it's completely self-contained. That adds maybe a little bit when  
setting up the ladder reading, but it's still fast.


Mark



-Magnus

Quoting David Fotland fotl...@smart-games.com:

I think it is too expensive to read ladders during playouts.  I  
remember
that you have faster ladders search code so it might not cost you  
as much.
My playout code has no ability to undo a move or do any kind of  
lookahead.


David



--
Magnus Persson
Berlin, Germany
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly

Hi,

Sorry for the slow reply.
After those discussions, I figured out that pseudo code was the
fastest clear and not ambiguous way to describe how to precisely
implement the algorithm. I needed to find some time to write it.
Note that I did not write only the backup phase because to clearly
describe the backup you have to see the playouts as well. I also
describe the choice of the best move.
Note also the fact that the backup from the moves in the tree and from
the moves out of the tree is different. That is the somehow more
tricky part to check that a move has not been already played (that is
different for each node in the tree and we obviously don't want to
keep a vector of already played moves for each node).

Please forgive the mistakes that are in that code, of course I did not
make any test, and it has been quite a long time I am not in the
computer go trip anymore :). Please correct any mistake,
I hope it makes things clearer.

Best,
Sylvain

class AmafBoard {
  AmafBoard() {
offset = 0
fill(alread_played, 0)
  }
  Clear() {
offset++;
  }
  Play(move) {
already_played[move] = offset
  }
  AlreadyPlayed(move) {
return already_played[move] == offset
  }
  vector already_played
  int offset
}

ChooseMove(node, board) {
  bias = 0.015  // I put a random number here, to be tuned
  b = bias * bias / 0.25
  best_value = -1
  best_move = PASSMOVE
  for (move in board.allmoves) {
c = node.child(move).counts
w = node.child(move).wins
rc = node.rave_counts[move]
rw = node.rave_wins[move]
coefficient = 1 - rc * (rc + c + rc * c * b)
value = w / c * coef + rw / rc * (1 - coef)  // please here take
care of the c==0 and rc == 0 cases
if (value  best_value) {
  best_value = value
  best_move = move
}
  }
  return best_move
}
PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) {
  node = tree.root
  while (node) {
move = ChooseMove(node, board)
moves_played_in_tree.append(move)
nodes_seen_in_tree.append(node)
board.PlayMove(move)
node = node.child(move)
  }
}
PlayoutOutTree(board, AmafBoard played, moves) {
  int pass = 0
  while (pass  2) {
move = board.ChooseMoveAndPlay()
if (move == PASSMOVE) {
  pass ++
  continue
} else {
  pass = 0
}
if (!played.AlreadyPlayed(move)) {
  if (!board.last_move_was_black()) {
move = -move
  }
  moves.append(move)
}
  }
  return board.black_wins()
}

BackupNode(node, index, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played) {
  already_played.Clear()
  win = node.is_black_to_play ? black_wins : !black_wins
  // normal backup
  node.wins += win
  node.counts ++
  // rave backup
  for (j = index; j  moves_played_in_tree.size(); j += 2) {
move = moves_played_in_tree[j]
if (already_played.AlreadyPlayed(move)) continue
already_played.Play(move)
node.rave_wins[move] += win
node.rave_counts[move] ++
  }
  for (j = 0; j  moves_played_out_tree.size(); ++j) {
move = moves_played_out_tree[j]
if (!node.is_black_to_play) move = -move
// If it is either not the right color or the intersection is
already played we ignore that move for that node
if (move  0 || already_played.AlreadyPlayed(move)) continue

already_played.Play(move)
node.rave_wins[move] += win
node.rave_counts[move] ++
  }
}

Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
moves_played_out_tree, already_played) {
  for (i = 0; i  nodes_seen_in_tree.size(); ++i) {
node = nodes_seen_in_tree[i]
BackupNode(node, i, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played)
  }
}
OneIteration(board_position, AmafBoard already_played) {
  board = board_position.Copy()
  already_played.Clear()

  vector moves_played_in_tree
  vector nodes_seen_in_tree
  PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree)

  vector moves_played_out_tree
  bool black_wins = PlayoutOutTree(board, already_played,
moves_played_out_tree)
  Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
moves_played_out_tree, already_played)
}
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Isaac Deutsch

 Hi,
 
 Sorry for the slow reply.

Hi

I'd prefer quality over speed anytime. ;)
Your pseudo code is excellent and helped me to understand some of the trickier 
things. Thanks again!
I think I will now be able to implement my own version. :)

Regards,
Isaac
-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Mark Boon

Thanks for taking the time Sylvain. I took a quick look to see how  
your pseudo-code compared to the reference implementation I made.  
There are a few small differences, and I think the place(s) where I  
deviated is exactly what confused Daniel Waeber.


The first difference is that when I check whether a point has been  
played, I always start from the position of the root-node, whereas you  
start from the position of the node where the moves_played_in_tree is  
played. The second difference is I also don't count a move if the  
opposite color played on that point first. The result is I only need  
to compute the alreadyPlayed map once (in my code these are called  
weightMap and colorMap, I need two maps because I use decreasing  
weights) instead of computing it at each step back up in the tree.


The only time these 'deviations' make a difference is in case of ko  
and ishi-no-shita. When I have a little spare time I'll check to see  
if this actually makes a difference in playing strength. If there's  
any noticeable difference I'll try to modify the code in my reference  
implementation to reflect the 'correct' method.


Mark


On Jan 17, 2009, at 11:48 AM, Sylvain Gelly wrote:


Hi,

Sorry for the slow reply.
After those discussions, I figured out that pseudo code was the
fastest clear and not ambiguous way to describe how to precisely
implement the algorithm. I needed to find some time to write it.
Note that I did not write only the backup phase because to clearly
describe the backup you have to see the playouts as well. I also
describe the choice of the best move.
Note also the fact that the backup from the moves in the tree and from
the moves out of the tree is different. That is the somehow more
tricky part to check that a move has not been already played (that is
different for each node in the tree and we obviously don't want to
keep a vector of already played moves for each node).

Please forgive the mistakes that are in that code, of course I did not
make any test, and it has been quite a long time I am not in the
computer go trip anymore :). Please correct any mistake,
I hope it makes things clearer.

Best,
Sylvain

class AmafBoard {
 AmafBoard() {
   offset = 0
   fill(alread_played, 0)
 }
 Clear() {
   offset++;
 }
 Play(move) {
   already_played[move] = offset
 }
 AlreadyPlayed(move) {
   return already_played[move] == offset
 }
 vector already_played
 int offset
}

ChooseMove(node, board) {
 bias = 0.015  // I put a random number here, to be tuned
 b = bias * bias / 0.25
 best_value = -1
 best_move = PASSMOVE
 for (move in board.allmoves) {
   c = node.child(move).counts
   w = node.child(move).wins
   rc = node.rave_counts[move]
   rw = node.rave_wins[move]
   coefficient = 1 - rc * (rc + c + rc * c * b)
   value = w / c * coef + rw / rc * (1 - coef)  // please here take
care of the c==0 and rc == 0 cases
   if (value  best_value) {
 best_value = value
 best_move = move
   }
 }
 return best_move
}
PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) {
 node = tree.root
 while (node) {
   move = ChooseMove(node, board)
   moves_played_in_tree.append(move)
   nodes_seen_in_tree.append(node)
   board.PlayMove(move)
   node = node.child(move)
 }
}
PlayoutOutTree(board, AmafBoard played, moves) {
 int pass = 0
 while (pass  2) {
   move = board.ChooseMoveAndPlay()
   if (move == PASSMOVE) {
 pass ++
 continue
   } else {
 pass = 0
   }
   if (!played.AlreadyPlayed(move)) {
 if (!board.last_move_was_black()) {
   move = -move
 }
 moves.append(move)
   }
 }
 return board.black_wins()
}

BackupNode(node, index, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played) {
 already_played.Clear()
 win = node.is_black_to_play ? black_wins : !black_wins
 // normal backup
 node.wins += win
 node.counts ++
 // rave backup
 for (j = index; j  moves_played_in_tree.size(); j += 2) {
   move = moves_played_in_tree[j]
   if (already_played.AlreadyPlayed(move)) continue
   already_played.Play(move)
   node.rave_wins[move] += win
   node.rave_counts[move] ++
 }
 for (j = 0; j  moves_played_out_tree.size(); ++j) {
   move = moves_played_out_tree[j]
   if (!node.is_black_to_play) move = -move
   // If it is either not the right color or the intersection is
already played we ignore that move for that node
   if (move  0 || already_played.AlreadyPlayed(move)) continue

   already_played.Play(move)
   node.rave_wins[move] += win
   node.rave_counts[move] ++
 }
}

Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
moves_played_out_tree, already_played) {
 for (i = 0; i  nodes_seen_in_tree.size(); ++i) {
   node = nodes_seen_in_tree[i]
   BackupNode(node, i, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played)
 }
}
OneIteration(board_position, AmafBoard already_played) {
 board = board_position.Copy()
 already_played.Clear()

 vector moves_played_in_tree
 vector nodes_seen_in_tree
 PlayoutInTree(board,

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly

A small point: in PlayoutOutTree, just after if
(!played.AlreadyPlayed(move)) {, there should have a played.Play(move).
I believe it does not change the final result (as the check is also done in
the backup, and the move played in the backup), but I simply forgot that
line (that should make moves_played_out_tree smaller).

To avoid confusion, I repost the pseudo code with that correction (and
hoping the indentation is not broken by the email editor once again).

Sylvain


class AmafBoard {
  AmafBoard() {
offset = 0
fill(alread_played, 0)
  }
  Clear() {
offset++;
  }  Play(move) {
 already_played[move] = offset
  }
  AlreadyPlayed(move) {
 return already_played[move] == offset
  }
  vector already_played
  int offset
}

ChooseMove(node, board) {
  bias = 0.015  // I put a random number here, to be tuned
  b = bias * bias / 0.25
  best_value = -1
  best_move = PASSMOVE
  for (move in board.allmoves) {
c = node.child(move).counts
w = node.child(move).wins
rc = node.rave_counts[move]
rw = node.rave_wins[move]
coefficient = 1 - rc * (rc + c + rc * c * b)
value = w / c * coef + rw / rc * (1 - coef)  // please here take care of
the c==0 and rc == 0 cases
if (value  best_value) {
  best_value = value
  best_move = move
}
  }
  return best_move
}


PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) {
  node = tree.root
  while (node) {
move = ChooseMove(node, board)
moves_played_in_tree.append(move)
nodes_seen_in_tree.append(node)
board.PlayMove(move)
node = node.child(move)
  }
}


PlayoutOutTree(board, AmafBoard played, moves) {
  int pass = 0
  while (pass  2) {
move = board.ChooseMoveAndPlay()
if (move == PASSMOVE) {
  pass ++
  continue
} else {
  pass = 0
}
if (!played.AlreadyPlayed(move)) {
  played.Play(move)
  if (!board.last_move_was_black()) {
move = -move
  }
  moves.append(move)
}
  }
  return board.black_wins()
}


BackupNode(node, index, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played) {
  already_played.Clear()
  win = node.is_black_to_play ? black_wins : !black_wins
  // normal backup
  node.wins += win
  node.counts ++
  // rave backup
  for (j = index; j  moves_played_in_tree.size(); j += 2) {
move = moves_played_in_tree[j]
if (already_played.AlreadyPlayed(move)) continue
already_played.Play(move)
node.rave_wins[move] += win
node.rave_counts[move] ++
  }
  for (j = 0; j  moves_played_out_tree.size(); ++j) {
move = moves_played_out_tree[j]
if (!node.is_black_to_play) move = -move

// If it is either not the right color or the intersection is
// already played we ignore that move for that node
if (move  0 || already_played.AlreadyPlayed(move)) continue

already_played.Play(move)
node.rave_wins[move] += win
node.rave_counts[move] ++
  }
}


Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
moves_played_out_tree, already_played) {
  for (i = 0; i  nodes_seen_in_tree.size(); ++i) {
node = nodes_seen_in_tree[i]
BackupNode(node, i, black_wins, moves_played_in_tree,
moves_played_out_tree, already_played)
  }
}


OneIteration(board_position, AmafBoard already_played) {
  board = board_position.Copy()
  already_played.Clear()

  vector moves_played_in_tree
  vector nodes_seen_in_tree
  PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree)

  vector moves_played_out_tree
  bool black_wins = PlayoutOutTree(board, already_played,
 moves_played_out_tree)
  Backup(black_wins, nodes_seen_in_tree, moves_played_in_tree,
  moves_played_out_tree, already_played)
}

2009/1/17 Sylvain Gelly sylvain.ge...@m4x.org:
 Hi,

 Sorry for the slow reply.
 After those discussions, I figured out that pseudo code was the
 fastest clear and not ambiguous way to describe how to precisely
 implement the algorithm. I needed to find some time to write it.
 Note that I did not write only the backup phase because to clearly
 describe the backup you have to see the playouts as well. I also
 describe the choice of the best move.
 Note also the fact that the backup from the moves in the tree and from
 the moves out of the tree is different. That is the somehow more
 tricky part to check that a move has not been already played (that is
 different for each node in the tree and we obviously don't want to
 keep a vector of already played moves for each node).

 Please forgive the mistakes that are in that code, of course I did not
 make any test, and it has been quite a long time I am not in the
 computer go trip anymore :). Please correct any mistake,
 I hope it makes things clearer.

 Best,
 Sylvain

 class AmafBoard {
  AmafBoard() {
offset = 0
fill(alread_played, 0)
  }
  Clear() {
offset++;
  }
  Play(move) {
already_played[move] = offset
  }

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly

Hi Mark,
For the first difference you mention, as far as I remember it makes a small
but significant difference and is one of the main reason I was talking about
tricky details.
For the second difference, I also don't want to count a move if the
opposite color played on that point first, and, unless I am mistaken, I
indeed don't count them in the part of the playout which is outside the
tree.
However you are right for the part inside the tree, due to the j+=2. I am
a little confused and now believe it should not be a j+=2 but a j++,
updating the already_played map for each move inside the tree during the
backup, but doing the backup only once every too moves (to match the color).

It may be a bug in what I proposed, I cannot test it anymore. However, what
I am sure about is that this j+=2 is what I was using and gave very good
results. If what you point out is indeed a bug and makes a difference, then
it can only be even better :). I prefer not modifying the pseudo code I
posted until someone verifies it becomes better.

Thanks,
Sylvain

2009/1/17 Mark Boon tesujisoftw...@gmail.com

 Thanks for taking the time Sylvain. I took a quick look to see how your
 pseudo-code compared to the reference implementation I made. There are a few
 small differences, and I think the place(s) where I deviated is exactly what
 confused Daniel Waeber.

 The first difference is that when I check whether a point has been played,
 I always start from the position of the root-node, whereas you start from
 the position of the node where the moves_played_in_tree is played. The
 second difference is I also don't count a move if the opposite color played
 on that point first. The result is I only need to compute the alreadyPlayed
 map once (in my code these are called weightMap and colorMap, I need two
 maps because I use decreasing weights) instead of computing it at each step
 back up in the tree.

 The only time these 'deviations' make a difference is in case of ko and
 ishi-no-shita. When I have a little spare time I'll check to see if this
 actually makes a difference in playing strength. If there's any noticeable
 difference I'll try to modify the code in my reference implementation to
 reflect the 'correct' method.

 Mark



 On Jan 17, 2009, at 11:48 AM, Sylvain Gelly wrote:

  Hi,

 Sorry for the slow reply.
 After those discussions, I figured out that pseudo code was the
 fastest clear and not ambiguous way to describe how to precisely
 implement the algorithm. I needed to find some time to write it.
 Note that I did not write only the backup phase because to clearly
 describe the backup you have to see the playouts as well. I also
 describe the choice of the best move.
 Note also the fact that the backup from the moves in the tree and from
 the moves out of the tree is different. That is the somehow more
 tricky part to check that a move has not been already played (that is
 different for each node in the tree and we obviously don't want to
 keep a vector of already played moves for each node).

 Please forgive the mistakes that are in that code, of course I did not
 make any test, and it has been quite a long time I am not in the
 computer go trip anymore :). Please correct any mistake,
 I hope it makes things clearer.

 Best,
 Sylvain

 class AmafBoard {
  AmafBoard() {
   offset = 0
   fill(alread_played, 0)
  }
  Clear() {
   offset++;
  }
  Play(move) {
   already_played[move] = offset
  }
  AlreadyPlayed(move) {
   return already_played[move] == offset
  }
  vector already_played
  int offset
 }

 ChooseMove(node, board) {
  bias = 0.015  // I put a random number here, to be tuned
  b = bias * bias / 0.25
  best_value = -1
  best_move = PASSMOVE
  for (move in board.allmoves) {
   c = node.child(move).counts
   w = node.child(move).wins
   rc = node.rave_counts[move]
   rw = node.rave_wins[move]
   coefficient = 1 - rc * (rc + c + rc * c * b)
   value = w / c * coef + rw / rc * (1 - coef)  // please here take
 care of the c==0 and rc == 0 cases
   if (value  best_value) {
 best_value = value
 best_move = move
   }
  }
  return best_move
 }
 PlayoutInTree(board, moves_played_in_tree, nodes_seen_in_tree) {
  node = tree.root
  while (node) {
   move = ChooseMove(node, board)
   moves_played_in_tree.append(move)
   nodes_seen_in_tree.append(node)
   board.PlayMove(move)
   node = node.child(move)
  }
 }
 PlayoutOutTree(board, AmafBoard played, moves) {
  int pass = 0
  while (pass  2) {
   move = board.ChooseMoveAndPlay()
   if (move == PASSMOVE) {
 pass ++
 continue
   } else {
 pass = 0
   }
   if (!played.AlreadyPlayed(move)) {
 if (!board.last_move_was_black()) {
   move = -move
 }
 moves.append(move)
   }
  }
  return board.black_wins()
 }

 BackupNode(node, index, black_wins, moves_played_in_tree,
 moves_played_out_tree, already_played) {
  already_played.Clear()
  win = node.is_black_to_play ? black_wins : !black_wins
  // normal backup
  node.wins += win
  node.counts ++
  //

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly

Hi Issac,
You are welcome, and I am happy there is finally a clearer of implementing
RAVE out there. I believe I should have done it much earlier, sorry for
that, but better late than never, no? :)

Best,
Sylvain

2009/1/17 Isaac Deutsch i...@gmx.ch

  Hi,
 
  Sorry for the slow reply.

 Hi

 I'd prefer quality over speed anytime. ;)
 Your pseudo code is excellent and helped me to understand some of the
 trickier things. Thanks again!
 I think I will now be able to implement my own version. :)

 Regards,
 Isaac
 --
 Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen:
 http://www.gmx.net/de/go/multimessenger
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Magnus Persson


I think I found a bug in ChooseMove

Quoting Sylvain Gelly sylvain.ge...@m4x.org:


coefficient = 1 - rc * (rc + c + rc * c * b)


I think this has to be

coefficient = 1 - rc / (rc + c + rc * c * b)

thus when c is 0 (initially) the coefficient is 0
when c goes towards infinity the coefficent goes 1

which is how this function should behave.

Magnus

--
Magnus Persson
Berlin, Germany
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Sylvain Gelly

Good catch :)Indeed it makes no sense with the *, sorry...

Sylvain

2009/1/17 Magnus Persson magnus.pers...@phmp.se

 I think I found a bug in ChooseMove

 Quoting Sylvain Gelly sylvain.ge...@m4x.org:

 coefficient = 1 - rc * (rc + c + rc * c * b)


 I think this has to be

 coefficient = 1 - rc / (rc + c + rc * c * b)

 thus when c is 0 (initially) the coefficient is 0
 when c goes towards infinity the coefficent goes 1

 which is how this function should behave.

 Magnus

 --
 Magnus Persson
 Berlin, Germany

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-17 Thread Mark Boon



On Jan 17, 2009, at 5:41 PM, Sylvain Gelly wrote:

For the first difference you mention, as far as I remember it makes  
a small but significant difference and is one of the main reason I  
was talking about tricky details.


OK, I ran a test and after 1,000 games with 2K semi-light playouts I  
get a winning percentage of 50.6% for your methods vs. mine. Of course  
it's possible I made some mistake, but my first impression is it makes  
no difference which way you do this particular detail.


Your ChooseNode is also quite different from mine, mostly because I  
also still have a UCT component in there. I'll give your method a go  
one day, just to see if it changes anything.


I've come to understand what you mean by tricky details, sometimes I  
see a big difference in playing strength that I find hard to explain  
given the change(s) I made. Conversely I've been in quite a few cases  
where I thought something would make a difference, only to find out it  
all didn't matter one bit.


It's also possible that some deficiencies that would be apparent in  
one implementation, get compensated for in another.


Some examples: David Fotland wrote he does light playouts with just a  
few patterns but no tactics. I find that using a moderate amount of  
tactics actually is the biggest contributor to playing strength (save  
one or more stones if can't be caught in ladder). However, augmenting  
patterns with tactical information I found doesn't help at all, even  
when disregarding the performance cost. Maybe David uses some patterns  
to compensate for part of the tactics and relies on the faster  
playouts to compensate for poorer playouts. I'm guessing here, but  
otherwise I can't imagine why he would forego what otherwise seems to  
be a big gain in strength.


I also tried to use ownership maps to modify the RAVE value. Remi  
Coulom wrote in a paper he used ownership information of up to 63  
playouts. When I tried something similar it always makes play weaker.  
Maybe I should use it in a different way, but I haven't stumbled on  
the solution yet. When I think of it, AMAF information is already  
something very similar to ownership information. So maybe combining  
the two doesn't make much sense.


Lastly, in an earlier UCT bot that I made I gained a lot by initially  
reducing the number of moves and slowly expanding it. After using AMAF  
it turns out this method hardly gains anything at all anymore.


So the devil is not only in the details, it's also in the combination  
of the details.


Mark


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-15 Thread Daniel Waeber

Thanks you. I think that I understand it now :)

On 23:21 Wed 14 Jan , Mark Boon wrote:
 You have to understand that the 'start' variable really starts at the  
 root from the position for which we do the search. So all the moves  
 'below' the playoutNode are also taken into account. The check in the  
 earlier part if (_colorMap[moveXY]==0) makes sure there's no move  
 between the playoutNode and the n.move as you call it.

Ah, yes, I did not get the meaning of start right. But still I think you
have to incrementally add values to the maps as you go down the tree.

I think there's a problem with caputres/ko-fights inside the tree.
All nodes after the capture should get the amaf color/value for the
stones put there after the capture and not the value of the stones that
were put there and captured.

Most likely I missed a little piece of code again, but hey, perhaps its
a real bug.

 That is, if I understand you correctly. Because I'm not quite sure  
 what you mean by 'finding all these nodes n'. This is not a sub-set of  
 RAVE as I understand it. What you see in the code is the accumulation  
 of the AMAF values after each playout. The RAVE value is calculated  
 using these values when comparing move-candidates, which is in an  
 altogether different place. (The MonteCarloTreeSearchResult  class in  
 my project).
 
 I'm afraid I haven't made things much clearer for you.

The problem is that I was confused, but now, as I understand it, all
your and all the other comments suddenly make sense ;)

 Mark

Thank again,
  wabu

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-15 Thread Mark Boon



On Jan 15, 2009, at 10:47 AM, Daniel Waeber wrote:


Thanks you. I think that I understand it now :)

On 23:21 Wed 14 Jan , Mark Boon wrote:

You have to understand that the 'start' variable really starts at the
root from the position for which we do the search. So all the moves
'below' the playoutNode are also taken into account. The check in the
earlier part if (_colorMap[moveXY]==0) makes sure there's no move
between the playoutNode and the n.move as you call it.


Ah, yes, I did not get the meaning of start right. But still I think  
you

have to incrementally add values to the maps as you go down the tree.


But it does that. This happens when playoutNode is set to its parent  
and the AMAF values are added again (now for the other color) until  
the root is reached.



I think there's a problem with caputres/ko-fights inside the tree.
All nodes after the capture should get the amaf color/value for the
stones put there after the capture and not the value of the stones  
that

were put there and captured.

Most likely I missed a little piece of code again, but hey, perhaps  
its

a real bug.


I did stop to think about ko and 'ishi no shita' (something like  
'playing under the stones') a little bit but couldn't come to an  
immediate conclusion what would be the best thing to do. So I decided  
to leave it as it is. You didn't miss any little piece of code there.  
Maybe there's room for improvement in case of ko, but my guess is the  
difference will be minimal at best. If you find it does make a big  
difference everyone here will be delighted to hear it.


Given how it's performing I doubt there are major problems with my  
AMAF-RAVE implementation.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-15 Thread Daniel Waeber

Hi,

On 11:24 Thu 15 Jan , Mark Boon wrote:
 
 On Jan 15, 2009, at 10:47 AM, Daniel Waeber wrote:
 
  Thanks you. I think that I understand it now :)
 
  On 23:21 Wed 14 Jan , Mark Boon wrote:
  You have to understand that the 'start' variable really starts at the
  root from the position for which we do the search. So all the moves
  'below' the playoutNode are also taken into account. The check in the
  earlier part if (_colorMap[moveXY]==0) makes sure there's no move
  between the playoutNode and the n.move as you call it.
 
  Ah, yes, I did not get the meaning of start right. But still I think  
  you
  have to incrementally add values to the maps as you go down the tree.
 
 But it does that. This happens when playoutNode is set to its parent  
 and the AMAF values are added again (now for the other color) until  
 the root is reached.

yes, but the weight/color maps stay the same for all updated nodes.

I think the first playoutNode (the one most deep inside the tree) only
should get amaf values for the random playout, the next one form random
playout + from the first playoutNode ... and the root node amaf values
form all the nodes.

I attached code for what I would do, but had no time to test it.

  I think there's a problem with caputres/ko-fights inside the tree.
  All nodes after the capture should get the amaf color/value for the
  stones put there after the capture and not the value of the stones  
  that
  were put there and captured.
 
  Most likely I missed a little piece of code again, but hey, perhaps  
  its
  a real bug.
 
 I did stop to think about ko and 'ishi no shita' (something like  
 'playing under the stones') a little bit but couldn't come to an  
 immediate conclusion what would be the best thing to do. So I decided  
 to leave it as it is. You didn't miss any little piece of code there.  
 Maybe there's room for improvement in case of ko, but my guess is the  
 difference will be minimal at best. If you find it does make a big  
 difference everyone here will be delighted to hear it.
 
 Given how it's performing I doubt there are major problems with my  
 AMAF-RAVE implementation.


regards
  wabu

 
 Mark
 
diff --git a/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java b/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java
index 5e47e00..ee650e1 100644
--- a/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java
+++ b/TesujiRefBot/source/tesuji/games/go/reference/search/MonteCarloTreeSearch.java
@@ -566,6 +566,7 @@ public class MonteCarloTreeSearchMoveType extends Move
 			
 			TreeNodeMonteCarloTreeSearchResultMoveType node = getNodeToExpand(_rootNode, _searchAdministration);
 			TreeNodeMonteCarloTreeSearchResultMoveType playoutNode = node;
+			int start = _searchAdministration.getMoveStack().getSize(); // begin at playoutNode

 			if (node != null)
 {
@@ -580,7 +581,6 @@ public class MonteCarloTreeSearchMoveType extends Move
 				{
 		IntStack playoutMoves = _searchAdministration.getMoveStack();
 		byte color = _monteCarloAdministration.getColorToMove();
-		int start = _monteCarloAdministration.getMoveStack().getSize();
 		int end = playoutMoves.getSize();
 		double weight = 1.0;
 		double weightDelta = 1.0 / (end - start + 1); // Michael Williams' idea to use decreasing weights
@@ -613,6 +613,13 @@ public class MonteCarloTreeSearchMoveType extends Move
 if (_colorMap[xy]==color)
 	result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]);
 			}
+			// add the move to the maps now
+			GoMove move = (GoMove) playoutNode.getContent().getMove();
+			int xy = move.getXY();
+			byte col = move.getColor();
+			_colorMap[xy] = col;
+			_weightMap[xy] = 1.0; //TODO: handle weight
+			
 			playoutNode = playoutNode.getParent();
 		}
 				}
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-15 Thread Mark Boon



On Jan 15, 2009, at 12:33 PM, Daniel Waeber wrote:


yes, but the weight/color maps stay the same for all updated nodes.

I think the first playoutNode (the one most deep inside the tree) only
should get amaf values for the random playout, the next one form  
random

playout + from the first playoutNode ... and the root node amaf values
form all the nodes.


OK, I think I see now what you're trying to say. This is something I  
did think about. I hope my memory serves me well enough to say why I  
didn't do it that way.


- What you propose adds complexity and possibly computation (if it  
means recalculating or adjusting the weight map).

- I don't think it makes all that much difference.

The reason I don't think it makes much difference is that adding AMAF  
values for the moves above (closer to the root) the playoutNode is  
that most likely those points are now occupied. Since the AMAF values  
are used to compute which empty points are the best next candidate,  
the AMAF values at occupied points are immaterial. They are not even  
added. So it only makes a difference in cases where played stones get  
captured and those points then are occupied again. This brings us back  
to the issue discussed earlier about ko and ishi-no-shita.


Mark

P.S. what do I need to open that file? Is it a SVN patch?

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-15 Thread Daniel Waeber

On 12:53 Thu 15 Jan , Mark Boon wrote:
 
 On Jan 15, 2009, at 12:33 PM, Daniel Waeber wrote:
 
  yes, but the weight/color maps stay the same for all updated nodes.
 
  I think the first playoutNode (the one most deep inside the tree) only
  should get amaf values for the random playout, the next one form  
  random
  playout + from the first playoutNode ... and the root node amaf values
  form all the nodes.
 
 OK, I think I see now what you're trying to say. This is something I  
 did think about. I hope my memory serves me well enough to say why I  
 didn't do it that way.
 
 - What you propose adds complexity and possibly computation (if it  
 means recalculating or adjusting the weight map).
 - I don't think it makes all that much difference.
 
 The reason I don't think it makes much difference is that adding AMAF  
 values for the moves above (closer to the root) the playoutNode is  
 that most likely those points are now occupied. Since the AMAF values  
 are used to compute which empty points are the best next candidate,  
 the AMAF values at occupied points are immaterial. They are not even  
 added. So it only makes a difference in cases where played stones get  
 captured and those points then are occupied again. This brings us back  
 to the issue discussed earlier about ko and ishi-no-shita.

Yes, I don't know if it would be a big improvement. It is not a real
speed decrease, as you just move the point were you add the additional
values to the maps.
Perhaps I'll run some tests when I have the time.

 Mark
 
 P.S. what do I need to open that file? Is it a SVN patch?

it's standard diff format, outputed by git. You can view it with a text
editor and apply it with unix patch commandline tool or use
eclipse-team-apply-patch. it's done agains svn-root, so you have to ignore
2 leading pathnames if you apply it inside the refbot folder.

regards
  wabu

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-14 Thread Daniel Waeber

Hi,

I'm also interested at the same thing.

 snip
 
  I tried putting this into pseudo code, but it already looks like C. ;p
  http://pastie.org/357231
  If you could look at it, I would be most grateful.
 
 It sounds good but it seems to lack the check of whether a given move is
 first played in a given intersection. When you add that, it because a little
 more tricky to update in the tree. You also update the value of each move
 independently of the color, i.e. a position in which it is black turn will
 be update with white moves. You should restrict the update.

I have a question about the the part were the stats are updated.
(l.15-25). having an array of amaf-values in every node seems very memory
intensive and I don't get how you would access these values.

Something like searching for all nodes below the visited node with the same
pos+color as the ones visited and updating amaf-stats directly in these nodes
would make more sense to me. (but also costs more cpu time)

Hope I don't bring in more confusion into the topic, but I really would
like to get rave done right.

 Hopes that helps,
 Sylvain
 
 
 
 
  -Isaac
 

thanks for you help
  wabu

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-14 Thread Mark Boon



On Jan 14, 2009, at 9:36 AM, Daniel Waeber wrote:


I have a question about the the part were the stats are updated.
(l.15-25). having an array of amaf-values in every node seems very  
memory

intensive and I don't get how you would access these values.


You are right, it is memory intensive. I believe this is one of the  
reasons that most implementations wait a certain number of playouts  
before creating the next level of nodes.


Accessing the AMAF values depends on your implementation. The  
following is a code-snippet from my MCTS reference implementation that  
updates the AMAF values in the tree:


if (_useAMAF)
{
IntStack playoutMoves = 
_searchAdministration.getMoveStack();
byte color = 
_monteCarloAdministration.getColorToMove();
int start = 
_monteCarloAdministration.getMoveStack().getSize();
int end = playoutMoves.getSize();
double weight = 1.0;
double weightDelta = 1.0 / (end - start + 1); // Michael Williams'  
idea to use decreasing weights

GoArray.clear(_weightMap);
GoArray.clear(_colorMap);
for (int i=start; iend; i++)
{
int moveXY = playoutMoves.get(i);
if (_colorMap[moveXY]==0)
{
_colorMap[moveXY] = color;
_weightMap[moveXY] = weight;
}
weight -= weightDelta;
color = opposite(color);
}

while (playoutNode!=null)
{
color = 
opposite(playoutNode.getContent().getMove().getColor());
	boolean playerWins = (blackWins  color==BLACK) || (!blackWins  
 color==WHITE);
	double score = playerWins ?  
MonteCarloTreeSearchResult.MAX_SCORE :  
MonteCarloTreeSearchResult.MIN_SCORE;

for (int i=0; 
iplayoutNode.getChildCount(); i++)
{
		TreeNodeMonteCarloTreeSearchResultMoveType nextNode =  
playoutNode.getChildAt(i);
		MonteCarloTreeSearchResultMoveType result =  
nextNode.getContent();

GoMove move = (GoMove) 
result.getMove();
int xy = move.getXY();
if (_colorMap[xy]==color)
			 
result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]);

}
playoutNode = playoutNode.getParent();
}
}


playoutNode is the move-node from which the playout was done. The amaf  
values are stored in its children by the increaseVirtualPlayout()  
method. Note that it goes up the tree by assigning the parent to  
playoutNode until it gets to the root.


For more context it would be better to lookup the whole source at 
http://plug-and-go.dev.java.net
If you think some more comments in the code could clarify things  
better I'm open to suggestions.


Good luck.

Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-14 Thread Isaac Deutsch

 Hi,
 
 I'm also interested at the same thing.

Glad I'm not alone. ;)

  It sounds good but it seems to lack the check of whether a given move is
  first played in a given intersection. When you add that, it because a
 little
  more tricky to update in the tree.

I see, I missed that. I actually prune second (third, fourth...) moves at the 
same intersection in the playouts. However, this means that I can't just count 
every 2nd move as a move for the given node color, so I'd have to change the 
update loop.

You also update the value of each
 move
  independently of the color, i.e. a position in which it is black turn
 will
  be update with white moves. You should restrict the update.

This I don't really understand. Are you talking about the issue raised by 
basically doing j+=2 in the loop instead of checking whether the move color is 
the same as the node color? If yes, I understand.


 I have a question about the the part were the stats are updated.
 (l.15-25). having an array of amaf-values in every node seems very memory
 intensive and I don't get how you would access these values.
 
 Something like searching for all nodes below the visited node with the
 same
 pos+color as the ones visited and updating amaf-stats directly in these
 nodes
 would make more sense to me. (but also costs more cpu time)

I have been thinking *exactly* the same thing. When you keep the values in the 
node, it's easiest to keep boardsize*boardsize values to be able to access AMAF 
values directly by the childrens' positions. This wastes a bit of memory, but 
not too much. Keeping the value in the children is more cpu expensive but saves 
memory.
For me, the first way seems easier to implement. How would you exactly do 
searching for all nodes below the visited node with the same pos+color as the 
ones visited? I think by the visited node you mean the first move played 
after root, is that correct?

On a side note, I found that using plain AMAF for MCTS (so a single AMAF table 
for all nodes in the tree) actually made my program play about 100 elo *worse*! 
Clearly, it's necessary to implement proper RAVE to get a benefit.

Thanks again,
Isaac
-- 
Sensationsangebot verlängert: GMX FreeDSL - Telefonanschluss + DSL 
für nur 16,37 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K1308T4569a
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-14 Thread Daniel Waeber

Ok, I still have same questions about the refbot code.

On 10:29 Wed 14 Jan , Mark Boon wrote:
 
 On Jan 14, 2009, at 9:36 AM, Daniel Waeber wrote:
 
  I have a question about the the part were the stats are updated.
  (l.15-25). having an array of amaf-values in every node seems very  
  memory
  intensive and I don't get how you would access these values.
 
 You are right, it is memory intensive. I believe this is one of the  
 reasons that most implementations wait a certain number of playouts  
 before creating the next level of nodes.

  form http://pastie.org/pastes/357231 :
  node[visitedNode[i]].AMAFSum[visitedPos[j]]+=result; 
  node[visitedNode[i]].AMAFPlayed[visitedPos[j]]+=1;

I had some problems with these lines. looks to me like the nodes have an
array of boardsize*boardsize inside *all* nodes, increasing the memory
of the tree by a factor 10.
But, correct me if I'm wrong, the refbot code just adds two simple
values to the node for rave.

 Accessing the AMAF values depends on your implementation. The  
 following is a code-snippet from my MCTS reference implementation that  
 updates the AMAF values in the tree:
 
 if (_useAMAF)
 {
   IntStack playoutMoves = _searchAdministration.getMoveStack();
   byte color = _monteCarloAdministration.getColorToMove();
   int start = _monteCarloAdministration.getMoveStack().getSize();
   int end = playoutMoves.getSize();
   double weight = 1.0;
   double weightDelta = 1.0 / (end - start + 1); // Michael Williams' idea 
 to use decreasing weights
   GoArray.clear(_weightMap);
   GoArray.clear(_colorMap);
   for (int i=start; iend; i++)
   {
   int moveXY = playoutMoves.get(i);
   if (_colorMap[moveXY]==0)
   {
   _colorMap[moveXY] = color;
   _weightMap[moveXY] = weight;
   }
   weight -= weightDelta;
   color = opposite(color);
   }

until here it's clear to me.

   while (playoutNode!=null)
   {
   color = opposite(playoutNode.getContent().getMove().getColor());
   boolean playerWins = (blackWins  color==BLACK) || (!blackWins 
  color==WHITE);
   double score = playerWins ? 
 MonteCarloTreeSearchResult.MAX_SCORE : MonteCarloTreeSearchResult.MIN_SCORE;
   for (int i=0; iplayoutNode.getChildCount(); i++)
   {
   TreeNodeMonteCarloTreeSearchResultMoveType nextNode 
 = playoutNode.getChildAt(i);
   MonteCarloTreeSearchResultMoveType result = 
 nextNode.getContent();
   GoMove move = (GoMove) result.getMove();
   int xy = move.getXY();
   if (_colorMap[xy]==color)

 result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]);

if i understand this code correctly, it updates the amaf vaules of all
direct children of the playoutNode according to the weight/color maps.

And that update is done for all nodes on the selected path.

   }
   playoutNode = playoutNode.getParent();

First of all, I miss an weight/colorMap update for xy here. Souldn't the
move of the current playoutNode be considered as an amaf move for all
the nodes below this one?

   }
 }

But, most of all, I miss that the code only updates the amaf values of
all direct children, and not of all nodes n below the playoutNode, where
there is no play at n.move on the path between n and the playoutNode.

Finding all these nodes n would be a costy thing to do, but wouldn't
that be the right thing to do? Implementing a realistic subset of RAVE
is another story, but first of all I want to understand the pure concept
of RAVE.

 playoutNode is the move-node from which the playout was done. The amaf  
 values are stored in its children by the increaseVirtualPlayout()  
 method. Note that it goes up the tree by assigning the parent to  
 playoutNode until it gets to the root.
 
 For more context it would be better to lookup the whole source at 
 http://plug-and-go.dev.java.net
 If you think some more comments in the code could clarify things  
 better I'm open to suggestions.

Thanks for the code, didn't know amaf already is inside the mcts refbot.

 
 Good luck.
 
  Mark

Regards,
   wabu


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-14 Thread Mark Boon



On Jan 14, 2009, at 10:55 PM, Daniel Waeber wrote:




Accessing the AMAF values depends on your implementation. The
following is a code-snippet from my MCTS reference implementation  
that

updates the AMAF values in the tree:

if (_useAMAF)
{
IntStack playoutMoves = _searchAdministration.getMoveStack();
byte color = _monteCarloAdministration.getColorToMove();
int start = _monteCarloAdministration.getMoveStack().getSize();
int end = playoutMoves.getSize();
double weight = 1.0;
	double weightDelta = 1.0 / (end - start + 1); // Michael Williams'  
idea to use decreasing weights

GoArray.clear(_weightMap);
GoArray.clear(_colorMap);
for (int i=start; iend; i++)
{
int moveXY = playoutMoves.get(i);
if (_colorMap[moveXY]==0)
{
_colorMap[moveXY] = color;
_weightMap[moveXY] = weight;
}
weight -= weightDelta;
color = opposite(color);
}


until here it's clear to me.


OK, I hope so. Until here it's pretty much the same as in the original  
AMAF ref-bot without tree-search as defined by Don.




while (playoutNode!=null)
{
color = opposite(playoutNode.getContent().getMove().getColor());
		boolean playerWins = (blackWins  color==BLACK) || (!blackWins  
 color==WHITE);
		double score = playerWins ?  
MonteCarloTreeSearchResult.MAX_SCORE :  
MonteCarloTreeSearchResult.MIN_SCORE;

for (int i=0; iplayoutNode.getChildCount(); i++)
{
			TreeNodeMonteCarloTreeSearchResultMoveType nextNode =  
playoutNode.getChildAt(i);
			MonteCarloTreeSearchResultMoveType result =  
nextNode.getContent();

GoMove move = (GoMove) result.getMove();
int xy = move.getXY();
if (_colorMap[xy]==color)
  
result.increaseVirtualPlayouts(_weightMap[xy]*score,_weightMap[xy]);


if i understand this code correctly, it updates the amaf vaules of all
direct children of the playoutNode according to the weight/color maps.


Yes.




And that update is done for all nodes on the selected path.


Yes, until the root, which is the starting position from which the  
search is performed.






}
playoutNode = playoutNode.getParent();


First of all, I miss an weight/colorMap update for xy here. Souldn't  
the

move of the current playoutNode be considered as an amaf move for all
the nodes below this one?


This is in fact done in this code, see further remarks below.





}
}


But, most of all, I miss that the code only updates the amaf values of
all direct children, and not of all nodes n below the playoutNode,  
where

there is no play at n.move on the path between n and the playoutNode.

Finding all these nodes n would be a costy thing to do, but wouldn't
that be the right thing to do? Implementing a realistic subset of  
RAVE
is another story, but first of all I want to understand the pure  
concept

of RAVE.



You have to understand that the 'start' variable really starts at the  
root from the position for which we do the search. So all the moves  
'below' the playoutNode are also taken into account. The check in the  
earlier part if (_colorMap[moveXY]==0) makes sure there's no move  
between the playoutNode and the n.move as you call it.


That is, if I understand you correctly. Because I'm not quite sure  
what you mean by 'finding all these nodes n'. This is not a sub-set of  
RAVE as I understand it. What you see in the code is the accumulation  
of the AMAF values after each playout. The RAVE value is calculated  
using these values when comparing move-candidates, which is in an  
altogether different place. (The MonteCarloTreeSearchResult  class in  
my project).


I'm afraid I haven't made things much clearer for you.

Mark


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-13 Thread Sylvain Gelly

2009/1/10 Isaac Deutsch i...@gmx.ch

 Hi Sylvain,

 I think it's starting to make sense now. :-)


  Sorry to be unclear. I wish we have a white board where we could discuss
  and
  that would sorted out in a few minutes :).

 Several results turn up in a google search ;p
 http://www.google.com/search?q=online+white+board


  What I tried to mean is that when you do the backup for a given node, you
  look at the part of the playout that happen after that node (including
  that
  node), and you do a normal AMAF backup for that part of the playout.
  Does it make sense?
  I hope we make progress and I am not making things more confusing :).
  I should write a pseudo code I guess, but for today I am too lazy :).
 
  Sylvain

 I tried putting this into pseudo code, but it already looks like C. ;p
 http://pastie.org/357231
 If you could look at it, I would be most grateful.

It sounds good but it seems to lack the check of whether a given move is
first played in a given intersection. When you add that, it because a little
more tricky to update in the tree. You also update the value of each move
independently of the color, i.e. a position in which it is black turn will
be update with white moves. You should restrict the update.

Hopes that helps,
Sylvain




 -Isaac

 --
 Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen:
 http://www.gmx.net/de/go/multimessenger
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-10 Thread Isaac Deutsch

Hi Sylvain,

I think it's starting to make sense now. :-)


 Sorry to be unclear. I wish we have a white board where we could discuss
 and
 that would sorted out in a few minutes :).

Several results turn up in a google search ;p
http://www.google.com/search?q=online+white+board


 What I tried to mean is that when you do the backup for a given node, you
 look at the part of the playout that happen after that node (including
 that
 node), and you do a normal AMAF backup for that part of the playout.
 Does it make sense?
 I hope we make progress and I am not making things more confusing :).
 I should write a pseudo code I guess, but for today I am too lazy :).
 
 Sylvain

I tried putting this into pseudo code, but it already looks like C. ;p
http://pastie.org/357231
If you could look at it, I would be most grateful.

-Isaac

-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] How to properly implement RAVE?

2009-01-09 Thread Isaac Deutsch

I'm a bit confused about the difference between AMAF and RAVE (if there's one).
As far as I understand, with AMAF, you sample in each playout (after it leaves
the tree) the moves played (by both players), but only the first move at any
position, then you reward all moves played either by a win or loss, depending on
their colors.

I tried comparing this to RAVE, as described in many papers. There might be
multiple definitions of RAVE, but the one that seems the most clear to me is
this one (picture used because of math stuff):

http://img352.imageshack.us/img352/443/bild1pb3.png

Is it correct that with RAVE, after a playout (after the tree), only the
siblings of the last node in the tree are updated accordingly? The main
difference to AMAF would be that instead of a single array with values of
AMAFsumReward and AMAFnumPlayed, each node keeps for each of its children
separate variables that hold these values. When a playout is finished, only the
values of these children are updated instead of the single array.

I hope you're able to make any sense out of this, sometimes a text can be
confusing when the writer is confused. ;p

Cheers, ibd
-- 
Sensationsangebot verlängert: GMX FreeDSL - Telefonanschluss + DSL 
für nur 16,37 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K1308T4569a
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-09 Thread Sylvain Gelly

Hi Isaac,
in a nutshell RAVE is basically AMAF adapted for Monte Carlo Tree Search.
The original paper describing it is
http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf and a
paper for broader audience can be found here:
http://www.lri.fr/~gelly/paper/MoGoNectar.pdf (the picture you posted comes
from that paper).

I think your description of RAVE is not completely correct, or at least I
don't understand it completely :). If I understand that sentence only the
siblings of the last node in the tree are updated accordingly, then it is
wrong.

There are two important parts in the algorithms: the backup and the use of
the RAVE value. The second is the hardest to tune and to make it right. The
proposed way of using the values in the original paper is not optimal (while
already very useful). A much better way (especially in 19x19) has been
described on that list by David Silver.
For the backup (as it is your original question), for each node traversed by
the simulation, back up the values exactly as it would be done in AMAF *if*
the playout began at that node. Note that I call playout the whole
simulation starting from the root and going to the end of the game.

Things you have to take care about, for each node, including the root,
including the last node in the tree (most of them obvious, but believe it is
really easy to forget small details and get something totally useless):
  - Count only moves that happen after the node.
  - Count only a move if it is played first on the intersection.
Be careful with captures, especially if they occur within the tree. It is
really easy to mess up the statistics.
  - Count only a move if it is the same color of the node (if in the
position of the node, black is to play, count only black moves for that
node)
  - Count all moves that happen after the node, including those happening
within the tree, and including the move that happen just after the node.

I hope it will help you write a correct implementation. Don't hesitate to
ask for precisions.

Sylvain

2009/1/9 Isaac Deutsch i...@gmx.ch

 I'm a bit confused about the difference between AMAF and RAVE (if there's
 one).
 As far as I understand, with AMAF, you sample in each playout (after it
 leaves
 the tree) the moves played (by both players), but only the first move at
 any
 position, then you reward all moves played either by a win or loss,
 depending on
 their colors.

 I tried comparing this to RAVE, as described in many papers. There might be
 multiple definitions of RAVE, but the one that seems the most clear to me
 is
 this one (picture used because of math stuff):

 http://img352.imageshack.us/img352/443/bild1pb3.png

 Is it correct that with RAVE, after a playout (after the tree), only the
 siblings of the last node in the tree are updated accordingly? The main
 difference to AMAF would be that instead of a single array with values of
 AMAFsumReward and AMAFnumPlayed, each node keeps for each of its children
 separate variables that hold these values. When a playout is finished, only
 the
 values of these children are updated instead of the single array.

 I hope you're able to make any sense out of this, sometimes a text can be
 confusing when the writer is confused. ;p

 Cheers, ibd
 --
 Sensationsangebot verlängert: GMX FreeDSL - Telefonanschluss + DSL
 für nur 16,37 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K1308T4569a
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-09 Thread Isaac Deutsch

Hi Sylvain,

Thanks for your quick answer.


 in a nutshell RAVE is basically AMAF adapted for Monte Carlo Tree Search.
 The original paper describing it is
 http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf and a
 paper for broader audience can be found here:
 http://www.lri.fr/~gelly/paper/MoGoNectar.pdf (the picture you posted
 comes
 from that paper).

Yes, I took a screenshot. Another paper I looked at was
http://www.lri.fr/~teytaud/eg.pdf


 There are two important parts in the algorithms: the backup and the use of
 the RAVE value. The second is the hardest to tune and to make it right.
 The
 proposed way of using the values in the original paper is not optimal
 (while
 already very useful). A much better way (especially in 19x19) has been
 described on that list by David Silver.

Do you mean the calculation of the factor beta that the RAVE value is 
multiplied with?


 For the backup (as it is your original question), for each node traversed
 by
 the simulation, back up the values exactly as it would be done in AMAF
 *if*
 the playout began at that node. Note that I call playout the whole
 simulation starting from the root and going to the end of the game.

I see what you mean with the playout going from the root to the end of the game.
How do you mean back up the values ... if the playout began at that node? 
Since every playout starts
at the root (in my program, the root is the (previous) move of the opponent 
player), wouldn't that mean
only updating the RAVE statistics for the root? I'm sorry if this question 
sounds stupid.


   - Count only moves that happen after the node.

How do you measure if a move is after another move? The amount of moves taken 
from the root (i.e. the depth of the node in the tree/the playout)? Or do you 
mean that the node is effectively a (grand-grand-...) father of the move, so 
the playout has visited that node?


 I hope it will help you write a correct implementation. Don't hesitate to
 ask for precisions.

I really appreciate your help.

-ibd
-- 
Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: 
http://www.gmx.net/de/go/multimessenger
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] How to properly implement RAVE?

2009-01-09 Thread Sylvain Gelly

Hi Isaac,

2009/1/9 Isaac Deutsch i...@gmx.ch

 Hi Sylvain,

 Thanks for your quick answer.


  in a nutshell RAVE is basically AMAF adapted for Monte Carlo Tree Search.
  The original paper describing it is
  http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf and a
  paper for broader audience can be found here:
  http://www.lri.fr/~gelly/paper/MoGoNectar.pdf (the picture you posted
  comes
  from that paper).

 Yes, I took a screenshot. Another paper I looked at was
 http://www.lri.fr/~teytaud/eg.pdf


  There are two important parts in the algorithms: the backup and the use
 of
  the RAVE value. The second is the hardest to tune and to make it right.
  The
  proposed way of using the values in the original paper is not optimal
  (while
  already very useful). A much better way (especially in 19x19) has been
  described on that list by David Silver.

 Do you mean the calculation of the factor beta that the RAVE value is
 multiplied with?


  For the backup (as it is your original question), for each node traversed
  by
  the simulation, back up the values exactly as it would be done in AMAF
  *if*
  the playout began at that node. Note that I call playout the whole
  simulation starting from the root and going to the end of the game.

 I see what you mean with the playout going from the root to the end of the
 game.
 How do you mean back up the values ... if the playout began at that node?
 Since every playout starts
 at the root (in my program, the root is the (previous) move of the opponent
 player), wouldn't that mean
 only updating the RAVE statistics for the root? I'm sorry if this question
 sounds stupid.


Sorry to be unclear. I wish we have a white board where we could discuss and
that would sorted out in a few minutes :).
In the quote of my sentence you did not put the as of the as if the
playout began... (the as and the if where separated by a part of the
sentence, which did not make things clearer, sorry...)
What I tried to mean is that when you do the backup for a given node, you
look at the part of the playout that happen after that node (including that
node), and you do a normal AMAF backup for that part of the playout.
Does it make sense?


- Count only moves that happen after the node.

 How do you measure if a move is after another move? The amount of moves
 taken from the root (i.e. the depth of the node in the tree/the playout)? Or
 do you mean that the node is effectively a (grand-grand-...) father of the
 move, so the playout has visited that node?

By after I mean after in the sequence.
If the playout is E5, A7, C4, D8, by after I mean that C4 is after E5, but
not after D8.

I hope we make progress and I am not making things more confusing :).
I should write a pseudo code I guess, but for today I am too lazy :).

Sylvain

 I hope it will help you write a correct implementation. Don't hesitate to
  ask for precisions.

 I really appreciate your help.

 -ibd
 --
 Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen:
 http://www.gmx.net/de/go/multimessenger
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

74 matches

Mail list logo