Re: [Computer-go] Understanding and implementing RAVE

2015-10-27 Thread Urban Hafner
On Tue, Oct 27, 2015 at 6:57 AM, David Fotland 
wrote:

> Many Faces uses 2200 for RAVE_EQUIV.  I found that anything between 2000
> and 3000 was about the same, and CLOP recommended 2200.  1000 was a little
> worse, and 500 was much worse.  In discussions with other programmers I
> heard numbers between  and 5000.
>
>
>
> For parameter tuning I recommend Emi’s CLOP.
>

Thanks David. I ended up with a RAVE_EQUIV of 20 now. But as there are so
many possible variables that can be used for tuning who knows what that
means. And using CLOP is high on my list. A few things need to be done
first (namely making everything configurable) but yes, it’s finally time!

Urban
-- 
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-27 Thread Urban Hafner
On Mon, Oct 26, 2015 at 5:51 PM, Gonçalo Mendes Ferreira 
wrote:

Well I took the liberty to steal and very crudely modify a MCTS diagram
> from wikipedia:
>
> http://pwp.net.ipl.pt/alunos.isel/35393/mcts.png
>
> Maybe with images it is clearer. You seem to be using an acyclic graph
> with pointers for the arcs. When you need to find transpositions, and free
> malloc'd nodes because you're out of memory, I find my solution much more
> practical because all the information for selecting what play to make is in
> the current node. But Pachi is no small program so I'm sure I'm wrong
> somewhere.
>

I think I understand how that works. At least roughly. I believe we just
approach things from a different angle. I use Ruby professionally to write
web apps so it’s totally fine to have slow code (given a database AND an
internet connection the language speed doesn’t really matter) so I’m used
to writing code that is easy to understand. Maybe one day I’ll have to
improve this here (after all 100pps aren’t that good) but for now I’ll
focus on implementing everything that Michi does. There’s still so much to
do feature wise and that’s so much more fun. :)

Urban
-- 
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] Understanding and implementing RAVE

2015-10-26 Thread Urban Hafner
With the help of Michi  (thank you Petr!)
I’m currently working on adding RAVE to my UCT tree search. Before I get
too deep into it I’d like to make sure I actually understand it correctly.
It would be great if you could have a quick look at my pseudo code (mostly
stolen from michi).

Give a Node with the fields

* color
* visits,
* wins
* amaf_visits
* amaf_wins

The tree is updated after a playout in the following way:

We traverse the tree according to the moves played. visits gets incremented
unconditionally, and wins gets incremented if the playout was a win for
color. That is the same as UCT.

Then we have a look at the children of the node and increment amaf_visits
for the children if color of that particular (child) node was the first to
play on that intersection in the playout. If the playout was also a win for
the (child) node then we also increment amaf_wins.


Then we also need to change the formula to select then next node. I must
admit I just copied the one from Michi (RAVE_EQUIV = 3500. Stolen from
Michi):

win_rate = wins / plays (assumes plays will never be 0)
if amaf_plays == 0 {
  return win_rate
} else {
  rave_winrate = amaf_wins / amaf_plays
  beta = amaf_plays / ( amaf_plays + plays + plays * amaf_plays /
RAVE_EQUIV)
  return beta * rave_winrate + (1 - beta) * winrate
}

Obviously I’m not expecting anyone to actually check the formula, but it
would be great if I could get a thumbs up (or down) on the general idea.

Cheers,

Urban
-- 
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread David Fotland
Many Faces only has big nodes with all of the child statistics in one node, 
along with the totals for the position.  Like the right hand of your diagram, 
but also with the 11/22 totals.  There is no tree.  All nodes are in a big 
transposition table and there are no child or parent pointers.  I did this for 
performance (cache locality), and to avoid the complexity of having both a 
transposition table and a tree to maintain.

David

> -Original Message-
> From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf
> Of Petr Baudis
> Sent: Monday, October 26, 2015 10:56 AM
> To: computer-go@computer-go.org
> Subject: Re: [Computer-go] Understanding and implementing RAVE
> 
> On Mon, Oct 26, 2015 at 04:51:39PM +, Gon alo Mendes Ferreira wrote:
> > >I must admit that I don t quite follow how you ve implemented things,
> > >but then you seem to be further along than me as I haven t even
> > >started with transposition tables.
> > >
> > >Urban
> > >
> > Well I took the liberty to steal and very crudely modify a MCTS
> > diagram from
> > wikipedia:
> >
> > http://pwp.net.ipl.pt/alunos.isel/35393/mcts.png
> >
> > Maybe with images it is clearer. You seem to be using an acyclic graph
> > with pointers for the arcs.
> 
> Ah. In Pachi, at one point I wanted to rewrite things this way to get
> awesome cache locality, but it was such a dull work that I gave up. :-)
> 
> So I wanted to do it when I was writing this again in Michi, but this
> approach breaks encapsulation (you can't just pass around a node object
> reference, but need a (parent, coord) tuple) which makes the code a bit
> dense and clumsy, so I decided it's not very pedagogical to do it this
> way...
> 
> > When you need to find transpositions, and free malloc'd nodes because
> > you're out of memory, I find my solution much more practical because
> > all the information for selecting what play to make is in the current
> > node. But Pachi is no small program so I'm sure I'm wrong somewhere.
> 
> To clarify, Pachi and Michi do not use transposition tables but a simple
> tree structure.  Transpositions (and meaningfully explored transpositions
> at that) are relatively rare in Go, plus there are pitfalls when
> propagating updates, aren't there?  OTOH having a fixed-size list of
> followups carries some overhead when the board is getting filled up. So I
> never bothered to do this; but I know that some other strong programs do
> use transposition tables.
> 
> --
>   Petr Baudis
>   If you have good ideas, good data and fast computers,
>   you can do almost anything. -- Geoffrey Hinton
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread David Fotland
Many Faces uses 2200 for RAVE_EQUIV.  I found that anything between 2000 and 
3000 was about the same, and CLOP recommended 2200.  1000 was a little worse, 
and 500 was much worse.  In discussions with other programmers I heard numbers 
between  and 5000.

 

For parameter tuning I recommend Emi’s CLOP.

 

David

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Urban Hafner
Sent: Monday, October 26, 2015 8:38 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Understanding and implementing RAVE

 

On Mon, Oct 26, 2015 at 1:12 PM, Petr Baudis <pa...@ucw.cz> wrote:

 

This RAVE formula has been derived by David Silver as follows
(miraculously hosted by Hiroshi-san):

http://www.yss-aya.com/rave.pdf

Also note that RAVE_EQUIV (q_{ur} * (1-q_{ur}) / b_r^2) varies widely
among programs, IIRC; 3500 might be on the higher end of the spectrum,
it makes the transition from AMAF to true winrate fairly slow.  You
typically set the value by parameter tuning.

 

Thanks for the link. That’s actually understandable to me. :) It seems that 
very soon there will be no easy wins anymore and I will have to invest the 
resources into some parameter tuning.

 

Urban

-- 

Blog: http://bettong.net/

Twitter: https://twitter.com/ujh

Homepage: http://www.urbanhafner.com/

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread Petr Baudis
On Mon, Oct 26, 2015 at 11:36:51AM +0100, Urban Hafner wrote:
> With the help of Michi  (thank you Petr!)
> I’m currently working on adding RAVE to my UCT tree search. Before I get
> too deep into it I’d like to make sure I actually understand it correctly.
> It would be great if you could have a quick look at my pseudo code (mostly
> stolen from michi).
> 
> Give a Node with the fields
> 
> * color
> * visits,
> * wins
> * amaf_visits
> * amaf_wins
> 
> The tree is updated after a playout in the following way:
> 
> We traverse the tree according to the moves played. visits gets incremented
> unconditionally, and wins gets incremented if the playout was a win for
> color. That is the same as UCT.
> 
> Then we have a look at the children of the node and increment amaf_visits
> for the children if color of that particular (child) node was the first to
> play on that intersection in the playout. If the playout was also a win for
> the (child) node then we also increment amaf_wins.

Sounds right.

> Then we also need to change the formula to select then next node. I must
> admit I just copied the one from Michi (RAVE_EQUIV = 3500. Stolen from
> Michi):
> 
> win_rate = wins / plays (assumes plays will never be 0)
> if amaf_plays == 0 {
>   return win_rate
> } else {
>   rave_winrate = amaf_wins / amaf_plays
>   beta = amaf_plays / ( amaf_plays + plays + plays * amaf_plays /
> RAVE_EQUIV)
>   return beta * rave_winrate + (1 - beta) * winrate
> }
> 
> Obviously I’m not expecting anyone to actually check the formula, but it
> would be great if I could get a thumbs up (or down) on the general idea.

This RAVE formula has been derived by David Silver as follows
(miraculously hosted by Hiroshi-san):

http://www.yss-aya.com/rave.pdf

Also note that RAVE_EQUIV (q_{ur} * (1-q_{ur}) / b_r^2) varies widely
among programs, IIRC; 3500 might be on the higher end of the spectrum,
it makes the transition from AMAF to true winrate fairly slow.  You
typically set the value by parameter tuning.

-- 
Petr Baudis
If you have good ideas, good data and fast computers,
you can do almost anything. -- Geoffrey Hinton
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread Gonçalo Mendes Ferreira

I think we mean the same. So when black plays at a1 first we count it for
black everywhere but never for white. Correct?
The question is, if both black and white play a1 - because there were 
captures in the middle - whether both of them will update a1 in some of 
their nodes. I believe so.

Ah yes, the UCB term. I’ve now added it. I dimly remember now that Pachi
doesn’t use it but yes I’ve now added it. And I’m not sure I understand you
other comment. My nodes store the visits/etc, the move made, and references
to all children.
I've checked and that's with Pachi/Michi do too. And it's the closest to 
the MCTS diagrams and explanations that are published around. But 
instead you could just have the information on transitions from a state, 
and not wins/visits of the state per se. I'm not very familiar with 
Pachi (I couldn't find how it implements a transpositions table, or how 
it prunes it with memory safety).


I use a structure with
color
visits[19][19]
wins[19][19]
amaf_visits[19][19]
amaf_wins[19][19]
etc

They are very similar, but my node information doesn't need to consult 
the transpositions table or follow points to select the next play. On 
the other hand sequences of play that arrive on the same node have the 
exact same "quality" regardless of sequence of play in Pachi, where in 
my program the sequences of play only "unite" when following that state 
forward.


I don't know if this difference is very small or if I'm forgetting 
something important. Maybe Pachi implements discovering if a state has 
already been expanded in a very efficient way.


I don't know how other programs do it, but instinctively, if you started 
by implementing a non-UCT MCTS algorithm, the information you kept would 
be just the initial state and statistics on the transitions, so my train 
of thought seems (to me) the logical follow up.


Gonçalo F.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread Urban Hafner
On Mon, Oct 26, 2015 at 1:20 PM, Gonçalo Mendes Ferreira 
wrote:

> I'm not sure about "if color of that particular (child) node was the first
> to play on that intersection in the playout". What I think the authors
> meant was to only increment the AMAF statistic once because of ko
> repetitions. So, repeatable at different states, regardless of color. Now
> I'm not so sure.
>

I think we mean the same. So when black plays at a1 first we count it for
black everywhere but never for white. Correct?


> Other than that you're missing the UCB part of the formula because
> Pachi/Michi doesn't use one. I also prefer storing in the node the
> visits/wins/amaf_visits/amaf_wins of every (legal) transition, and you seem
> to store that relative to the state. That should save space but how does it
> fare in traversing the legal plays to select the transition in UCT?
>

Ah yes, the UCB term. I’ve now added it. I dimly remember now that Pachi
doesn’t use it but yes I’ve now added it. And I’m not sure I understand you
other comment. My nodes store the visits/etc, the move made, and references
to all children.

Urban
-- 
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread Urban Hafner
On Mon, Oct 26, 2015 at 1:12 PM, Petr Baudis  wrote:

This RAVE formula has been derived by David Silver as follows
> (miraculously hosted by Hiroshi-san):
>
> http://www.yss-aya.com/rave.pdf
>
> Also note that RAVE_EQUIV (q_{ur} * (1-q_{ur}) / b_r^2) varies widely
> among programs, IIRC; 3500 might be on the higher end of the spectrum,
> it makes the transition from AMAF to true winrate fairly slow.  You
> typically set the value by parameter tuning.
>

Thanks for the link. That’s actually understandable to me. :) It seems that
very soon there will be no easy wins anymore and I will have to invest the
resources into some parameter tuning.

Urban
-- 
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread Petr Baudis
On Mon, Oct 26, 2015 at 04:51:39PM +, Gonçalo Mendes Ferreira wrote:
> >I must admit that I don’t quite follow how you’ve implemented things, but
> >then you seem to be further along than me as I haven’t even started with
> >transposition tables.
> >
> >Urban
> >
> Well I took the liberty to steal and very crudely modify a MCTS diagram from
> wikipedia:
> 
> http://pwp.net.ipl.pt/alunos.isel/35393/mcts.png
> 
> Maybe with images it is clearer. You seem to be using an acyclic graph with
> pointers for the arcs.

Ah. In Pachi, at one point I wanted to rewrite things this way to get
awesome cache locality, but it was such a dull work that I gave up. :-)

So I wanted to do it when I was writing this again in Michi, but this
approach breaks encapsulation (you can't just pass around a node object
reference, but need a (parent, coord) tuple) which makes the code a bit
dense and clumsy, so I decided it's not very pedagogical to do it this
way...

> When you need to find transpositions, and free
> malloc'd nodes because you're out of memory, I find my solution much more
> practical because all the information for selecting what play to make is in
> the current node. But Pachi is no small program so I'm sure I'm wrong
> somewhere.

To clarify, Pachi and Michi do not use transposition tables but
a simple tree structure.  Transpositions (and meaningfully explored
transpositions at that) are relatively rare in Go, plus there are
pitfalls when propagating updates, aren't there?  OTOH having
a fixed-size list of followups carries some overhead when the board is
getting filled up. So I never bothered to do this; but I know that some
other strong programs do use transposition tables.

-- 
Petr Baudis
If you have good ideas, good data and fast computers,
you can do almost anything. -- Geoffrey Hinton
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread Urban Hafner
On Mon, Oct 26, 2015 at 5:03 PM, Gonçalo Mendes Ferreira 
wrote:

> I think we mean the same. So when black plays at a1 first we count it for
>> black everywhere but never for white. Correct?
>>
> The question is, if both black and white play a1 - because there were
> captures in the middle - whether both of them will update a1 in some of
> their nodes. I believe so.


I see. Yes, that is also an option. I’ve only checked Michi and it only
counts the first one. It’s possible that this is just out of performance
considerations, but it could also be argued that so many captures happen in
the playouts that the information at the end of the game (where captures
are more likely to happen) isn’t worth recording.


>
> Ah yes, the UCB term. I’ve now added it. I dimly remember now that Pachi
>> doesn’t use it but yes I’ve now added it. And I’m not sure I understand
>> you
>> other comment. My nodes store the visits/etc, the move made, and
>> references
>> to all children.
>>
> I've checked and that's with Pachi/Michi do too. And it's the closest to
> the MCTS diagrams and explanations that are published around. But instead
> you could just have the information on transitions from a state, and not
> wins/visits of the state per se. I'm not very familiar with Pachi (I
> couldn't find how it implements a transpositions table, or how it prunes it
> with memory safety).
>
> I use a structure with
> color
> visits[19][19]
> wins[19][19]
> amaf_visits[19][19]
> amaf_wins[19][19]
> etc
>
> They are very similar, but my node information doesn't need to consult the
> transpositions table or follow points to select the next play. On the other
> hand sequences of play that arrive on the same node have the exact same
> "quality" regardless of sequence of play in Pachi, where in my program the
> sequences of play only "unite" when following that state forward.
>
> I don't know if this difference is very small or if I'm forgetting
> something important. Maybe Pachi implements discovering if a state has
> already been expanded in a very efficient way.
>
> I don't know how other programs do it, but instinctively, if you started
> by implementing a non-UCT MCTS algorithm, the information you kept would be
> just the initial state and statistics on the transitions, so my train of
> thought seems (to me) the logical follow up.
>

I must admit that I don’t quite follow how you’ve implemented things, but
then you seem to be further along than me as I haven’t even started with
transposition tables.

Urban
-- 
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread Gonçalo Mendes Ferreira

I must admit that I don’t quite follow how you’ve implemented things, but
then you seem to be further along than me as I haven’t even started with
transposition tables.

Urban

Well I took the liberty to steal and very crudely modify a MCTS diagram 
from wikipedia:


http://pwp.net.ipl.pt/alunos.isel/35393/mcts.png

Maybe with images it is clearer. You seem to be using an acyclic graph 
with pointers for the arcs. When you need to find transpositions, and 
free malloc'd nodes because you're out of memory, I find my solution 
much more practical because all the information for selecting what play 
to make is in the current node. But Pachi is no small program so I'm 
sure I'm wrong somewhere.


Gonçalo
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] Understanding and implementing RAVE

2015-10-26 Thread Petr Baudis
On Mon, Oct 26, 2015 at 05:29:57PM +0100, Urban Hafner wrote:
> On Mon, Oct 26, 2015 at 5:03 PM, Gonçalo Mendes Ferreira 
> wrote:
> 
> > I think we mean the same. So when black plays at a1 first we count it for
> >> black everywhere but never for white. Correct?
> >>
> > The question is, if both black and white play a1 - because there were
> > captures in the middle - whether both of them will update a1 in some of
> > their nodes. I believe so.
> 
> 
> I see. Yes, that is also an option. I’ve only checked Michi and it only
> counts the first one. It’s possible that this is just out of performance
> considerations, but it could also be argued that so many captures happen in
> the playouts that the information at the end of the game (where captures
> are more likely to happen) isn’t worth recording.

I agree.

It may be worthwhile to think a bit abstractly about what's our goal
with the AMAF statistics - we want to notice plays that are important
for winning and therefore should be prioritized to play right away.
Should we prioritize a move that we otherwise get around to only under
the stones?

Also, if you step through a bunch of MC simulations, you may notice that
often a large+strong group gets captured because of something silly and
then totally unrelated moves appear in the freed up space.  Playing
under the stones is thus a strong indicator that we are in the "nonsense
endgame" phase of MC.

-- 
Petr Baudis
If you have good ideas, good data and fast computers,
you can do almost anything. -- Geoffrey Hinton
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go