Re: [computer-go] Transpositions in Monte-carlo tree search

2009-04-01 Thread Matthew Woodcraft
Erik van der Werf wrote:
  Jonas Kahn wrote:
 No there is no danger. That's the whole point of weighting with N_{s,a}.

 N_{s,a} = number of times the node s has been visited, starting with parent
 a.

 You can write Value of a node a = (\sum_{s \in sons} N_{s,a} V_s) / (\sum
 N_{s,a})

 where V_s is ideally the «true» value of node s.
 In UCT2, they use V_s = Q_{s,a} the win average of simulations going
 through a, and then through s.
 In UCT3, they use V_s = Q_s the win average of all simulations through
 s.

 There is a danger. The problem is that the selection policy also
 implements the soft-max like behavior that ensures convergence to the
 minimax result. If the you backup to all possible parents, including
 those for which the child would have been an inferior choice, you may
 get into trouble.

That's what I was worried about.

But I think it's ok the way Jonas describes above: you don't add
anything to the false-parent node's simulation count, and you don't
change the weight of the false-child in its value; you just change the
evaluation of the false-child.

(This means that the effect of backing up to alternate parents will be
smaller than the effect of backing up to the 'true' parent, which is
presumably part of the reason why this variant is less attractive.)

-M-
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Transpositions in Monte-carlo tree search

2009-04-01 Thread Erik van der Werf
On Wed, Apr 1, 2009 at 9:03 PM, Matthew Woodcraft
matt...@woodcraft.me.uk wrote:
 Erik van der Werf wrote:
  Jonas Kahn wrote:
 No there is no danger. That's the whole point of weighting with N_{s,a}.

 N_{s,a} = number of times the node s has been visited, starting with parent
 a.

 You can write Value of a node a = (\sum_{s \in sons} N_{s,a} V_s) / (\sum
 N_{s,a})

 where V_s is ideally the «true» value of node s.
 In UCT2, they use V_s = Q_{s,a} the win average of simulations going
 through a, and then through s.
 In UCT3, they use V_s = Q_s the win average of all simulations through
 s.

 There is a danger. The problem is that the selection policy also
 implements the soft-max like behavior that ensures convergence to the
 minimax result. If the you backup to all possible parents, including
 those for which the child would have been an inferior choice, you may
 get into trouble.

 That's what I was worried about.

 But I think it's ok the way Jonas describes above: you don't add
 anything to the false-parent node's simulation count, and you don't
 change the weight of the false-child in its value; you just change the
 evaluation of the false-child.

 (This means that the effect of backing up to alternate parents will be
 smaller than the effect of backing up to the 'true' parent, which is
 presumably part of the reason why this variant is less attractive.)


Ok, but I would not call that a back up; nothing goes up to the
alternative parents. Unless I missed something, with this you only
make adjustments to the statistics representing transposed occurrences
of the same position.  I don't see that this is how we should
interpret UCT3.

Erik
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Transpositions in Monte-carlo tree search

2009-03-31 Thread Matthew Woodcraft
Jonas Kahn wrote:
 You might be interested by this article, for a very complete and tested
 answer. Plus the idea of grouping, but a good part of the effect seems
 to me to be giving a heuristic pre-value to moves, which might be done more
 efficiently otherwise:

 eprints.pascal-network.org/archive/4571/01/8057.pdf

Thank you (and to the others who replied).

The idea of backing a simulation's results up to all parents ('UCT3' in
that paper) seems very dangerous to me! It's a shame they didn't have
any Go results to show for that one.

-M-
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Transpositions in Monte-carlo tree search

2009-03-31 Thread Jonas Kahn

On Tue, 31 Mar 2009, Matthew Woodcraft wrote:


Jonas Kahn wrote:

You might be interested by this article, for a very complete and tested
answer. Plus the idea of grouping, but a good part of the effect seems
to me to be giving a heuristic pre-value to moves, which might be done more
efficiently otherwise:

eprints.pascal-network.org/archive/4571/01/8057.pdf


Thank you (and to the others who replied).

The idea of backing a simulation's results up to all parents ('UCT3' in
that paper) seems very dangerous to me! It's a shame they didn't have
any Go results to show for that one.


No there is no danger. That's the whole point of weighting with N_{s,a}.

N_{s,a} = number of times the node s has been visited, starting with parent a.

You can write 
Value of a node a = (\sum_{s \in sons} N_{s,a} V_s) / (\sum N_{s,a})


where V_s is ideally the «true» value of node s.
In UCT2, they use V_s = Q_{s,a} the win average of simulations going
through a, and then through s.
In UCT3, they use V_s = Q_s the win average of all simulations through
s.

Assuming Markovianity (1), Q_s is a random variable with same mean as Q_{s,a}, but lower variance. 
That's all.


Jonas
(1) This might be broken if you give a heuristic value to your move in
the tree based on how near it is to previous moves, but that's not
really important.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Transpositions in Monte-carlo tree search

2009-03-30 Thread Matthew Woodcraft
How are transpositions normally handled in monte-carlo tree search?

I have been assuming that the natural thing would be to have a single
shared node for each board position, so that simulations which reach the
same position will use the same set of statistics (but when backing up
the result, to only update the nodes for the simulation actually
played).

But I see in some of the Mogo papers that some of the contributions to
the heuristic value of a node depend on the position of the previous
move.

So do MCTS programs not recognise transpositions at all? Or are the
heuristics from the time when the node was first created allowed to
stand, no matter what the simulation route is next time?

-M-
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Transpositions in Monte-carlo tree search

2009-03-30 Thread Jason House
Which Mogo paper(s)? Not all Mogo papers contain ideaa used by Mogo in  
public releases / competitions. Some things are just research. I  
remember hearing that the grandfather heuristic isn't good.


Sent from my iPhone

On Mar 30, 2009, at 5:36 PM, Matthew Woodcraft  
matt...@woodcraft.me.uk wrote:



How are transpositions normally handled in monte-carlo tree search?

I have been assuming that the natural thing would be to have a single
shared node for each board position, so that simulations which reach  
the

same position will use the same set of statistics (but when backing up
the result, to only update the nodes for the simulation actually
played).

But I see in some of the Mogo papers that some of the contributions to
the heuristic value of a node depend on the position of the previous
move.

So do MCTS programs not recognise transpositions at all? Or are the
heuristics from the time when the node was first created allowed to
stand, no matter what the simulation route is next time?

-M-
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Transpositions in Monte-carlo tree search

2009-03-30 Thread Sylvain Gelly
Hi Mattew,

I cannot answer for the current version of Mogo but I can for the one
1.5 years ago. Maybe it still holds.
We had a transposition table as it was designed like that from the
beginning. However the prior value of the node, was initialized when
the node was created and indeed was depending of the previous move
played in the current sequence.
In practice, I think it does not change much.

Sylvain

On Mon, Mar 30, 2009 at 11:36 PM, Matthew Woodcraft
matt...@woodcraft.me.uk wrote:
 How are transpositions normally handled in monte-carlo tree search?

 I have been assuming that the natural thing would be to have a single
 shared node for each board position, so that simulations which reach the
 same position will use the same set of statistics (but when backing up
 the result, to only update the nodes for the simulation actually
 played).

 But I see in some of the Mogo papers that some of the contributions to
 the heuristic value of a node depend on the position of the previous
 move.

 So do MCTS programs not recognise transpositions at all? Or are the
 heuristics from the time when the node was first created allowed to
 stand, no matter what the simulation route is next time?

 -M-
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Transpositions in Monte-carlo tree search

2009-03-30 Thread Jonas Kahn

You might be interested by this article, for a very complete and tested
answer. Plus the idea of grouping, but a good part of the effect seems
to me to be giving a heuristic pre-value to moves, which might be done more
efficiently otherwise:

eprints.pascal-network.org/archive/4571/01/8057.pdf

Jonas

On Mon, 30 Mar 2009, Matthew Woodcraft wrote:


How are transpositions normally handled in monte-carlo tree search?

I have been assuming that the natural thing would be to have a single
shared node for each board position, so that simulations which reach the
same position will use the same set of statistics (but when backing up
the result, to only update the nodes for the simulation actually
played).

But I see in some of the Mogo papers that some of the contributions to
the heuristic value of a node depend on the position of the previous
move.

So do MCTS programs not recognise transpositions at all? Or are the
heuristics from the time when the node was first created allowed to
stand, no matter what the simulation route is next time?

-M-
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/