To apply RAVE data to the playouts effectively, we need a new way to
incorporate move-sequence information into RAVE. The main weakness of the
AMAF principle is that it ignores the order in which moves are played, and
that is NOT an easy problem to fix.

In the attached example, W's winning moves are the sequence E9->C9->A9.
Unless the playouts have some knowledge of semeai, it's hard to help them
with RAVE statistics alone:

1. Playouts in which W plays A9 should have a 100% winning rate (if W
always plays C8 in response to B's B8). Unfortunately, A9 is currently an
illegal move for W and is not considered by RAVE.

2. C9 is a legal move for W. The RAVE value of W's C9 could be high, since
W is able to reach A9 after C9 is played. However, C9 is currently a
useless self-atari. It is good only after E9 is played.

Standard RAVE is unlikely to tell us that E9 is more urgent than C9, and
C9 more urgent than A9 (by contrast, A9 has the highest winning rate even
though it is currently an illegal move), because the statistics are
accumulated without considering move sequences.
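
To make the last point concrete, here is a minimal sketch of an AMAF-style
update. It is not taken from any particular program: the function names,
the layout of the stats table and the toy playout are all assumptions made
for illustration. It only shows that a playout and its reversal leave
exactly the same statistics behind.

```python
from collections import defaultdict


def amaf_update(stats, playout_moves, winner):
    """Credit every (color, point) pair that appears in one playout.

    stats maps (color, point) -> [wins, visits]. The position of a move
    inside the playout is never looked at; that is the AMAF assumption.
    """
    seen = set()
    for color, point in playout_moves:
        if (color, point) in seen:
            continue  # only the first occurrence of a move is counted
        seen.add((color, point))
        entry = stats[(color, point)]
        entry[1] += 1
        if color == winner:
            entry[0] += 1


def rave_values(stats, color):
    """Return point -> win rate for one color."""
    return {pt: wins / visits
            for (c, pt), (wins, visits) in stats.items()
            if c == color and visits > 0}


# Hypothetical toy playout won by W (B's other replies are omitted for
# brevity; this is not a real game record).
playout = [("W", "E9"), ("B", "B8"), ("W", "C8"), ("W", "C9"), ("W", "A9")]

stats_in_order = defaultdict(lambda: [0, 0])
amaf_update(stats_in_order, playout, "W")

stats_reversed = defaultdict(lambda: [0, 0])
amaf_update(stats_reversed, list(reversed(playout)), "W")

# Both tables are identical, so E9, C9 and A9 look equally good to W.
print(dict(stats_in_order) == dict(stats_reversed))
print(rave_values(stats_in_order, "W"))
```

Any fix would have to store something that depends on order (for example,
statistics keyed by the previous move as well), and that is where the
difficulty starts.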

Aja

2013/3/29 Francois van Niekerk <[email protected]>

> The idea sounds pretty much like PoolRave proposed in "Biasing
> Monte-Carlo Simulations through RAVE Values" by Rimmel et al.
> --
> Francois van Niekerk
> Email: [email protected] | Twitter: @francoisvn
> Cell: +2784 0350 214 | Website: http://leafcloud.com
>
>
> On 29 March 2013 19:46, Peter Drake <[email protected]> wrote:
> > The "Last Good Reply" approach is similar (although not identical) to
> > this.
> > We (Orego) got an improvement from it. Some others have, some haven't.
> >
> > https://webdisk.lclark.edu/drake/publications/baier-drake-ieee-2010.pdf
> >
> >
> > On Fri, Mar 29, 2013 at 10:40 AM, Alexander Kozlovsky
> > <[email protected]> wrote:
> >>
> >> Hi!
> >>
> >> I know that RAVE data is typically used during tree traversal.
> >> But is it possible to use it during random playouts, in order to
> >> increase playout quality?
> >>
> >> At first sight it seems like a dangerous idea, because
> >> RAVE statistics are incrementally gathered from the same
> >> playouts, and this can lead to a problematic positive feedback
> >> loop, as in the saying "The rich get richer and the poor get poorer".
> >> That is, random initial fluctuations can get stronger with time
> >> and the statistics become skewed, because good moves which
> >> receive unfortunate initial RAVE data will be ignored
> >> in future random playouts.
> >>
> >> But what if we see move selection during random playout
> >> as a typical multi-armed bandit problem? Then the algorithm
> >> for selecting the next playout move could be as follows:
> >>
> >> 1) Select several (say, 4) valid candidate moves for the playout.
> >>
> >> 2) Choose the next move using a multi-armed bandit formula.
> >> We can do this because for each candidate move we
> >> know (a) the number of RAVE wins for this move, (b) the number
> >> of playouts with this move, (c) the total number of playouts
> >> (all of these numbers are tied to the current UCT node).
> >>
> >> I think this should add an element of exploration to next-move
> >> selection and prevent skewing of the RAVE statistics.
> >> I suspect using RAVE data can improve playout strength
> >> significantly.
> >>
> >> Has anybody tried something like this, or is it just a crazy idea?
> >>
> >> _______________________________________________
> >> Computer-go mailing list
> >> [email protected]
> >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >
> >
> >
> >
> > --
> > Peter Drake
> > https://sites.google.com/a/lclark.edu/drake/
> >
