To apply RAVE data effectively in the playouts, we need a new way to incorporate move-sequence information into RAVE. The main weakness of the AMAF principle is that it does not consider move order, and that is NOT an easy problem to solve.
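To make the order-blindness concrete, here is a minimal Python sketch of a simplified AMAF update. It assumes one common convention (only the first move played at each point in a playout is credited). The class name `AmafTable` and both move lists are hypothetical; D8 is a made-up filler move, and the reordered list is not claimed to be a legal game. Because the update only records which moves occurred, two playouts that differ only in move order produce identical statistics, so AMAF alone cannot rank E9 ahead of C9 ahead of A9 in the winning sequence E9 -> C9 -> A9 discussed next.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A move is (color, point), e.g. ("W", "E9").
Move = Tuple[str, str]


class AmafTable:
    """Simplified AMAF statistics for one tree node, keyed by (color, point)."""

    def __init__(self) -> None:
        self.visits: Dict[Move, int] = defaultdict(int)
        self.wins: Dict[Move, int] = defaultdict(int)

    def update(self, playout: List[Move], winner: str) -> None:
        # Credit the first move played at each point, wherever it occurs in the
        # playout. Only the set of first moves matters; their order is discarded.
        seen_points = set()
        for color, point in playout:
            if point in seen_points:
                continue
            seen_points.add(point)
            self.visits[(color, point)] += 1
            if color == winner:
                self.wins[(color, point)] += 1

    def value(self, move: Move) -> float:
        # AMAF winning rate; 0.0 for unseen moves (.get avoids inserting keys).
        n = self.visits.get(move, 0)
        return self.wins.get(move, 0) / n if n else 0.0


# Hypothetical playouts: same moves, but W's E9 and A9 swap places.
ordered = [("W", "E9"), ("B", "B8"), ("W", "C9"), ("B", "D8"), ("W", "A9")]
reordered = [("W", "A9"), ("B", "B8"), ("W", "C9"), ("B", "D8"), ("W", "E9")]

t1, t2 = AmafTable(), AmafTable()
t1.update(ordered, winner="W")
t2.update(reordered, winner="W")
print(t1.visits == t2.visits, t1.wins == t2.wins)  # True True
```

Any scheme that wants to say "E9 first, then C9, then A9" has to keep some information beyond these per-move counts.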
In the attached example, W's winning moves are the sequence E9 -> C9 -> A9. Unless the playouts have some knowledge of semeai, it's hard to help them with RAVE statistics alone:

1. Playouts in which W plays A9 should have a 100% winning rate (if W always answers B's B8 with C8). Unfortunately, A9 is currently an illegal move for W, so RAVE does not consider it.

2. C9 is a legal move for W. The RAVE value of W's C9 could be high, since W is able to reach A9 after C9 is played. However, C9 is currently a useless self-atari; it only becomes good after E9 is played.

Standard RAVE is unlikely to tell us that E9 is more urgent than C9, and C9 more urgent than A9 (on the contrary, A9 has the highest winning rate even though it is currently illegal), because the statistics are accumulated without regard to move order.

Aja

2013/3/29 Francois van Niekerk <[email protected]>

> The idea sounds very much like PoolRave, proposed in "Biasing
> Monte-Carlo Simulations through RAVE Values" by Rimmel et al.
> --
> Francois van Niekerk
> Email: [email protected] | Twitter: @francoisvn
> Cell: +2784 0350 214 | Website: http://leafcloud.com
>
>
> On 29 March 2013 19:46, Peter Drake <[email protected]> wrote:
> > The "Last Good Reply" approach is similar (although not identical) to this.
> > We (Orego) got an improvement from it. Some others have, some haven't.
> >
> > https://webdisk.lclark.edu/drake/publications/baier-drake-ieee-2010.pdf
> >
> >
> > On Fri, Mar 29, 2013 at 10:40 AM, Alexander Kozlovsky
> > <[email protected]> wrote:
> >>
> >> Hi!
> >>
> >> I know that RAVE data is typically used during tree traversal.
> >> But is it possible to use it during random playouts, in order to
> >> increase playout quality?
> >>
> >> At first sight it seems like a dangerous idea, because
> >> RAVE statistics are incrementally gathered from the same
> >> playouts, and this can lead to a problematic positive feedback
> >> loop, as in the saying "The rich get richer and the poor get poorer".
> >> That is, a random initial fluctuation can grow stronger over time
> >> and the statistics become skewed, because good moves which
> >> receive unfortunate initial RAVE data will be ignored
> >> in future random playouts.
> >>
> >> But what if we treat move selection during a random playout
> >> as a typical multi-armed bandit problem? Then the next playout
> >> move could be selected as follows:
> >>
> >> 1) Select several (say, 4) valid candidate moves for the playout.
> >>
> >> 2) Choose the next move using a multi-armed bandit formula.
> >> We can do this because for each candidate move we
> >> know (a) the number of RAVE wins for this move, (b) the number
> >> of playouts with this move, and (c) the total number of playouts
> >> (all of these numbers are tied to the current UCT node).
> >>
> >> I think this should add an exploration element to next-move
> >> selection and prevent skewing of the RAVE statistics.
> >> I suspect using RAVE data can improve playout strength
> >> significantly.
> >>
> >> Has anybody tried something like this, or is it just a crazy idea?
> >>
> >> _______________________________________________
> >> Computer-go mailing list
> >> [email protected]
> >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >
> > --
> > Peter Drake
> > https://sites.google.com/a/lclark.edu/drake/
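Alexander's two-step scheme can be sketched in Python roughly as below. Several details are assumptions not stated in the thread: UCB1 stands in for the unspecified "multi-armed bandit formula", candidates are sampled uniformly from the legal moves, moves with no RAVE visits get an infinite score, and the function name `select_playout_move` and exploration constant `c` are hypothetical. This is not Last Good Reply or PoolRave, which the replies mention as related but different techniques.

```python
import math
import random
from typing import Dict, Optional, Sequence, Tuple

# A move is (color, point), e.g. ("W", "C9").
Move = Tuple[str, str]


def select_playout_move(
    legal_moves: Sequence[Move],
    rave_wins: Dict[Move, float],
    rave_visits: Dict[Move, int],
    node_playouts: int,
    num_candidates: int = 4,
    c: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Move:
    """Pick a playout move by UCB1 over a random subset of candidates.

    rave_wins, rave_visits and node_playouts are the RAVE statistics of the
    current UCT node, i.e. items (a), (b) and (c) of the proposal.
    """
    if not legal_moves:
        raise ValueError("no legal moves")
    rng = rng if rng is not None else random.Random()

    # Step 1: sample a few valid candidate moves uniformly at random.
    k = min(num_candidates, len(legal_moves))
    candidates = rng.sample(list(legal_moves), k)

    log_total = math.log(max(node_playouts, 1))

    def ucb1(move: Move) -> float:
        n = rave_visits.get(move, 0)
        if n == 0:
            return math.inf  # moves without RAVE data are tried first
        mean = rave_wins.get(move, 0.0) / n
        return mean + c * math.sqrt(log_total / n)

    # Step 2: treat the candidates as bandit arms; take the best UCB1 score.
    return max(candidates, key=ucb1)


moves = [("W", "A1"), ("W", "B2"), ("W", "C3")]
wins = {("W", "A1"): 9.0, ("W", "B2"): 2.0}
visits = {("W", "A1"): 10, ("W", "B2"): 10}
# C3 has no RAVE data yet, so with all three as candidates it is chosen.
print(select_playout_move(moves, wins, visits, node_playouts=20,
                          num_candidates=3, rng=random.Random(0)))  # ('W', 'C3')
```

The exploration term is what is meant to counter the "rich get richer" feedback loop. The scores still come from order-blind AMAF counts, though, so on its own this would not capture the E9 -> C9 -> A9 ordering that Aja describes.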
