By the way, I never experimented with the formula I use; I just plagiarized it from information posted here. In the meantime I'm working on my own formula, and I used to use something different that I made up.

What I used before worked like this:

   u = (w + n) / (g + n)

 
   u - the value you maximize, same role as the UCB value
   w - wins for this move
   g - games for this move
   n - a constant related to how many times the parent has been visited

n increases as the number of parent visits increases. I tried log(parent_count), but it grows too slowly. I tried lots of things here.

Compared with the standard formula, it tends to be too exploitative and not explorative enough. If you use a constant for n, you get a search that eventually focuses on a single move only, because the other moves can never "catch up" in score (unless the current move is discovered to have problems).
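
To make the formula above concrete, here is a minimal Python sketch. The function names and the sample win/game counts are made up for illustration and are not taken from any actual program. It only shows how u = (w + n) / (g + n) ranks moves for a given n: an unvisited move always scores 1.0, and a larger n pulls every score toward 1.0, which favours the less-visited moves.

```python
def old_score(wins, games, n):
    """Selection value u = (w + n) / (g + n); n must be > 0 when games == 0."""
    return (wins + n) / (games + n)


def select_move(stats, n):
    """Return the index of the move with the largest u.

    stats is a list of (wins, games) pairs, one per candidate move
    (a hypothetical layout chosen for this sketch).
    """
    return max(range(len(stats)), key=lambda i: old_score(stats[i][0], stats[i][1], n))


if __name__ == "__main__":
    # Illustrative numbers only: a well-explored move and a barely explored one.
    stats = [(60, 100), (2, 5)]
    for n in (1, 10, 100):
        scores = [old_score(w, g, n) for w, g in stats]
        print(n, scores, select_move(stats, n))
```

With a small n the well-explored move keeps being chosen; only when n grows does the barely explored move overtake it, which is why n has to increase with the parent's visit count.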

- Don



Don Dailey wrote:
Tim Foden wrote:
  
Don Dailey wrote:
    
I suggest that we standardize on exactly 25,000 play-outs.  50,000 will
tax my spare computer, which I like to use for modest CGOS tests.
If it is agreed, I will start a 25k test.  My prediction is that this
will finish around 1600 Elo on CGOS.
      
OK, I added Fluke to this (25k) test (twice), before I saw the later
comment about using 10k too.

It's looking like your drdGeneric 25k bot is currently around 1475 (147
games).

Fluke on the other hand looks to be settling at around 1300 (125
games).  I feel that I've probably got a problem in my
implementation!  :)  (I've felt this for some time actually -- UCT
never seemed to work well for me at all.)

Details of Fluke's UCT + Random playouts.

1. UCT constant, c = 0.25, i.e. UCB value = averageScore + c *
sqrt(log(n)/m).
2. New children are created once a node is visited 1 time (URd) or 2
times (UR2).
3. Eye rule for random playouts:
  * Solid eyes (all 4 from same group).
  * False non-solid eyes (at least 50% of corners are of opposite
colour).
4. Choosing legal moves for playouts:  1st probe is random, then scan.

Is there anything else that's likely to be significant here?
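
As a rough illustration of item 4, here is a small Python sketch of what "1st probe is random, then scan" could look like. The details are assumptions, not Fluke's actual code: the board is a flat list of legal/illegal flags, the scan goes forward from the probed point and wraps around, and the function name is made up. It computes the exact chance that each point is chosen.

```python
from fractions import Fraction


def probe_then_scan_probs(legal):
    """Exact selection probability of each point under probe-then-scan.

    legal is a list of booleans, one per board point. A starting point is
    probed uniformly at random; if it is illegal, the scan moves forward
    (wrapping around) to the first legal point. Both choices are assumptions.
    """
    size = len(legal)
    if not any(legal):
        raise ValueError("no legal move to select")
    probs = [Fraction(0)] * size
    for start in range(size):
        i = start
        while not legal[i]:
            i = (i + 1) % size
        probs[i] += Fraction(1, size)
    return probs


if __name__ == "__main__":
    # Illustrative position: three legal points among six.
    legal = [True, False, False, True, True, False]
    for point, p in enumerate(probe_then_scan_probs(legal)):
        print(point, p)
```

A legal point that follows a run of illegal points collects the probes that landed on that run, so the three legal points in this example are not chosen with equal probability.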
    
 1.  My UCT constant is 1.0  - my formula is  averageScore + c * sqrt(
(2.0 * log(n)) / (10.0 * m) );
 2.  New children are created when parent exceeds 100 visits.
 3.  I think the eye rule is the same (you state it differently, but I
believe it's the same.)
 4.  My playouts are truly uniform random - yours are not.

I think point 4 could be significant but I can't be sure.
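
For point 1, the two exploration terms can be compared directly. The sketch below is only an illustration: it assumes n is the parent's visit count and m the child's (the thread does not spell this out), and the function names are made up. It evaluates c * sqrt(log(n)/m) with c = 0.25 against c * sqrt((2.0 * log(n)) / (10.0 * m)) with c = 1.0.

```python
import math


def fluke_bonus(n, m, c=0.25):
    """Exploration term c * sqrt(log(n) / m); requires n > 1 and m > 0."""
    return c * math.sqrt(math.log(n) / m)


def don_bonus(n, m, c=1.0):
    """Exploration term c * sqrt((2.0 * log(n)) / (10.0 * m)); same requirements."""
    return c * math.sqrt((2.0 * math.log(n)) / (10.0 * m))


if __name__ == "__main__":
    # Illustrative visit counts; the ratio does not depend on them.
    for n, m in ((100, 10), (1000, 50), (25000, 400)):
        print(n, m, fluke_bonus(n, m), don_bonus(n, m), don_bonus(n, m) / fluke_bonus(n, m))
```

Since sqrt(2/10) is about 0.447, the second form behaves like the first with c of roughly 0.447 rather than 0.25, so its exploration bonus is about 1.8 times larger for the same n and m.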

- Don





  
I guess I'll let it play some more games and see where it ends up.

Cheers, Tim.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

    