On Thu, 2007-07-26 at 21:43 +0900, Darren Cook wrote:
> > The statement "will never give a strong computer go program."  is rather
> > devoid of meaning.  You either should define "strong" ...
> 
> OK, I'll add something. By strong I mean dan level.
> 
> > I definitely agree that once you've played a few thousand uniformly
> > random games, there is little to be gained by doing a few thousand more.
> > And as an evaluation function this is a relatively weak one - although
> > surprisingly good in some ways it has definite limitations.    AnchorMan
> > hits the wall at about 5,000 simulations and it is uniformly random with
> > no other search involved.   It would not be much stronger even with
> > infinite number of simulations.  
> 
> 5000 is a fascinating number. You cannot be talking about UCT playouts,
> as I know you know strength always increases with more playouts. But, if
> you are talking about playouts as an evaluation function, in my
> experiments there was practically no gain in accuracy beyond 60
> playouts, and even 30 was enough to get a good approximation.

Actually, I'm not being accurate here.  5000 play-outs using a
modification of all-as-first is about as good as it gets for AnchorMan.
But it's measurably better than 2500 play-outs, for instance.

There is no tree search with this method.  I just play these 5000 games
randomly and look at which moves occurred most often for the winning
side.

60 is preposterous.  You are clearly doing something different, or else
you have either a broken algorithm or a really good one.

I also found that if you just treat MC play-outs as an evaluation
function on top of a tree search, more simulations are better.
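
By that I mean something like a plain one-ply search where every
candidate move gets its own batch of random games.  Again just a sketch
with assumed helpers (legal_moves(), play(), random_playout()), not
anybody's real engine:

    def one_ply_mc(position, playouts_per_move=100):
        # One-ply search with random play-outs as the leaf evaluation.
        # legal_moves(), play() and random_playout() are assumed helpers,
        # not any particular program's API.
        side = position.side_to_move
        best_move, best_rate = None, -1.0
        for move in legal_moves(position):
            child = play(position, move)
            wins = sum(1 for _ in range(playouts_per_move)
                       if random_playout(child)[1] == side)
            rate = wins / playouts_per_move
            if rate > best_rate:
                best_move, best_rate = move, rate
        return best_move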

> I guess our results are so different as I concentrated on the end game?

In the endgame, fewer simulations probably give the right answer more
often.

> 
> > The way to think about a play-out policy is to ask, "how good would it
> > be given an infinite number of simulations?"   The answer for uniform
> > random is, "not very."   
> 
> I did not mention it in the article, as it wasn't related to my main
> point, but when I've been testing playout algorithms I've been measuring
> the result as 5 sets of 20 playouts, then remembering the worst score of
> the 5 sets. The difference in accuracy between worst set of 20 and all
> 100 playouts I've been calling the stability: a small difference is a
> stable algorithm, and is highly desirable as then I know I can get a
> reliable estimate with fewer playouts.
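
If I'm reading that right, your stability measure is roughly this
(a sketch based on my reading of your description; estimate() is a
stand-in for whatever accuracy score the play-outs produce):

    def stability(position, sets=5, per_set=20):
        # My reading of the measure: accuracy from the worst of 5 sets of
        # 20 play-outs versus the accuracy from all 100 pooled together.
        # estimate(position, n) is a stand-in for whatever score the n
        # random play-outs produce (e.g. a win-rate estimate); not a real API.
        set_estimates = [estimate(position, per_set) for _ in range(sets)]
        pooled = sum(set_estimates) / sets       # same as pooling all 100
                                                 # play-outs when the estimate
                                                 # is a mean
        return abs(pooled - min(set_estimates))  # small value = stable algorithm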

But I'm measuring based on actual game playing performance.  


> Darren

