From: Brian Sheppard <[email protected]>

> Measuring small differences is a big problem for me. I would like to have
> better tools here.

> For instance, I am trying to measure whether a particular rule is an
> improvement: with the rule it wins 60.5%, and without it, 60.0%. You need
> a staggering number of games to establish confidence. Yet this is the
> small, 5 to 10 Elo gain that Don referred to.

> I hoped to isolate cases where the *move* differs between versions, and
> then analyze (perhaps using a standard oracle like Fuego) whether those
> moves are plusses or minuses. But this is MCTS, and the program does not
> always play the same way even in the same position.

A very tough problem! How many is "a staggering number", just out of curiosity?
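For what it's worth, a back-of-the-envelope power calculation suggests the number really is staggering. A minimal sketch (my own arithmetic, not anything from your test setup), using the textbook two-proportion sample-size formula at the conventional 5% significance level and 80% power:

```python
from math import ceil, log10
from statistics import NormalDist

def games_needed(p1, p2, alpha=0.05, power=0.80):
    """Games per version needed to distinguish win rates p1 vs p2
    (two-sided two-proportion z-test, normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * var / (p2 - p1) ** 2)

def winrate_to_elo(p):
    """Elo advantage implied by win rate p against a fixed opponent."""
    return 400 * log10(p / (1 - p))

print(games_needed(0.600, 0.605))                          # ~150,000 games per version
print(winrate_to_elo(0.605) - winrate_to_elo(0.600))       # ~3.7 Elo
```

So on the order of 150,000 games per version just to resolve a half-percent difference, and that 60.0% -> 60.5% gap against a fixed opponent works out to only about 3.7 Elo. "Staggering" seems fair.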

I believe at least one developer is using a network of idle workstations to run
tests. Is anybody using Amazon or some other cloud service? I recently read
that a firm rented 10,000 cores for 8 hours for $8,000 - a princely sum, but it
does scale down as well as up.

Sadly, Fuego (or any existing program) may not be a very good "oracle" to
determine whether move A or move B is best in a given situation.

Does anybody have experience with testing particular "hard cases", rather than 
"1000 random games from scratch"?

That is, based on past experience, program X played move A in situation Y, which
turned out to be a disaster. Strong players suggest that B, C, or D would be
better.


There are more than a few such "what was the player thinking?" instances in the 
archives.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
