Just to be clear, the whole point of what I said is that it's easier to quantify the quality of the scores returned by the search than to quantify the quality of the moves. I know people have used problem sets before with little success, and what I am suggesting is different, even though it might sound similar.
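To make this concrete, here is a minimal sketch in Python of what I mean by scoring the quality of the estimates rather than the moves. The run_engine() call and the (position, result) pairs are placeholders for whatever your engine interface and test-set format happen to be, not any particular program's API:

    # Sketch only: run_engine() and the (position, result) pairs are
    # placeholders; plug in your own engine call and position set.
    import math

    def avg_cross_entropy(positions, run_engine):
        """Average cross entropy between the engine's predicted win
        probability and the actual game result.  Lower is better."""
        total = 0.0
        for pos, result in positions:          # result: 1.0 if the side to move won, else 0.0
            p = run_engine(pos)                # engine's estimated win probability for the side to move
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # clamp so log() never sees 0 or 1
            total -= result * math.log(p) + (1.0 - result) * math.log(1.0 - p)
        return total / len(positions)

    # Run the same position set through the old and the new version and keep
    # the modification only if the average cross entropy drops:
    #   ce_old = avg_cross_entropy(test_positions, old_version_search)
    #   ce_new = avg_cross_entropy(test_positions, new_version_search)

Because every position contributes a graded number rather than a single win/loss outcome, the hope is that far fewer positions are needed than the number of games it takes to resolve a small difference in strength.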
Álvaro.

On Thu, Apr 7, 2011 at 4:48 PM, Don Dailey <[email protected]> wrote:

> On Thu, Apr 7, 2011 at 4:27 PM, terry mcintyre <[email protected]> wrote:
>
>> To address Don's points, here are a few ideas:
>>
>> One, pick situations where the choice of good moves is limited. These will probably be "all or nothing" battles. If you pick move A, you win. Otherwise, you lose, tough luck.
>
> The test will have some value, but this does not simulate what actually happens in games, only a subset of that. It may be, for example, that positions where there are 10 plausible moves are where a lot of the Elo comes from. I don't really know, this is just an example.
>
> My own speculation on this is that by far the most relevant test is the kinds of moves you need to AVOID playing. Go (and just about every other game) is more about avoiding stupid moves and errors than it is about finding some profound moves. You don't "produce" wins, you take them when there is opportunity, and opportunity is provided by your opponent.
>
> Of course the program has to be skilled enough to find good moves when they exist, but the biggest problem in computer go is weak moves by the computer (where weak means giving the opponent opportunities).
>
> The computer goes wrong by far the most where there is not a specific goal or target that it can understand - that's when it plays a "weak" move that a strong player will pounce on.
>
> In tennis, I try to think in terms of getting the ball over the net so that it's the opponent's turn to make a mistake. This is an oversimplification, but it is essentially how it works. Each point in tennis is lost by one of the players, it's never "won", but of course we try to hit great shots so that the opponent has a much greater chance to "lose" the point. However, trying to hit those great shots is what causes you to lose a lot of points :-)
>
> Don
>
>> Two, test for proper follow-up. It's not enough to determine A, then forget about the battle and play somewhere in another corner. If this is an all-or-nothing battle, the program must continue to play the proper response to every threat, until the job is done.
>>
>> For extra credit, test what happens if the opponent continues to thrash after the situation becomes hopeless - the program should not waste a move in that area if it is not needed, but should take sente elsewhere.
>>
>> In short, the program must master not only move A, but B, C, D, E, ... Z - and one should test different move orders for the opponent, including feints in other parts of the board.
>>
>> Having mastered a few such test cases, now introduce a ko or seki to make the analysis a bit more complex. Rinse and repeat.
>>
>> Terry McIntyre <[email protected]>
>> Unix/Linux Systems Administration
>> Taking time to do it right saves having to do it twice.
>>
>> ________________________________
>> From: Don Dailey <[email protected]>
>> To: [email protected]
>> Sent: Thu, April 7, 2011 3:33:36 PM
>> Subject: Re: [Computer-go] Evaluating improvements differently
>>
>> On Thu, Apr 7, 2011 at 2:27 PM, Álvaro Begué <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I haven't spent any time in go programming recently, but a few months ago I thought of a method to evaluate proposed improvements that might be much better than playing a gazillion games.
>>> A search results in two things: a move and a probability of winning (or a score that can be mapped into a probability of winning, but let's ignore that issue for now). Evaluating whether the moves picked by a strategy are good is really hard, but evaluating whether the estimate of a probability of winning is a good estimate seems much easier.
>>>
>>> For instance, take a database of games played by strong players. Extract a few positions from each game. Run your engine for some fixed amount of time on each position, and measure how well it predicted the winner after each position (cross entropy is probably the correct measure to use). Do this before and after the proposed modification and compare.
>>>
>>> Of course one has to be careful to pick reasonably well-played games (games played by top engines with more time per move than you'll use to evaluate your engine seem good enough, and will result in a much cleaner database than collecting games played by humans) and to have a large enough and varied enough set of positions. Also, one should worry about over-fitting for those particular positions, but one could use another set of positions for confirmation. These problems all seem manageable to me.
>>>
>>> It is possible that certain improvements can really only be measured by playing games (time control comes to mind), but I would imagine that for a large class of things this procedure can give you useful results in much more reasonable times.
>>>
>>> Your thoughts are appreciated.
>>
>> I think this should be tried, but I also believe you will be bitterly disappointed. The holy grail for me is to find some way to quickly test versions, and I have found that NO test does that better than playing games. The problem is that a real game is full of complex variables that interact in strange and wonderful ways.
>>
>> The reason a program wins or loses games is not as simple as just saying it played a bad move, or it failed to see this or that. It's rather more like the stock market. Have you noticed that at the end of the day someone reports on the stock market and always gives a reason why this or that stock (or the entire Nasdaq) rose or fell? They like to pretend they understand why, but in reality it's just a wild guess. It's that way with why you win and lose games. The reasons are very complicated - you cannot just count the good moves and bad moves to get a global "score", nor can you construct a good problem suite that will tell you how good your program is (unless you are willing to accept about 2 or 3 dan worth of error).
>>
>> I am always interested in ideas like what you propose here and I think it should be experimented with - I'm just very skeptical. Been there, done that.
>>
>> Years ago there were tests based on counting how many moves a player made that matched games of top grandmasters. You would hide the answer, pick a move, then see if it matched the move "Bobby Fischer" would play. Then your skill would be computed based on how many of these you got "right." However, recent studies show that two equally strong players (computers in this case) can vary significantly in choice of moves, and in fact a similarity test was constructed that showed that different equally strong programs played much more differently from each other than two versions of the SAME program where one version was older and significantly weaker.
>> So the choice of move had a lot more to do with style than it did with strength. That is the very opposite of most people's intuition.
>>
>> A program can play very well but be weak because of the occasional horrible move, or it can play very "average" without making major errors but still missing a lot of "brilliancies." There is a complex interaction here that determines the actual RESULTS it will get in competition.
>>
>> However, I still like your idea, and if you (or anyone else) tries it I am very interested in your experiences with it. I am always hoping something better will come along ...
>>
>> By the way, there is also the notion of weighted problem sets. I think your idea has similarities. You try to solve problems, but some problems get much different "scores" than others. This is an attempt to improve naive problem testing. Another idea that has been tried is to test as many programs as possible of widely varying strength on a zillion problems and then fit a scoring method that predicts strength based on this.
>>
>> Don
>>
>>> Álvaro.

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
