I am very enthusiastic about CLOP tuning. I overcame some roadblocks along the way that I want to share with you.
You want CLOP to optimize strength, but it is actually optimizing Strength + Luck + Avoidance + Exploitation. Using CLOP effectively requires mitigating the last three factors.

BTW, I imagine that "CLOP" could stand for any fully automated parameter tuning solution. That is, nothing here is really specific to CLOP. It just happens that CLOP is the first fully automated parameter tuning system that I have gotten to work.

LUCK
----

Rémi has diagnosed the Luck factor: the win rate of the optimal setting is probably overestimated. This is not a big deal, provided that tuning runs go long enough for the average and optimal win rates to be close together (e.g., within a few rating points). If you change parameters to the ones recommended by CLOP, then the next CLOP run might claim that your program is weaker than before. That is just the luck vanishing, so I am not tempted to revert parameter settings in such cases.

A more subtle point is that CLOP does not measure the parameter combination that it recommends. The recommendation is a weighted average of points that it has measured, which converges (we think) to the optimum when the win rate is a smooth function of the parameters. Actual performance can differ from the projection, especially if the win rate is not a smooth function. For example, Pebbles tunes against 2 opponents using 105 starting positions, each played with both colors, making 420 initial situations (2 opponents x 105 positions x 2 colors). Both the opponents and Pebbles are pseudo-random, so there is a great deal of variety from each initial situation. Still, you can imagine that the win rate is not entirely a smooth function of the parameters: maybe changing a parameter by a small amount switches 10 similar initial situations from wins to losses. That is just a reflection of how tuning is performed, so I accept the recommended changes anyway. Doing anything else would drive me crazy. If the projected gain does not fully materialize, I just accept that additional tuning will probably improve results.
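To see why the best-looking setting is optimistic, here is a minimal Python sketch of a simplified model. It is an assumption for illustration, not what CLOP actually does internally: k settings that are all exactly equally strong are each measured with a fixed number of games, and we keep the one with the highest observed win rate. Any excess over the true win rate is pure luck. The function names `elo_from_win_rate` and `best_of_k_bias` are hypothetical, and the Elo conversion is the standard logistic formula 400 * log10(p / (1 - p)).

```python
import math
import random


def elo_from_win_rate(p: float) -> float:
    """Elo difference implied by win rate p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))


def best_of_k_bias(true_p: float, k: int, games: int, trials: int, seed: int = 1) -> float:
    """Average amount by which the best-looking of k equally strong settings
    overstates its true win rate (hypothetical simplified model, not CLOP)."""
    rng = random.Random(seed)
    total_excess = 0.0
    for _ in range(trials):
        # Every setting has the same true win rate; only luck separates them.
        observed = [
            sum(rng.random() < true_p for _ in range(games)) / games
            for _ in range(k)
        ]
        # Keep the setting that looks best, as a naive tuner would.
        total_excess += max(observed) - true_p
    return total_excess / trials


if __name__ == "__main__":
    for games in (100, 1000, 4000):
        bias = best_of_k_bias(0.5, k=10, games=games, trials=50)
        # Express the optimistic bias in rating points relative to an even match.
        print(f"{games:5d} games per setting: +{bias:.3f} win rate, "
              f"about {elo_from_win_rate(0.5 + bias):.1f} Elo of pure luck")
```

In this toy model the bias should shrink roughly in proportion to 1/sqrt(games), which is one way to see why long runs bring the optimal and average estimates close together. CLOP's regression over many nearby samples behaves differently in detail, so treat the numbers as illustration only.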
Basically: do sufficiently long runs (maybe 20K to 30K games when tuning 2 or 3 parameters?) until Average < Optimal < Average + K rating points (maybe K = 5?). Then accept the new parameters blindly, without worrying about it.

AVOIDANCE
---------

"Avoidance" means: your program has weaknesses, and CLOP can tune parameters so that those weaknesses are less likely to trigger. Avoidance makes your program stronger to some extent; by avoiding weaknesses, you do play better. But there are obvious limits: you cannot expect opponents to cooperate, and your search engine still uses your own play to model the opponent.

Pebbles has about 60 parameters, and I tuned them in groups of 2 or 3 parameters for about a dozen runs. Pebbles' win rate rose from 47% to 55%. I was supremely happy, because that would have been a good year's progress, and CLOP did it in just a month. But then I integrated some bug fixes, and the win rate dropped to 48%.

What went wrong? For several weeks I was convinced that I had broken something, but I was unable to find anything. I verified every change using diffs, and I restored parameters, but I was unable to make the win rate return to 56%.

What I think happened is that tuning had tweaked Pebbles to such an extent that it was now very sensitive to perturbations. There are 420 starting situations and ~60 parameters, so tuning each parameter only has to "switch" a few games from losses to wins to push the win rate really high. I had fixed a few bugs that I discovered while tuning was going on, and that was enough of a perturbation to make the win rate plummet.

Figuring this out took a long time, so I now have a "don't make yourself crazy" policy here, too. I fix bugs as I find them and integrate the fixes into the tuning version ASAP. Now if the win rate drops suddenly, there is an excellent chance that my last change was incorrect, and because that change is recent, it is easy to find.

BTW, tuning makes bugs easier to find.
Your program usually operates with carefully selected parameters. The tuning process creates an altogether different distribution of positions, which tends to expose logical errors. The result is that I have an endless supply of bugs to fix.

EXPLOITATION
------------

"Exploitation" means: tuning will create situations that the opposition handles badly. This has the same potential for good and bad as Avoidance, but having multiple training opponents reduces that potential. Pebbles trains against two opponents, and I will add others as Windows builds become available.

Is it true that Pachi was tuned against Fuego, which was tuned against Mogo, which was tuned against GnuGo? If so, then game theory suggests that tuning against all of them will make Pebbles less susceptible to Exploitation and Avoidance defects.

Using some self-play games in tuning should also help reduce both Avoidance and Exploitation. I have not tested that, because using self-play in CLOP requires some additional features.

RESULTS
-------

Pebbles is winning almost 57% of its tuning games at present, just 2 months after I started CLOP tuning. For comparison: for about 2 years I basically did not tune parameters at all. Then I spent about 6 months trying to tune parameters using semi-manual methods, which raised the win rate from 43% to 47%. You can see why I am enthusiastic about this new way of working: 1/3 of the time, and roughly 3 times the benefit.

Note that finding bugs is a large part of the benefit: not just finding them, but fixing them and then tuning. All three processes become faster and more effective when they are integrated into an automated tuning loop.

Hope this helps,
Brian

-----Original Message-----
From: computer-go-boun...@dvandva.org [mailto:computer-go-boun...@dvandva.org] On Behalf Of Rémi Coulom
Sent: Tuesday, January 03, 2012 10:19 AM
To: computer-go@dvandva.org
Subject: [Computer-go] win rate bias and CLOP

It is important to understand that CLOP claims very little in terms of win rate.
That is to say, the win-rate estimates it reports are all biased. The win rate over all samples underestimates the real win rate. Win rates near the maximum (central, and weighted) tend to be overestimated.

CLOP finds the location in parameter space that has the highest win rate. It may be the highest because it is the best, but also because it is the luckiest. That's why it is necessarily biased toward optimistic values.

If the win rate over all samples is an improvement, then you can be sure you have an improvement. Otherwise you cannot be sure unless you actually play a lot of games with the suggested parameters.

Rémi

On 3 Jan. 2012, at 14:09, Ingo Althöfer wrote:

> Hi David,
>
> David Fotland on CLOP-optimization:
>> I tried it, but got no benefit so far. It claimed to find better settings
>> for most parameters, but when I used them the program wasn't any
>> stronger.
>
> Interesting. Did it have similar strength, or did it even become weaker?
> How often did the move proposals by your "older" ManyFaces and the
> CLOP-MF differ?
>
> Ingo.
> _______________________________________________
> Computer-go mailing list
> Computer-go@dvandva.org
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go