Measuring small differences is a big problem for me. I would like to have better tools here.
For instance, I am trying to measure whether a particular rule is an improvement: with the rule the program wins 60.5% of its games, without it 60.0%. You need a staggering number of games to establish confidence in a difference that small, yet this is exactly the small, 5 to 10 Elo kind of gain that Don referred to. (A rough back-of-the-envelope calculation of how many games that takes is at the bottom of this mail.)

I hoped to isolate cases where the *move* differs between versions, and then analyze (perhaps using a standard oracle like Fuego) whether those moves are pluses or minuses. But this is MCTS, and the program does not always play the same way even in the same position.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of "Ingo Althöfer"
Sent: Thursday, April 07, 2011 12:25 PM
To: Aja; [email protected]
Subject: Re: [Computer-go] 7.0 Komi and weird deep search result

Hello Aja,

> ... Also, I will be more careful in measuring the
> improvement, as exemplified a lot in your description (my supervisor
> Remi Coulom also repeatedly corrects me on this point).

Remi was a strong computer chess programmer ("The Crazy Bishop") before he
went to computer go. There is, for instance, a photo of him at the WCCC 2004,
in the middle of the report
http://www.chessbase.de/newsroom2.asp?newsid=3350

Ingo.

> Aja
> ----- Original Message -----
> From: Don Dailey
> To: Aja
> Cc: [email protected]
> Sent: Wednesday, April 06, 2011 9:51 PM
> Subject: Re: [Computer-go] 7.0 Komi and weird deep search result
>
> On Wed, Apr 6, 2011 at 3:32 AM, Aja <[email protected]> wrote:
>
> Hi Don,
> Thanks for your penetrating ideas. Yes, I would like to reconsider my
> feeling and hope that it doesn't misguide anyone.
>
> We both know the recent controversy between Fruit and Rybka (or Fabien
> and Vasik), but of course it's not the issue here right now. I just want
> to mention in passing that Fabien said he might develop a Go program in
> the next few years, so we can expect another strong open-source program.
>
> I hope he does, but of course I did not bring this up to talk about the
> controversy, just the reality that computer chess software is marching on
> at a remarkable pace, and this was an excellent example to illustrate that.
>
> It's just my guess that it's very hard for current MCTS to surpass
> amateur 5d or 6d. One main reason is that it's difficult to solve a lot of
> different semeai and life-and-death instances at pro level, even if the
> program is running on very large hardware (on this point I was impressed
> by Olivier's talk at a conference in Taiwan, in which he gave an "easy"
> semeai example that Mogo cannot solve even with a very large number of
> simulations).
>
> I want to point out that in computer chess this same exact thing was
> often done not so many years ago. A relatively simple position would be
> presented that humans easily understood, but that seemed completely out of
> reach for computer chess programs to understand. It was easy to see that
> computers would need some ridiculous breakthroughs to be able to understand
> such positions, and the conclusion was that computers probably would never
> be close to the top humans in chess.
>
> It's my view that such illustrations tended to cause people to draw the
> wrong conclusions and sent people off in the wrong direction, looking for
> non-existent breakthroughs and concluding that incremental progress was a
> completely foolish way to proceed.
>
> I believe we (as humans) lack a bit of imagination when it comes to
> these sorts of things.
> For example, the 4 minute mile was considered physiologically unattainable
> a few years before the first one was run - in other words, it was hard to
> imagine it ever happening. It's often difficult for us to imagine things
> that are too different from what we are currently experiencing (especially
> once we decide it is "hard.") Maybe part of the problem is that we live in
> an instant-gratification society and no longer think in terms of hard work
> and gradual progress; we want an instant "breakthrough."
>
> Progress is a funny thing if you put numbers on it. If you get 1%, it
> hardly seems like anything. But if you add 1% to that, then 1% again,
> it's like compound interest in a bank, and you look back over just a few
> of these and are surprised by how much progress you have made.
>
> I have been surprised that in chess the point of diminishing returns is
> farther away than it seems, and I'm sure in Go it is even more so by a
> large degree. In other words, ELO progress in software has been more or
> less steady, not slowing to a crawl. Yes, it is punctuated with small
> spikes, but seen over anything more than a couple of years it's remarkably
> smooth. As evidence of that, the program Houdini was recently released
> that is at least 50 ELO over its nearest competitor, but you can be sure
> that is only a temporary situation - it will look like a weak program in
> 2 or 3 years.
>
> Another aspect is that it's extremely hard for MCTS to consider/argue for
> a few points in the early stages on 19x19 (because it only sees winning
> rate, and dynamic komi is far from enough to fix it), and that is exactly
> what pros are very good at.
>
> The only thing you are telling me is that we picked a hard problem.
> There is nothing here inherently unsolvable; we are just impatient and
> cannot imagine (yet) how we are going to solve this.
>
> I have discovered that in computer chess (which I have been into for
> decades) the "unsolvable" problems didn't really make that much difference
> in the short term. The solutions come at a natural rate, and until programs
> get a lot better in other areas you will find that some of these "glaring"
> weaknesses do not make much difference in terms of how strong the program
> is at the moment, even when it seems like it's a huge deal. These
> weaknesses gradually start making a huge difference when the program is
> really good, and we tend to judge programs more by their weaknesses than
> their strengths. So when we see something "ugly" it makes us think the
> program cannot be as strong as it actually has proved to be. And computer
> programs have strengths and weaknesses in different proportions than we do,
> so this tends to distort our own views of how well or badly they play.
>
> An example in computer chess is basic endgame knowledge. It's really
> ugly to see a chess program trade down from a won ending to a draw because
> it doesn't understand that certain simple endings cannot be won despite
> being a piece up. Years ago, after seeing glaring examples of this horrible
> weakness, I took some time and implemented a large number of scoring
> corrections to deal with this, as well as putting in a king and pawn versus
> king perfect-play database. I patted myself on the back and expected to see
> a decent ELO gain. However, even in modern programs this probably does not
> add more than 2 or 3 ELO, and I'm being generous.
> If you show a grandmaster some of these glitches, he might conclude that
> your program plays like an amateur (in fact, when programs first became
> master strength, many strong human players would see one of the "ugly"
> moves and conclude that the program could not play a move like that and
> even be "expert" strength, let alone master strength.)
>
> I'm not saying these are not real problems in computer go, but the point
> is that there are a large number of problems that altogether define exactly
> where we stand right now, and we just have to start making dents (which we
> actually have been doing to a remarkable degree if you would only look more
> carefully.) The bigger problems are just going to take longer to fix than
> the lesser problems. Also, I believe we have to get over this notion that
> we have to "fix it" completely. We probably will not fix it suddenly with a
> one-line program change, but we can and will find ways to minimize the
> problems, and it may be gradual and incremental.
>
> In your example you rightly note that programs do very well when in their
> "sweet spot", when there are clearly defined goals that affect winning
> percentages. In computer chess it used to be believed that no amount of
> searching could improve a program's "horrible" positional play and that
> computers only played well if there were immediate tactical considerations;
> otherwise they quickly went wrong. That turned out not to be true; it was
> just not clearly understood at the time because we were looking at the
> problem through our own biased eyes and seeing the ugly things. The truth
> of the matter is that the tree search and playouts work well in all
> positions, some more than others, and we will find ways to clearly improve
> the situation in the future with incremental progress (not major
> breakthroughs.) Also, we have some clearly wrong things that we will fix
> (like the eye definitions we have, which are approximations and are
> sometimes broken.)
>
> I'll say it again: I think computer go is still in its infancy, and we are
> still looking for the big fixes and have not yet come to fully appreciate
> the immense practical power of incremental improvements over time. When the
> problem looks big we feel like small improvements are a waste of time, but
> nothing is farther from the truth.
>
> The progress in hardware by Mogo, Fuego and pachi is well-known and
> impressive, so I don't think the amazing progress in computer Go is mainly
> due to software. Both hardware and software are important in making a
> strong Go program for now, as far as I can see.
>
> I think the pattern is the same as what happened in computer chess. But I
> personally believe that software will be a much bigger contributor to
> progress in the future (even if you ignore the "slowdown" of Moore's law.)
>
> I hope your prediction is right: "without anything really major (but no
> doubt some new small ideas) we are going to see your KGS 5 and 6 dan and
> much higher in 5 to 10 years." If not, then we will have a lot of
> "interesting" work to do, whether in testing methodology, engineering,
> academics, etc.
>
> I am convinced I am right on this one. I have absolutely nothing against
> finding major breakthroughs, of course, but I think what we will call
> "breakthroughs" will be things that add up to 50 ELO or less. In chess it
> was things like check extensions, null move pruning, futility pruning,
> LMR, etc.
> We called null move pruning a "major breakthrough", but when it was first
> used it added something like 40 or 50 ELO to the strength of a chess
> program. Is that a major breakthrough? You don't notice 50 ELO right away
> by watching it play, because the program will still lose 43 percent of the
> games and thus still get outplayed in many games, but it depends on your
> definition of "major", I guess. In go that would be something like 1/2 dan.
> I do think there will be a few of these kinds of breakthroughs. What
> happens is that these good ideas need to get refined and improved too. I
> think we get much more out of null move pruning than we used to. LMR, when
> first implemented, hardly gives a chess program any gain until it's done
> just right. But when refined it's pretty huge. My first LMR implementation
> was only worth about 20 ELO; now it's like 100 ELO or more.
>
> Thanks for listening to me ramble on about this ...
>
> Don
>
> Aja
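
P.S. Here is the back-of-the-envelope sample-size calculation I mentioned at
the top. It is just the textbook normal approximation for comparing two
proportions (two-sided 5% significance, roughly 80% power); the function name
and the z values are my own choices, so treat it as a sketch rather than a
prescription:

    from math import sqrt

    def games_per_version(p_old, p_new, z_alpha=1.96, z_beta=0.84):
        # Normal-approximation sample size for a two-proportion test:
        # games needed for EACH version to detect p_new vs p_old.
        p_bar = (p_old + p_new) / 2.0
        num = (z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
               + z_beta * sqrt(p_old * (1.0 - p_old) + p_new * (1.0 - p_new)))
        return (num / (p_new - p_old)) ** 2

    print(round(games_per_version(0.600, 0.605)))  # ~150,000 games per version

That comes out to roughly 150,000 games per version, around 300,000 in total,
just to be reasonably confident that a 0.5% difference is real. Which is why
testing these small changes by self-play is so painful.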

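P.P.S. Don's "50 ELO still loses 43 percent of the games" figure falls
straight out of the usual Elo/logistic model, and the same formula puts my
0.5% difference into Elo terms. Again, this is only my own illustrative
snippet (the function names are mine, not from any program discussed here):

    from math import log10

    def expected_score(elo_diff):
        # Expected score of the stronger side under the standard Elo model.
        return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

    def elo_gap(score):
        # Inverse: the Elo difference implied by an expected score.
        return -400.0 * log10(1.0 / score - 1.0)

    print(expected_score(50))               # ~0.571, so the weaker side still scores ~43%
    print(elo_gap(0.605) - elo_gap(0.600))  # a 0.5% shift around 60% is only a few Elo

So a 50 Elo edge only moves the expected score to about 57%, which is why even
a genuinely "major" improvement is hard to see just by watching games.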