On Thu, 2008-08-28 at 09:38 +0200, Rémi Coulom wrote:
> Don Dailey wrote:
> > I don't really believe the ELO model is "very wrong." I only believe
> > it is a mathematical model that is "somewhat" flawed for chess and
> > presumably also for other games. Do you have an alternative that might
> > be more accurate?
> >
> > - Don
>
> I don't have very precise data about computer vs human. But in computer
> vs computer, I believe some of the data of your scalability study
> clearly demonstrated that strength of MoGo against Leela scaled
> differently from the strength of MoGo against GNU. But my memory of that
> data may not be very good.

Yes, I think you are remembering the FatMan/MoGo study, where FatMan didn't scale nearly as well with time.

The idea that I'm resistant to is NOT that some programs scale better than others; I take that as a given. In the graph it was very clear that MoGo scaled better at higher levels, and I don't care about that. What I am asserting is that the 2600 version of FatMan in the study is going to play at about 2600 ELO strength against every opponent, not just MoGo. And the stronger MoGo version will on average tend to be stronger against all other opponents. If you could somehow magically place a bunch of humans of widely varying strengths in this study, as a kind of control, it would have a relatively minor effect on the two respective lines.

I guess my assertion is that Occam's Razor should be applied here until we have really good reason to believe differently. If the study shows 2600 ELO, we do not need to explain it away by some more complicated theory. If my ego were hurt by the fact that MoGo scales better, I could easily construct a theory that explained it away. This is what we tend to do when we don't want to believe something. That's what I think is being done with the argument that improvement against computers doesn't translate to improvement against humans. It is sort of a false modesty - when I once beat a much stronger player I made excuses for him; my brain was not ready to fully accept the win.

I am probably older than many here (I'm 52) and I have some historical perspective to look back on with computer chess. Unfortunately I don't have anything in print to back this up going back 30 years, but I know the general feeling many had was that computer chess scalability and skill don't carry over to play against humans.
In one of the earliest scalability studies it was seen that improvement is pretty remarkable at increasing depths of search, and I do remember this being explained away - a remark to the effect that if this were to be believed, then in a few years we would have a computer grandmaster, and it being laughed off, so to speak. Guess what happened?

I would like to put this to rest (and be proved correct) but unfortunately I don't have any proof of my assertions either. It comes down to an opinion. I'm willing to be proved wrong, but we all know how difficult it is to get reliable evidence from human-played games. It really would require thousands if not millions of fairly played human-computer games at a huge variety of controlled levels and conditions to make sense of this. The scalability study I did involved levels that cannot be practically played against humans in quantity. Making a fair study against humans is real messy, and the fact that humans adapt to specific opponents better than computers do will tend to suppress the ratings of the machines artificially. Humans also try harder against stronger opponents, which also skews the results.

Something interesting happened in computer chess about 20 years ago, give or take. There was a sudden explosion of interest in computer chess when stand-alone chess computers started getting strong enough and popular enough that a lot of chess players would purchase them. They started getting official ratings and playing in tournaments. But people very quickly learned how to play against them (someone even wrote a book about how to beat computers) and it was as if the progress of the computers stalled. We "knew" that computers were continuing to improve, but it wasn't "translating" against humans. Again, a few were postulating that they had reached a plateau and that from then on you would only see very minor progress.
There was something like a 200 ELO barrier that computers had to overcome to make up for the fact that most chess players no longer feared the computers and knew how to play them. But it was a local phenomenon. It was a temporary glitch.

There have always been certain players who specialize in beating computers, and a style developed where you could get into a closed position and play a slowly developing sneak attack against the computer's king side. This style worked against most of the weaknesses that computers had. It created some serious intransitivities, such that sometimes a low rated player would be unusually good at beating higher rated computers. As far as I know there was no general solution that fixed this; computers simply kept improving to the point where their strengths dominated the game. And of course over time, as computers search deeper and deeper, this weakness also gradually goes away. You almost always have to make some compromise to get the kind of game you want (like in Go high handicap games, where KIM said you HAVE to overplay) and thus this intransitivity was a local phenomenon; it applies mostly to computers less than 2400 ELO in strength, for instance.

This is why I currently believe these things are temporary and local. You cannot continue to get better without discarding some of your weaknesses. And of course your strengths can be used to neutralize your weaknesses, like the tennis player who has such a strong forehand attack that the opponent cannot get to his weak backhand. Take away his strengths, and his weaknesses start to become big factors.

Rémi, I know that you are a bit of an expert on rating systems and I make heavy use of your bayeselo program - a beautiful tool. I wish you would produce a command line version that didn't require interactivity. Anyway, have you ever considered a rating system that could somehow measure strength in more than 1 dimension?
I suppose a huge problem would be getting statistically reasonable samples, but it would be pretty cool if you could detect the rock, paper, scissors effect and, given enough data, be able to predict performance against particular styles of players. That would require a system that used more than just a single value to specify playing ability.

It might be that if you assumed just 2 or 3 dimensions instead of 1, it could be a major improvement. Perhaps most players make use of 2 or 3 major cognitive abilities to play Go? I know in chess I encountered superior players who could not calculate as well as I could, but they were clearly better players than I. And there were others who were just the opposite: they seemed to know nothing about chess but were so tactically tenacious that they were difficult to beat.

- Don

>
> Rémi

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
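
As a rough illustration of the "more than 1 dimension" idea above, here is a minimal Python sketch. It is only one possible construction and is not something proposed in this thread or implemented in bayeselo: each player keeps an ordinary rating plus a hypothetical 2-D "style" vector, and an antisymmetric cross-product term shifts the effective rating gap. The function names, the style vectors and the size value of 15 are all assumptions made for the example. With zero style vectors it reduces to the usual one-number ELO prediction, which is the case where a 2600 player performs like 2600 against everybody.

```python
import math

def elo_expected(r_a, r_b):
    """Standard Elo expected score of A against B (logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def style_expected(r_a, c_a, r_b, c_b):
    """Hypothetical 2-D extension (an assumption, not an established system).

    c_a and c_b are (x, y) style vectors. Their cross product is
    antisymmetric, so it helps A exactly as much as it hurts B.
    """
    cross = c_a[0] * c_b[1] - c_a[1] * c_b[0]  # flips sign if A and B swap
    return 1.0 / (1.0 + 10 ** ((r_b - r_a - cross) / 400.0))

def style(angle_deg, size=15.0):
    """Build a style vector of the given length pointing at angle_deg."""
    a = math.radians(angle_deg)
    return (size * math.cos(a), size * math.sin(a))

# Three equally rated players whose styles sit 120 degrees apart.
RATING = 2400
s1, s2, s3 = style(0), style(120), style(240)

p12 = style_expected(RATING, s1, RATING, s2)  # player 1 vs player 2
p23 = style_expected(RATING, s2, RATING, s3)  # player 2 vs player 3
p31 = style_expected(RATING, s3, RATING, s1)  # player 3 vs player 1

if __name__ == "__main__":
    print("1 beats 2:", round(p12, 3))
    print("2 beats 3:", round(p23, 3))
    print("3 beats 1:", round(p31, 3))
    print("plain Elo, same ratings:", elo_expected(RATING, RATING))
```

All three players have the same scalar rating, so a single-value system predicts 50% for every pairing, while the style term makes 1 favoured over 2, 2 over 3 and 3 over 1. Fitting such extra parameters from real games would of course need far more data per player than a single rating does, which is the sampling problem mentioned above.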
