On Thu, 2008-08-28 at 09:38 +0200, Rémi Coulom wrote:
> Don Dailey wrote:
> > I don't really believe the ELO model is "very wrong."   I only believe
> > it is a mathematical model that is "somewhat" flawed for chess and
> > presumably also for other games.   Do you have an alternative that might
> > be more accurate?   
> >
> >
> > - Don
> 
> I don't have very precise data about computer vs human. But in computer 
> vs computer, I believe some of the data of your scalability study 
> clearly demonstrated that strength of MoGo against Leela scaled 
> differently from the strength of MoGo against GNU. But my memory of that 
> data may not be very good.

Yes, I think you are remembering the FatMan vs. Mogo study, where FatMan
didn't scale nearly as well with time.

The idea that I'm resistant to is NOT that some programs scale better
than others; I take that as a given.

In the graph it was very clear that Mogo scaled better at higher
levels; I don't care about that.  What I am asserting is that the 2600
version of FatMan in the study is going to play at about 2600 ELO
strength against every opponent, not just Mogo.  And the stronger Mogo
version will on average tend to be stronger against all other
opponents.  If you could somehow magically place a bunch of humans of
widely varying strengths in this study, as a kind of control, it would
have a relatively minor effect on the two respective lines.
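That is exactly what the Elo model predicts: expected score depends only on the rating difference, never on who the opponent happens to be.  A minimal sketch of that prediction (the ratings here are just illustrative numbers, not data from the study):

```python
def elo_expected(r_a, r_b):
    """Expected score for player A under the Elo (logistic) model:
    a function of the rating difference alone."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# A hypothetical 2600-rated program against opponents of various
# ratings; the model says these expectations hold regardless of
# whether the opponent is a human or another program.
for opp in (2400, 2600, 2800):
    print(opp, round(elo_expected(2600, opp), 3))
```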

I guess my assertion is that Occam's Razor should be applied here until
we have a really good reason to believe differently.   If the study
shows 2600 ELO, we do not need to explain it away with some more
complicated theory.    If my ego were hurt by the fact that Mogo scales
better, I could easily construct a theory that explained it away.  This
is what we tend to do when we don't want to believe something, and
that's what I think is being done with the argument that improvement
against computers doesn't translate to improvement against humans.
It's a sort of false modesty - when I once beat a much stronger player,
I made excuses for him; my brain was not ready to fully accept the win.

I am probably older than many here (I'm 52) and I have some historical
perspective to look back on with computer chess.  Unfortunately I don't
have anything in print to back this up going back 30 years, but I know
the general feeling many had was that computer chess skill doesn't
scale against humans.  In one of the earliest scalability studies it
was seen that improvement was pretty remarkable at increasing depths of
search, and I do remember this being explained away - a remark to the
effect that if this were to be believed, then in a few years we would
have a computer grandmaster - and it was laughed off, so to speak.
Guess what happened?

I would like to put this to rest (and be proved correct) but
unfortunately I don't have any proof of my assertions either.  It comes
down to an opinion.   I'm willing to be proved wrong, but we all know
how difficult it is to get reliable evidence from human-played games.
It would really require thousands if not millions of fairly played
human-computer games, at a huge variety of controlled levels and
conditions, to make sense of this.   The scalability study I did
involved levels that cannot practically be played against humans in
quantity.    Making a fair study against humans is really messy, and
the fact that humans adapt to specific opponents better than computers
do will tend to suppress the ratings of the machines artificially.
Humans also try harder against stronger opponents, which also skews the
results.

Something interesting happened in computer chess about 20 years ago,
give or take.  There was a sudden explosion of interest in computer
chess when stand-alone chess computers started getting strong enough and
popular enough that a lot of chess players would purchase them.  They
started getting official ratings and playing in tournaments.   But
people very quickly learned how to play against them (someone even wrote
a book about how to beat computers) and it was as if the progress of the
computers had stalled.   We "knew" that computers were continuing to
improve, but it wasn't "translating" against humans.  Again, a few were
postulating that they had reached a plateau and that from then on you
would only see very minor progress.    There was something like a 200
ELO barrier that computers had to overcome to make up for the fact that
most chess players no longer feared the computers and knew how to play
against them.   But it was a local phenomenon.   It was a temporary
glitch.

There have always been certain players who specialize in beating
computers, and a style developed where you could get into a closed
position and play a slow-developing sneak attack against the computer's
king side.   This style worked against most of the weaknesses that
computers had.   It created some serious intransitivities, such that
sometimes a low-rated player would be unusually good at beating
higher-rated computers.   As far as I know there was no general
solution that fixed this; computers simply kept improving to the point
where their strengths dominated the game.  And of course, as computers
search deeper and deeper, this weakness also gradually goes away.   You
almost always have to make some compromise to get the kind of game you
want (as in high-handicap Go games, where Kim said you HAVE to
overplay), and thus this intransitivity was a local phenomenon - it
applies mostly to computers below 2400 ELO in strength, for instance.
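This kind of intransitivity is exactly what a single-number rating cannot represent.  A quick back-of-the-envelope check, using made-up 75% win rates for an anti-computer specialist cycle (the numbers are hypothetical, only the arithmetic is real):

```python
import math

def implied_elo_diff(p):
    """Rating difference that the Elo model assigns to a
    head-to-head win probability p (inverse of the logistic curve)."""
    return 400.0 * math.log10(p / (1.0 - p))

# Suppose A beats B 75% of the time, B beats C 75%, and C beats A 75%.
# Any consistent scalar rating requires the implied differences to sum
# to zero around the cycle; instead each link implies ~191 Elo, so the
# cycle sums to ~573 Elo - a contradiction no single number can fix.
per_link = implied_elo_diff(0.75)
cycle_total = 3 * per_link
print(round(per_link, 1), round(cycle_total, 1))
```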

This is why I currently believe these things are temporary and local.
You cannot continue to get better without discarding some of your
weaknesses.  And of course your strengths can be used to neutralize
your weaknesses, like the tennis player whose forehand attack is so
strong that the opponent can never get to his weak backhand.    Take
away his strengths, and his weaknesses start to become big factors.

Rémi, I know that you are a bit of an expert on rating systems, and I
make heavy use of your bayeselo program - a beautiful tool.  I wish you
would produce a command-line version that didn't require interactivity.
Anyway, have you ever considered a rating system that could somehow
measure strength in more than one dimension?   I suppose a huge problem
would be getting statistically reasonable samples, but it would be
pretty cool if you could detect the rock, paper, scissors effect and,
given enough data, be able to predict performance against particular
styles of players.  That would require a system that used more than a
single value to specify playing ability.    It might be that assuming
just 2 or 3 dimensions instead of 1 could be a major improvement.
Perhaps most players make use of 2 or 3 major cognitive abilities to
play go?   I know in chess I encountered superior players who could not
calculate as well as I could, but they were clearly better players than
I was.   And there were others who were just the opposite - they seemed
to know nothing about chess but were so tactically tenacious that they
were difficult to beat.
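One way such a multi-dimensional system could look - purely an illustrative sketch, not a claim about how bayeselo works - is to give each player a scalar strength plus a 2-D "style" vector, and let the cross product of the style vectors contribute a rotational term that a scalar rating cannot express:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_win(a, b, gamma=1.0):
    """P(a beats b) for players (strength, style_x, style_y).
    The strength difference is the ordinary Elo-like part; the 2-D
    cross product of the style vectors adds an intransitive term,
    so the model can represent rock-paper-scissors cycles."""
    (sa, xa, ya), (sb, xb, yb) = a, b
    return sigmoid(sa - sb + gamma * (xa * yb - xb * ya))

# Three hypothetical players of equal scalar strength whose style
# vectors sit 120 degrees apart on the unit circle: each beats the
# next player in the cycle with the same probability above 1/2.
players = [(0.0, math.cos(t), math.sin(t))
           for t in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
a, b, c = players
print(p_win(a, b), p_win(b, c), p_win(c, a))
```

Fitting the style vectors would of course need far more game data than fitting a single rating, which is the sampling problem mentioned above.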

- Don





> 
> Rémi
> _______________________________________________
> computer-go mailing list
> [email protected]
> http://www.computer-go.org/mailman/listinfo/computer-go/
