On Thu, 2008-08-28 at 00:45 +0200, Rémi Coulom wrote:
> Don Dailey wrote:
> > On Wed, 2008-08-27 at 14:56 -0700, Bob Hearn wrote:
> >   
> >> The MoGo team has said that MoGo wins 62% of its games against a  
> >> baseline version when the processing power doubles. That's about
> >> half  
> >> a stone (if you assume you can generalize to human opponents).
> >>     
> >
> > Yes, I believe it does generalize on average.   
> >
> > This data matches my 13x13 study pretty closely,  about 62% give or take
> > for each doubling.     That is about 90 ELO or so.   I have heard that
> > 100 ELO is 1 stone which is what I was basing this on.   But it's not
> > clear to me at all if that is true.   So I can only guess that 4x in
> > Mogo is worth something like 1 or 2 stones or something between.    
> >
> > - Don
> 
> According to my experience with Go data, it is not possible to give the 
> value of one stone in terms of Elo ratings. For weak players, one stone 
> is a lot less than 100 Elo. For stronger players, it may be more.
> 
> Also, it is very important to understand that the Elo model is very 
> wrong, and Elo against humans has nothing to do with Elo against 
> computers (and even less with Elo against the previous version). In 
> games against GNU Go, Crazy Stone improved 200-300 Elo points in one 
> year. On KGS, this translated into an improvement from 2k to 1k.

If we accept that 200 ELO is one stone as has been asserted,  then this
is not that far off - also keeping in mind that KGS ratings are very
grainy.  Does KGS report fractional k?  That 2k and 1k could be almost 2
stones apart unless it is 2.0 k and 1.0 k.   I don't see this as any
kind of solid evidence of anything.   Also, humans tend to learn from
playing given opponents, whereas computers do not have that luxury, and
the improved program and its longer exposure could have been seen as an
interesting challenge to the humans playing it - and humans tend to rise
to a challenge.  

I agree with you and accept that you cannot convert stones to ELO as the
scale is much different between weak and strong players.  I think that
is pretty clear and there is no reason to think it would convert in a
straightforward way anyway.   Even in chess a given advantage (such as a
pawn) means a lot more as you get stronger.   So I guess it's proper to
specify which rank you are talking about when speculating about
ELO/stone equivalence.
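
For anyone who wants to check the arithmetic in this thread, here is a
rough sketch under the standard logistic ELO model.  The function names
are made up for illustration, the 100 and 200 ELO-per-stone figures are
only the two values asserted in this thread (not measured constants),
and treating two doublings as simply additive is an assumption too.

```python
import math

def elo_diff_from_win_rate(p):
    """ELO difference implied by win rate p under the logistic ELO model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def expected_score(diff):
    """Expected score of a player rated `diff` ELO above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# 62% against the baseline per doubling works out to about 85 ELO.
per_doubling = elo_diff_from_win_rate(0.62)

# Assumption: gains from successive doublings add up, so 4x = 2 doublings.
four_x = 2 * per_doubling

# The two ELO-per-stone values asserted in the thread; neither is established.
for elo_per_stone in (100.0, 200.0):
    print(f"{elo_per_stone:.0f} ELO/stone: "
          f"one doubling = {per_doubling / elo_per_stone:.2f} stones, "
          f"4x = {four_x / elo_per_stone:.2f} stones")
```

Under these assumptions 4x comes out somewhere between roughly 0.85 and
1.7 stones, and a 200-300 ELO gain at 200 ELO per stone would be 1 to
1.5 stones.  None of this says which conversion is right at a given rank.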

I believe that there can be weak intransitivity in go.  It probably
applies as much to specific opponents as it does to humans vs computers
because no 2 humans or computers play the same.   But you apparently
believe it's extremely strong.   I honestly don't see any evidence
either way and for me Occam's razor applies here.   I could be convinced
with really strong statistical evidence, but I don't think anybody has
ever presented any.   

I do believe in intransitives,  but I believe it is a local phenomenon.
You might have two 3 dan players and one of them wins far more than he
"should" against the other.   But if the losing player gets better and
increases to 5 dan,  it doesn't imply that the 3 dan player will
continue to score better against him than other 5 dan players.
Presumably, in the process of improving to 5d  he addressed his
weaknesses (which more than likely were holding him back) more than his
strengths, and whatever the 3 dan player was beating him with isn't
working any longer.  

So I believe transitivity "catches up" or averages out.  If you improve
a specific program 100 ELO against other specific programs,  it may not
fully translate to 100 ELO against EVERY program or human,  it could be
perhaps 50 ELO against the average human opponent.    But if you do this
again, then again, then again and so on,  you are not going to keep
losing 50 ELO each time relative to humans.  Again, I'm going with Occam
on this unless someone can show me differently. 
 
The computer chess Rybka team believe their program has a weakness
against Zappa and indeed it appears statistically that Zappa does better
than it should relative to other programs (although Rybka is still
superior.)   This might be a minor intransitivity.  I don't think it
follows that this intransitivity will always exist in the same amounts as
you scale up the programs to faster hardware (of course it's possible
and even likely that some programs will SCALE BETTER,  but that's not
what I'm talking about here, I'm talking about a program that scales
only against specific opponents and not against others.)   

I would like to see proof (even empirical) that this is a problem with
computer go,  not just speculation, superstition or anecdotes.  
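
To make concrete what kind of evidence would count, here is a minimal
sketch of one possible check: take the rating difference two players
have earned against a wide pool, and see how far their head-to-head
result sits from what the ELO model predicts.  The function names and
all the numbers below are hypothetical, draws are ignored, and the
normal approximation to the binomial is only reasonable for a fair
number of games.

```python
import math

def expected_score(diff):
    """Expected score of a player rated `diff` ELO above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def z_score(wins, games, rating_diff):
    """How many standard deviations the observed win count lies from the
    ELO prediction (normal approximation to the binomial, draws ignored)."""
    p = expected_score(rating_diff)
    mean = games * p
    sd = math.sqrt(games * p * (1.0 - p))
    return (wins - mean) / sd

# Hypothetical data: A is rated 100 ELO above B from games against a wide
# pool, yet A wins only 52 of 100 head-to-head games against B.
z = z_score(wins=52, games=100, rating_diff=100.0)
print(f"z = {z:.2f}")
```

A large |z| for one pairing that holds up over many games would point to
a local intransitivity; showing that it persists as both players get
stronger would need the same test repeated at each strength level.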

I don't really believe the ELO model is "very wrong."   I only believe
it is a mathematical model that is "somewhat" flawed for chess and
presumably also for other games.   Do you have an alternative that might
be more accurate?   


- Don

> 
> Rémi
> _______________________________________________
> computer-go mailing list
> [email protected]
> http://www.computer-go.org/mailman/listinfo/computer-go/
