On Thu, 2008-08-28 at 00:45 +0200, Rémi Coulom wrote:
> Don Dailey wrote:
> > On Wed, 2008-08-27 at 14:56 -0700, Bob Hearn wrote:
> >
> > > The MoGo team has said that MoGo wins 62% of its games against a
> > > baseline version when the processing power doubles. That's about
> > > half a stone (if you assume you can generalize to human opponents).
> >
> > Yes, I believe it does generalize on average.
> >
> > This data matches my 13x13 study pretty closely, about 62% give or take
> > for each doubling. That is about 90 ELO or so. I have heard that
> > 100 ELO is 1 stone which is what I was basing this on. But it's not
> > clear to me at all if that is true. So I can only guess that 4x in
> > Mogo is worth something like 1 or 2 stones or something between.
> >
> > - Don
>
> According to my experience with Go data, it is not possible to give the
> value of one stone in terms of Elo ratings. For weak players, one stone
> is a lot less than 100 Elo. For stronger players, it may be more.
>
> Also, it is very important to understand that the Elo model is very
> wrong, and Elo against humans has nothing to do with Elo against
> computers (and even less with Elo against the previous version). In
> games against GNU Go, Crazy Stone improved 200-300 Elo points in one
> year. On KGS, this translated into an improvement from 2k to 1k.
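As a rough check on the figures quoted above, here is a minimal sketch of the arithmetic under the standard logistic Elo model. It treats the 62% win rate as an expected score and ignores draws. The function names are illustrative only. Two things in it are assumptions, not established facts: that Elo gains simply add across two doublings, and that one stone is worth either 100 or 200 Elo (the two values mentioned in this thread).

```python
import math


def elo_diff_from_win_rate(p: float) -> float:
    """Elo difference implied by expected score p (logistic Elo model)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def win_rate_from_elo_diff(d: float) -> float:
    """Expected score for a player rated d Elo points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))


# 62% per doubling of processing power, as quoted in the thread.
per_doubling = elo_diff_from_win_rate(0.62)

# ASSUMPTION: gains add across doublings, so 4x hardware = 2 doublings.
two_doublings = 2 * per_doubling

print(f"one doubling : {per_doubling:.1f} Elo")
print(f"4x hardware  : {two_doublings:.1f} Elo")

# ASSUMPTION: a fixed Elo-per-stone rate. The thread itself disputes that
# any single rate holds across ranks, so treat this as illustration only.
for elo_per_stone in (100, 200):
    stones = two_doublings / elo_per_stone
    print(f"at {elo_per_stone} Elo/stone: {stones:.2f} stones")
```

Under this model 62% works out to roughly 85 Elo per doubling, a little below the "about 90" quoted above. The stone conversion depends entirely on which Elo-per-stone figure is assumed.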
If we accept that 200 ELO is one stone, as has been asserted, then this is not that far off, keeping in mind that KGS ratings are very grainy. Does KGS report fractional k? Those 2k and 1k figures could be almost 2 stones apart unless they are exactly 2.0k and 1.0k. I don't see this as any kind of solid evidence of anything.

Also, humans tend to learn from playing given opponents, whereas computers do not have that luxury. The improved program and its longer exposure could have been seen as an interesting challenge by the humans playing it, and humans tend to rise to a challenge.

I agree with you and accept that you cannot convert stones to ELO, as the scale is much different between weak and strong players. I think that is pretty clear, and there is no reason to think it would convert in a straightforward way anyway. Even in chess a given advantage (such as a pawn) means a lot more as you get stronger. So I guess it's proper to specify which rank you are talking about when speculating about ELO/stone equivalence.

I believe that there can be weak intransitivity in go. It probably applies as much to specific opponents as it does to humans vs. computers, because no two humans or computers play the same. But you apparently believe it's extremely strong. I honestly don't see any evidence either way, and for me Occam's razor applies here. I could be convinced with really strong statistical evidence, but I don't think anybody has ever presented any.

I do believe in intransitivity, but I believe it is a local phenomenon. You might have two 3 dan players, and one of them wins far more than he "should" against the other. But if the losing player gets better and increases to 5 dan, it doesn't imply that the 3 dan player will continue to score better against him than against other 5 dan players.
Presumably, in the process of improving to 5d he addressed his weaknesses (which more than likely were holding him back) more than his strengths, and whatever the 3 dan player was beating him with isn't working any longer. So I believe transitivity "catches up" or averages out.

If you improve a specific program 100 ELO against other specific programs, it may not fully translate to 100 ELO against EVERY program or human; it could be perhaps 50 ELO against the average human opponent. But if you do this again, then again, then again and so on, you are not going to keep losing 50 ELO each time relative to humans. Again, I'm going with Occam on this unless someone can show me differently.

The computer chess Rybka team believe their program has a weakness against Zappa, and indeed it appears statistically that Zappa does better than it should relative to other programs (although Rybka is still superior). This might be a minor intransitivity. I don't think it follows that this intransitivity will always exist in the same amounts as you scale up the programs to faster hardware. (Of course it's possible and even likely that some programs will SCALE BETTER, but that's not what I'm talking about here. I'm talking about a program that scales only against specific opponents and not against others.)

I would like to see proof (even empirical) that this is a problem with computer go, not just speculation, superstition or anecdotes.

I don't really believe the ELO model is "very wrong." I only believe it is a mathematical model that is "somewhat" flawed for chess and presumably also for other games. Do you have an alternative that might be more accurate?

- Don

>
> Rémi
> _______________________________________________
> computer-go mailing list
> [email protected]
> http://www.computer-go.org/mailman/listinfo/computer-go/
