On 24.07.2010 15:13, Gian-Carlo Pascutto wrote:
Raymond Wold wrote:

Playing strength is a function of everything you know and don't know
and all your strengths and weaknesses and how you put them together.
Every player, even the very best, has weaknesses, but those do not
determine how strong he is.
These two sentences contradict each other, to me. If a program plays as a
9dan professional as long as there are never any ladders, but has a bug
that flips the status of every ladder, then that weakness /does/
determine how strong it is. As incredible as it would be to make such a
program, it would not deserve to be classed along with the human 9dan
professionals; it would probably be more correct to consider it
double-digit kyu.
This makes no sense. The rating of the program is its strength given
this weakness. If this turns out to be 9dan pro despite the ladder
misreading, for example because it turns out to be very hard to set up
game-result-defining ladders, then the program is 9dan pro.


I was simply not clear enough. I meant a program that would be counted as 9dan pro if you took only the games where no ladders appeared and based a rank on those (cherry-picking the games where the flaw never shows). Of course the program would not be 9dan pro in any reasonable sense, and that was my point: all the games with ladders would ruin its statistics. The weakness would determine its strength.
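To make the arithmetic concrete, here is a small illustrative sketch. The numbers are invented for the sake of the example, not measured from any real program: assume the hypothetical ladder-flipping program scores evenly against 9dan peers in ladder-free games, loses every game that contains a decisive ladder, and ladders decide some fraction of games.

```python
# Illustrative only: overall score of a hypothetical program that plays
# at its opponents' level in ladder-free games but loses every game in
# which a decisive ladder arises. The 0.5 and 0.4 below are invented.
def overall_win_rate(ladder_fraction, ladderless_win_rate=0.5):
    """Win rate over all games, assuming every ladder game is lost."""
    return (1 - ladder_fraction) * ladderless_win_rate

# Cherry-picking only the ladder-free games suggests an even result
# (0.5), but if, say, 40% of games contain a decisive ladder:
print(overall_win_rate(0.4))  # 0.3 -- far below an even result
```

The cherry-picked sample and the full sample give very different ratings, which is the point of the 9dan example above.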

There is a tendency for humans to rate computers according to the flaws
they see they can understand. If a 10kyu sees a program making a mistake
he understands and could avoid, he will think the program is worse than
10kyu. You are also falling into this trap by giving that example.

I should probably never have mentioned my goal of eliminating bias and promoting intellectual honesty; now everyone takes that as a personal attack and tries their hardest to find bias in my arguments. I don't mind people finding my biases so I can counteract them, but this will just lead to people reading things into my words that are not there, as you seem to have done.

Basically your idea of fair is that the first few games shouldn't
count - you just said it differently and it's a ridiculous idea.
That is indeed what I am saying, and I don't think it is so ridiculous.
You are saying that a known weakness of human players is that they need
warmup time and that they should not be rated according to their weakest
performance, which is coincidentally what you are arguing *against*.


No, I am saying that a human's weaknesses aren't very important, since the human will notice and learn to avoid them within the very few games where one is exploited, without gaining significant rank in the process. A program has no such benefit: once a version of a program has a flaw, that flaw remains until a new version is made that attempts to fix it. Any claim about the program's strength will be undermined by the set of people who know of its flaws. Any challenge demonstrating the program's skill (such as John Tromp's bet) can be called into question in either direction, with speculation on whether the opponent knows of the flaw(s) or not. The game then becomes not go, but rather "do you know this program or not?" I am more interested in the game of go.

A human can much easier detect and correct his flaws, especially when
they are being exploited. Thus the effect of weaknesses isn't as
important for humans.
A weakness of human chess players against computers is that they don't
perform well at very fast time controls, which led GMs to lose to
computers even when the latter were far behind in playing strength at
slow time controls.

I'm curious what you suggest to fix this human weakness.
Another weakness is that they have problems reliably visualizing
positions 20 ply out and identifying all the tactical possibilities
there, and backtracking that to the current position. The nasty
computers exploit this in almost every game.

I'm also curious how you suggest to adapt to that.

I was talking about weaknesses that can be exploited by players ranked lower than the "ordinary rank" of the opponent in the game in question, whether that is faster-than-human-motor-control blitz chess, chess at reasonable time controls, or go. A go program ranked 1 dan on KGS, for instance, should not have flaws that a 4 kyu can reliably exploit to win every even game, if its authors want to claim a true 1 dan playing strength for it.
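A rough, purely illustrative calculation of why a 4 kyu reliably beating a 1 dan is so anomalous: the Elo expected-score formula below is standard, but the ~100-Elo-per-amateur-rank conversion is an assumption made for this sketch, not a KGS fact.

```python
# Standard Elo expected score for the higher-rated player.
def elo_expected_score(rating_diff):
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

# Assume (only for illustration) ~100 Elo points per amateur rank;
# a 1 dan would then sit ~500 points above a 4 kyu.
rank_gap_elo = 5 * 100
print(elo_expected_score(rank_gap_elo))  # ~0.947
```

Under that assumption the 1 dan should score around 95% against the 4 kyu; a 4 kyu who wins every even game against the program is observing something the rank difference cannot explain, which is exactly what an exploitable flaw looks like.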

If I am anti-anything, it would be against bias in program authors and
testers. I am for intellectual honesty. If you think a program should be
compensated in its ranking for the handicap that it can't learn, you
should give the program a much higher ranking than it would get on a go
server.
As already explained, this argument works perfectly well the other way
around (warmup time for humans and you wanting to drop the first games).
If you think you are unbiased or intellectually honest when making such
an argument, you're fooling yourself.


I would think that two humans ranked the same would, over many games, get an even result. I would not mind putting this to the test with my own ranking: another human can play me a hundred games where we both do our best, and even if we disregard those first hundred games, I will not have exposed any weaknesses that can be exploited by an even opponent. People don't do this when playing go simply because they know that learning the weaknesses of the opponent isn't a viable strategy: the opponent will just learn yours in return, and fix his own. A computer program cannot do this. Hence the difference.
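The "even result over many games" claim can be checked with elementary probability. This is a generic illustration, not a measurement: under the hypothesis that both players are truly even (each game a fair coin flip), how surprising would a lopsided score in a hundred games be?

```python
from math import comb

def prob_at_least(wins, games, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p)."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

# If two players are genuinely even, even a 60-40 split over 100 games
# is already improbable:
print(round(prob_at_least(60, 100), 4))  # 0.0284
```

So a hundred-game match is easily enough to distinguish a genuinely even pairing from one where one side has found a reliably exploitable weakness.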

I want programs to improve under /my/ standard. I am kind of hoping
other go coders feel the same. Isn't handling the hard problem of
playing go what this is all about, and not just getting a high rating
for kudos and commercial gain? Surely tackling what your program is
worst at does this the best?
Handling the hard problem of go means maximizing playing strength. That
*is* my interest, and this does *NOT* entail fixing every possible
weakness. Practice has demonstrated this convincingly for Go, for chess,
and for other games. The strength of a program is *NOT* solely
determined by its weakest part.

So are you saying that I am wrong that a human ranked lower than a program can learn its flaws over many games and reliably beat it, without having gained correspondingly in skill against other humans? Or are you saying that this does not matter, that it is entirely irrelevant to a fair judgement of skill?

Your "commercial gain" argument is very lame and silly. If this were the
interest, fixing the weaknesses would be more important than making the
program play well. To understand why, see the second paragraph of this mail.

So if you are marketing your go program as a learning aid, for instance, it makes no sense to want to cover up that after twenty games the customer will know how to beat it soundly, having learned the wrong lesson (how to beat that specific program, rather than how to get better at go)? And if a customer is browsing for programs by skill, it makes no sense to use a rating from a popular server where people don't play your program repeatedly, instead of the rating of the players who can reliably beat it after some practice?

I don't know that this actually happens, not having tried any of the commercial programs myself (authors are welcome to donate copies for me to try out and give the rating *I* think they deserve), but I don't think the argument sounds lame and silly.

On 24.07.2010 15:13, Don Dailey wrote:
I don't want this to get into a big debate, but here is what it comes down to in my opinion. You are trying to isolate the learning process from some hypothetical state where "learning is complete." This is impossible to do and it's not what the human state is all about. We constantly learn and you cannot isolate the two things.

No, I am not.
There are players who start young and improve during their entire lives, and when they lose we would consider it pretty strange if they said this game shouldn't count because they are still "learning." Wouldn't you consider that intellectually dishonest on their part?

If two players play a hundred games where the result turns out even, the players starting out as 4 kyu, and ending out as 2 kyu by the last game, they would still be even. If only one of them has a reliable rank, I would still think it honest of the other player to claim the 2 kyu rank at the end.

If a human player plays a hundred games against a go program, progressing from 4 kyu to 2 kyu over those games, and at the end the human has dominated the go program, I would not think it honest of the program's authors to claim that it has 2 kyu strength or higher.

Normal growth and learning is not the issue - the issue is flaws that can be learned /without/ significant advancement in rank.

That's why I consider this idea ill-conceived. It's rather like a foot race where you let the slow starter have a running start because you consider it unfair that he gets penalized for being a slow starter. Whatever you think you are measuring, it's not a fair race.

Tournament play is what it is, and if the conditions are not the same for every player then it's inherently unfair. I think it's pretty silly to consider games invalid because you believe you have not yet learned enough about the opponent to take full advantage of his ignorance. That is what I consider intellectually dishonest (to borrow your own phrase).

For some reason I think of Las Vegas, another rigged game. I do not gamble, as I consider it a form of ugly greed, but I know that the games are rigged so that you cannot win. And if someone actually develops the skill to win at something, such as by counting cards, they will kick you out. It's a bizarre situation where you are allowed to play as long as you are not very good at it. This is rather like that: let the human keep playing the computer until he figures out how to beat it, then start rating the games. And of course if the computer wins anyway, the human just stops playing it. It sounds like an honest rating system to me.


But I am not talking about a human progressing in ranking until he is ranked higher than the claimed ranking of the program. I am talking about weaknesses he can exploit long before he learns to beat actual human players of that ranking. If a human player loses significantly in a hundred-game match against a program, I would not object very hard to giving the program even the rank of the player at the end of the series, regardless of whether he has progressed in skill. If John Tromp knows in advance which program he will play, and practices against it in the run-up to the match, trying his best to learn its flaws, and the program still wins, I would have no problem admitting that the program is at least 2 or 3 dan (or whatever rank John Tromp has at the time of playing). /That/ would be an interesting result.

As I said, I do not think you are unreasonable for wanting the exact same conditions for every player. I see the logic. Equality is one nice measure of fairness. I just have a different standard for judging programs, because of the nature of those programs, which is very different from the nature of humans.


I'm not sure it's relevant, but an interesting thought experiment might be to consider how you would feel about a go program with a huge library of trick plays, that it employed whenever it thought it was behind. Or a go program that tried its best to get the opponent to lose on time. Given that tournament play is the same for everyone, would you still not feel that such a program's rating would be even a little undeserved?
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go