On 24.07.2010 15:13, Gian-Carlo Pascutto wrote:
Raymond Wold wrote:
Playing strength is a function of everything you know and don't know
and all your strengths and weaknesses and how you put them together.
Every player, even the very best has weaknesses but those do not
determine how strong he is.
These two sentences contradict each other to me. If a program plays as a
9dan professional as long as there are never any ladders, but has a bug
that flips the status of every ladder, then that weakness /does/
determine how strong he is. As incredible as it would be to make such a
program, it would not deserve to be classed along with the human 9dan
professionals, it would probably be more correct to consider it double
digit kyu,
This makes no sense. The rating of the program is the strength given
this weakness. If this turns out to be 9dan pro despite ladder
misreading, for example because it turns out to be very hard to set up
game-result-defining ladders, the program is 9dan pro.
I was simply not clear enough. I meant a program that, if you took all
the games where no ladders appeared, and based a rank on that
(cherry-picking the games where the flaw in the program never shows),
would be counted as 9dan pro. Of course the program would not be 9dan
pro in any reasonable sense, and that was my point, since all the games
with ladders would ruin its statistics. The weakness would determine its
strength.
There is a tendency for humans to rate computers according to the flaws
they see they can understand. If a 10kyu sees a program making a mistake
he understands and could avoid, he will think the program is worse than
10kyu. You are also falling into this trap by giving that example.
I should probably never have mentioned my goal of eliminating bias and
promoting intellectual honesty, now everyone takes that as an attack on
them personally and tries their hardest to find such in my arguments. I
mean, I don't mind people finding my biases so I can counteract them,
but this will just lead to people reading things where there are none,
like you seem to have done.
Basically your idea of fair is that the first few games shouldn't
count - you just said it differently and it's a ridiculous idea.
That is indeed what I am saying, and I don't think it is so ridiculous.
You are saying that a known weakness of human players is that they need
warmup time and that they should not be rated according to their weakest
performance, which is coincidentally what you are arguing *against*.
No, I am saying that a human's weaknesses aren't very important, since the
human will notice and learn to avoid them within very few games where one is
exploited, without gaining significant rank in the process. A program
has no such benefit, and once a version of a program has a flaw, that
flaw remains there until a new version is made attempting to fix the
flaw. Any claim about the program's strength will be undermined by the
set of people who know of its flaws. Any challenge demonstrating the
program's skill (such as John Tromp's bet) can be called into question
in either direction, with speculation on whether the opponent knows of
the flaw(s) or not. The game turns into something not go, but rather "do
you know this program or not?" I am more interested in the game of go.
A human can much more easily detect and correct his flaws, especially when
they are being exploited. Thus the effect of weaknesses isn't as
important for humans.
A weakness of human chess players against computers is that they don't
perform well in very fast time controls, which led GMs to lose to
computers even when the latter were far behind in playing strength on
slow time controls.
I'm curious what you suggest to fix this human weakness.
Another weakness is that they have problems reliably visualizing
positions 20 ply out and identifying all the tactical possibilities
there, and backtracking that to the current position. The nasty
computers exploit this in almost every game.
I'm also curious how you suggest to adapt to that.
I was talking about weaknesses that could be exploited by players with a
rank lower than the "ordinary rank" of the opponent in the game in
question. Whether that be move-faster-than-human-motor-control blitz
chess, actual chess at reasonable timing, or go. A go program ranked 1
dan on KGS for instance, should not have flaws that a 4 kyu can reliably
exploit to win every even game if its author(s) wants to claim a true 1
dan playing strength for it.
If I am anti-anything, it would be against bias in program authors and
testers. I am for intellectual honesty. If you think a program should be
compensated in its ranking for the handicap that it can't learn, you
should give the program a much higher ranking than it would get on a go
server.
As already explained, this argument works perfectly well the other way
around (warmup time for humans and you wanting to drop the first games).
If you think you are unbiased or intellectually honest when making such
an argument, you're fooling yourself.
I would think that two humans ranked the same would, over many games,
get an even result. I would not mind putting this to the test with my
own ranking. Another human can play me in a hundred games where we both
do our best, and even if we disregard those first hundred games, I will
not have exposed any weaknesses that can be exploited by an even
opponent. People don't do this when playing go simply because they know
that learning the weaknesses of the opponent isn't a viable strategy -
they will just learn yours in return, and fix their own. A computer
program is unable to do this. Thus the difference.
I want programs to improve under /my/ standard. I am kind of hoping
other go coders feel the same. Isn't handling the hard problem of
playing go what this is all about, and not just getting a high rating
for kudos and commercial gain? Surely tackling what your program is
worst at does this the best?
Handling the hard problem of go means maximizing playing strength. That
*is* my interest, and this does *NOT* entail fixing every possible
weakness. Practice has demonstrated this convincingly for Go, for chess,
and for other games. The strength of a program is *NOT* solely
determined by its weakest part.
So you are saying that I am wrong in that a lower-ranked human than a
program can learn its flaws over a lot of games and reliably beat it,
without having gained correspondingly in skill against other humans? Or
are you saying that this does not matter, is entirely irrelevant to a
fair judgement of skill?
Your "commercial gain" argument is very lame and silly. If this were the
interest, fixing the weaknesses would be more important than making the
program play well. To understand why, see the second paragraph of this mail.
So if you are marketing your go program for the purpose of a learning
aid, for instance, it makes no sense to want to cover up that after
twenty games you will know how to beat it soundly, having learned the
wrong lesson (how to beat that specific program, rather than getting
better at go)? If a customer is browsing for programs by skill, it makes
no sense to use a go rating from a popular server where people don't
play your program repeatedly, instead of the rating of the players that
can reliably beat it after some practice?
I don't know that this happens (not having tried any of the commercial
programs myself (authors are welcome to donate me copies to have me try
out and give it a rating *I* think it deserves)), but I don't think it
sounds lame and silly.
On 24.07.2010 15:13, Don Dailey wrote:
I don't want this to get into a big debate, but here is what it comes
down to in my opinion. You are trying to isolate the learning
process from some hypothetical state where "learning is complete."
This is impossible to do and it's not what the human state is all
about. We constantly learn and you cannot isolate the two things.
No, I am not.
There are players who start young and improve during their entire
lives, and when they lose we would consider it pretty strange if they
said this game shouldn't count because they are still "learning."
Wouldn't you consider that intellectually dishonest on their part?
If two players play a hundred games where the result turns out even, the
players starting out as 4 kyu, and ending up as 2 kyu by the last game,
they would still be even. If only one of them has a reliable rank, I
would still think it honest of the other player to claim the 2 kyu rank
at the end.
If a human player plays a hundred games against a go program, where the
human progresses from 4 kyu to 2 kyu over the games, and the human at
the end has dominated the go program, I would not think it honest of
the authors of the go program to claim that it has a 2 kyu strength or
higher.
Normal growth and learning is not the issue - the issue is flaws that
can be learned /without/ significant advancement in rank.
That's why I consider this idea ill-conceived. It's rather like a
foot race where you let the slow starter have a running
start because you consider it unfair that he get penalized for being a
slow starter. Whatever you think you are measuring, it's not a fair
race.
Tournament play is what it is and if the conditions are not the same
for every player then it's inherently unfair. I think it's pretty
silly to consider games invalid because you believe you have not yet
learned enough about the opponent to take full advantage of his
ignorance. That is what I consider intellectually dishonest (to
borrow your own phrase.)
For some reason I think of Las Vegas, another rigged game. I do not
gamble as I consider it a form of ugly greed but I know that the games
are rigged so that you cannot win. And if someone actually develops
the skill to win at something, such as by counting cards - they will
kick you out. It's a bizarre situation where you are allowed to
play as long as you are not very good at it. This is rather like
that, let the human keep playing the computer until he figures out
how to beat it, then start rating the games. And of course if the
computer wins anyway, the human just stops playing it. It sounds
like an honest rating system to me.
But I am not talking about a human progressing in ranking until he is
ranked higher than the claimed ranking of the program. I am talking
about weaknesses he can exploit long before he learns to beat actual
human players of that ranking. If a human player loses significantly in
a hundred-game match against a program, I would not object very hard to
giving the program even the rank of the player at the end of the series,
regardless of whether he's progressed in skill. If John Tromp knows in
advance which program he will play, and practices up to the match
against it, trying his best to learn its flaws, and the program still
wins, I would have no problem admitting that the program is at least 2
or 3 dan (or whatever rank John Tromp has at the time of playing).
/That/ would be an interesting result.
As I said, I do not think you are unreasonable for wanting the exact
same conditions for every player. I see the logic. Equality is one nice
measure of fairness. I just have a different standard for judging
programs, because of the nature of those programs, which is very
different from the nature of humans.
I'm not sure it's relevant, but an interesting thought experiment might
be to consider how you would feel about a go program with a huge library
of trick plays, that it employed whenever it thought it was behind. Or a
go program that tried its best to get the opponent to lose on time.
Given that tournament play is the same for everyone, would you still not
feel that such a program's rating would be even a little undeserved?