Re: [Computer-go] Shodan Go Bet

Raymond Wold Sat, 24 Jul 2010 09:47:21 -0700

On 24.07.2010 15:13, Gian-Carlo Pascutto wrote:

Raymond Wold wrote:

Playing strength is a function of everything you know and don't know
and all your strengths and weaknesses and how you put them together.
Every player, even the very best, has weaknesses but those do not
determine how strong he is.

These two sentences contradict each other to me. If a program plays as a
9dan professional as long as there are never any ladders, but has a bug
that flips the status of every ladder, then that weakness /does/
determine how strong he is. As incredible as it would be to make such a
program, it would not deserve to be classed along with the human 9dan
professionals, it would probably be more correct to consider it double
digit kyu,

This makes no sense. The rating of the program is the strength given
this weakness. If this turns out to be 9dan pro despite ladder
misreading, for example because it turns out to be very hard to set up
game-result-defining ladders, the program is 9dan pro.

I was simply not clear enough. I meant a program that, if you took all the games where no ladders appeared, and based a rank on that (cherry-picking the games where the flaw in the program never shows), would be counted as 9dan pro. Of course the program would not be 9dan pro in any reasonable sense, and that was my point, since all the games with ladders would ruin its statistics. The weakness would determine its strength.

There is a tendency for humans to rate computers according to the flaws
they see they can understand. If a 10kyu sees a program making a mistake
he understands and could avoid, he will think the program is worse than
10kyu. You are also falling into this trap by giving that example.

I should probably never have mentioned my goal of eliminating bias and promoting intellectual honesty; now everyone takes that as an attack on them personally and tries their hardest to find such in my arguments. I mean, I don't mind people finding my biases so I can counteract them, but this will just lead to people reading things where there are none, like you seem to have done.

Basically your idea of fair is that the first few games shouldn't
count - you just said it differently and it's a ridiculous idea.

That is indeed what I am saying, and I don't think it is so ridiculous.

You are saying that a known weakness of human players is that they need
warmup time and that they should not be rated according to their weakest
performance, which is coincidentally what you are arguing *against*.

No, I am saying that a human's weaknesses aren't very important, since the human will notice and learn to avoid them in very few games where one is exploited, without gaining significant rank in the process. A program has no such benefit, and once a version of a program has a flaw, that flaw remains there until a new version is made attempting to fix the flaw. Any claim about the program's strength will be undermined by the set of people who know of its flaws. Any challenge demonstrating the program's skill (such as John Tromp's bet) can be called into question in either direction, with speculation on whether the opponent knows of the flaw(s) or not. The game turns into something not go, but rather "do you know this program or not?" I am more interested in the game of go.

A human can much easier detect and correct his flaws, especially when
they are being exploited. Thus the effect of weaknesses isn't as
important for humans.

A weakness of human chess players against computers is that they don't
perform well in very fast timecontrols, which led GMs to lose to
computers even when the latter were far behind in playing strength on
slow timecontrols.

I'm curious what you suggest to fix this human weakness.

Another weakness is that they have problems reliably visualizing
positions 20 ply out and identifying all the tactical possibilities
there, and backtracking that to the current position. The nasty
computers exploit this in almost every game.

I'm also curious how you suggest to adapt to that.

I was talking about weaknesses that could be exploited by players with a rank lower than the "ordinary rank" of the opponent in the game in question, whether that be move-faster-than-human-motor-control blitz chess, actual chess at reasonable timing, or go. A go program ranked 1 dan on KGS, for instance, should not have flaws that a 4 kyu can reliably exploit to win every even game if its author(s) want to claim a true 1 dan playing strength for it.

If I am anti-anything, it would be against bias in program authors and
testers. I am for intellectual honesty. If you think a program should be
compensated in its ranking for the handicap that it can't learn, you
should give the program a much higher ranking than it would get on a go
server.

As already explained, this argument works perfectly well the other way
around (warmup time for humans and you wanting to drop the first games).
If you think you are unbiased or intellectually honest when making such
an argument, you're fooling yourself.

I would think that two humans ranked the same would, over many games, get an even result. I would not mind putting this to the test with my own ranking. Another human can play me in a hundred games where we both do our best, and even if we disregard those first hundred games, I will not have exposed any weaknesses that can be exploited by an even opponent. People don't do this when playing go simply because they know that learning the weaknesses of the opponent isn't a viable strategy - they will just learn yours in return, and fix their own. A computer program is unable to do this. Thus the difference.

I want programs to improve under /my/ standard. I am kind of hoping
other go coders feel the same. Isn't handling the hard problem of
playing go what this is all about, and not just getting a high rating
for kudos and commercial gain? Surely tackling what your program is
worst at does this the best?

Handling the hard problem of go means maximizing playing strength. That
*is* my interest, and this does *NOT* entail fixing every possible
weakness. Practice has demonstrated this convincingly for Go, for chess,
and for other games. The strength of a program is *NOT* solely
determined by its weakest part.

So you are saying that I am wrong in that a lower-ranked human than a program can learn its flaws over a lot of games and reliably beat it, without having gained correspondingly in skill against other humans? Or are you saying that this does not matter, is entirely irrelevant to a fair judgement of skill?

Your "commercial gain" argument is very lame and silly. If this were the
interest, fixing the weaknesses would be more important than making the
program play well. To understand why, see the second paragraph of this mail.

So if you are marketing your go program for the purpose of a learning aid, for instance, it makes no sense to want to cover up that after twenty games you will know how to beat it soundly, having learned the wrong lesson (how to beat that specific program, rather than getting better at go)? If a customer is browsing for programs by skill, it makes no sense to use a go rating from a popular server where people don't play your program repeatedly, instead of the rating of the players that can reliably beat it after some practice?

I don't know that this happens (not having tried any of the commercial programs myself (authors are welcome to donate me copies to have me try out and give it a rating *I* think it deserves)), but I don't think it sounds lame and silly.


On 24.07.2010 15:13, Don Dailey wrote:

I don't want this to get into a big debate, but here is what it comes down to in my opinion. You are trying to isolate the learning process from some hypothetical state where "learning is complete." This is impossible to do and it's not what the human state is all about. We constantly learn and you cannot isolate the two things.

No, I am not.

There are players who start young and improve during their entire lives, and when they lose we would consider it pretty strange if they said this game shouldn't count because they are still "learning." Wouldn't you consider that intellectually dishonest on their part?

If two players play a hundred games where the result turns out even, the players starting out as 4 kyu, and ending up as 2 kyu by the last game, they would still be even. If only one of them has a reliable rank, I would still think it honest of the other player to claim the 2 kyu rank at the end.

If a human player plays a hundred games against a go program, where the human progresses from 4 kyu to 2 kyu over the games, and the human at the end has dominated the go program, I would not think it honest of the authors of the go program to claim that it has a 2 kyu strength or higher.

Normal growth and learning is not the issue - the issue is flaws that can be learned /without/ significant advancement in rank.

That's why I consider this idea ill-conceived. It's rather like a foot race where you let the slow starter have a running start because you consider it unfair that he get penalized for being a slow starter. Whatever you think you are measuring, it's not a fair race.

Tournament play is what it is and if the conditions are not the same for every player then it's inherently unfair. I think it's pretty silly to consider games invalid because you believe you have not yet learned enough about the opponent to take full advantage of his ignorance. That is what I consider intellectually dishonest (to borrow your own phrase.)

For some reason I think of Las Vegas, another rigged game. I do not gamble as I consider it a form of ugly greed but I know that the games are rigged so that you cannot win. And if someone actually develops the skill to win at something, such as by counting cards - they will kick you out. It's a bizarre situation where you are allowed to play as long as you are not very good at it. This is rather like that, let the human keep playing the computer until he figures out how to beat it, then start rating the games. And of course if the computer wins anyway, the human just stops playing it. It sounds like an honest rating system to me.

But I am not talking about a human progressing in ranking until he is ranked higher than the claimed ranking of the program. I am talking about weaknesses he can exploit long before he learns to beat actual human players of that ranking. If a human player loses significantly in a hundred-game match against a program, I would not object very hard to giving the program even the rank of the player at the end of the series, regardless of whether he's progressed in skill. If John Tromp knows in advance which program he will play, and practices up to the match against it, trying his best to learn its flaws, and the program still wins, I would have no problem admitting that the program is at least 2 or 3 dan (or whatever rank John Tromp has at the time of playing). /That/ would be an interesting result.

As I said, I do not think you are unreasonable for wanting the exact same conditions for every player. I see the logic. Equality is one nice measure of fairness. I just have a different standard for judging programs, because of the nature of those programs, which is very different from the nature of humans.

I'm not sure it's relevant, but an interesting thought experiment might be to consider how you would feel about a go program with a huge library of trick plays, that it employed whenever it thought it was behind. Or a go program that tried its best to get the opponent to lose on time. Given that tournament play is the same for everyone, would you still not feel that such a program's rating would be even a little undeserved?

