On 24.07.2010 00:24, Don Dailey wrote:
On Fri, Jul 23, 2010 at 3:54 PM, Raymond Wold
<[email protected] <mailto:[email protected]>> wrote:
On 23.07.2010 02:12, Stefan Kaitschick wrote:
Why should the worst case be the most interesting?
In a program of this complexity worst case isn't the "true"
strength of the program.
Worst case is basically a bug.
Given enough time, even an "AI" that chooses entirely randomly
from the legal moves available will get a win against the best human.
I'm trying to follow this conversation, but I don't get your point on
this.
It was a point about ignoring poor performance and only looking at the
best results, which can justify pretty much anything as long as there is
some variation in the performance.
What's wrong with looking at average play to judge the program?
Well, it depends on how you measure the average. The typical way,
putting your bot on a go server and letting it play self-selected
humans, is not very good: surprise at its playing style, no
knowledge of the program's fundamental flaws, and so on (not to
mention humans not necessarily taking a game against a bot
seriously) will bias the result in the program's favour.
Playing strength is a function of everything you know and don't know
and all your strengths and weaknesses and how you put them together.
Every player, even the very best has weaknesses but those do not
determine how strong he is.
These two sentences contradict each other, to me. If a program plays
like a 9dan professional as long as there are never any ladders, but
has a bug that flips the status of every ladder, then that weakness
/does/ determine how strong it is. As incredible as it would be to make
such a program, it would not deserve to be classed with the human 9dan
professionals; it would probably be more correct to consider it double
digit kyu, since anyone who knows about ladders could set one up, or
several, and practically seal any game once they know of the flaw. It's
entirely conceivable that such a fault could go undetected for at least
a few games, or that the program could get wins against professionals
who didn't know about it.
I think your cognitive mental model of how this works is broken.
I didn't mention any cognitive mental model. I am speaking of what I
consider fair rank for computer programs.
I used to believe that a chain is as strong as its weakest link -
using this analogy to improve my tennis game. But a pro I was
friends with taught me a pretty valuable lesson. He told me that my
weaknesses do not have to define my game and he taught me how to be
aggressive with my strengths and thus minimize the weaknesses. He
gave me several examples of top pros that were far from the best at
certain things, but played in such a way that it was only a minor
issue and they reached the very top ranks. Then I remembered that
a chess master basically told me the same thing.
Well, it is fairly hard to make a computer program recognize and try to
minimize its weaknesses. You, as the programmer, can recognize it and
try to mitigate it, and that will probably increase its rank even under
my terms (and would please me no end), provided you truly minimize the
effect of the weaknesses, rather than just delaying the time it takes
for a human playing a long series of games to discover and exploit
them.
I would not object to an average of, say, 100 games against one
human opponent trying his best to win. With an even result under
such a series I would certainly consider the program as strong as
the human.
Within 2 or 3 games the human has learned 80% of what he needs to know.
Since most players who will play your program have already played
many others that have the same basic characteristics, my conclusion
is that computer strength as measured by ELO in KGS games is accurate
because they have already taken the hit for this particular weakness.
With "simple" bugs like not reading ladders, or flaws in life-and-death
analysis or patterns, I would agree. I am not, however, convinced that
today's go programs are free of more subtle weaknesses in playing
style, ones that will take more than a few games to uncover.
Your idea that there is some kind of break even point around game 100
is completely ludicrous.
I am not sure what you are referring to here. My point about an even
result after a hundred games was about two players playing a hundred
games (or a thousand; the number isn't the point) against each other,
with each player winning around half. That counts as a more trustworthy
measure of the two being equal in strength when one of the players is a
program.
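As a sketch of why the sample size matters here (using the standard
logistic Elo model and a normal approximation; the numbers are
illustrative, not results from actual games):

```python
import math

def elo_diff(wins, games):
    """Elo difference implied by a win rate under the logistic model."""
    p = wins / games
    return -400 * math.log10(1 / p - 1)

def margin(wins, games):
    """Rough 95% half-width on the observed win rate (normal approx.)."""
    p = wins / games
    return 1.96 * math.sqrt(p * (1 - p) / games)

# A 3-2 result and a 52-48 result both look roughly "even", but the
# uncertainty on the win rate differs by a factor of about four:
print(margin(3, 5))       # ~0.43 -- five games say almost nothing
print(margin(52, 100))    # ~0.10
print(elo_diff(52, 100))  # ~14 Elo points, well inside that noise
```

Five games cannot distinguish equal strength from a large gap; a
hundred games pin the difference down to within a couple of stones'
worth of Elo.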
Basically your idea of fair is that the first few games shouldn't
count - you just said it differently and it's a ridiculous idea.
That is indeed what I am saying, and I don't think it is so ridiculous.
I cannot tell you how often I have played some opponent that I could
not beat the first few times until I learned how to play him (in chess)
and it has also happened just the opposite for me where I seemed to
win easily but it was clear that my opponent was studying and
learning. Your comments suggest the first few games are invalid in
ANY encounter.
A human can much more easily detect and correct his flaws, especially
when they are being exploited. Thus the effect of weaknesses isn't as
important for humans.
And in terms of "interesting" I must say that I find the
program's best play much more interesting than its worst play.
By best play I don't mean some book play of course, but a
fine solution to a tricky problem.
"Tricky problems" is what a computer does best, a localized search
for a solution, possibly even brute forced. This isn't very
impressive to me.
You are clearly anti-computer and your comments seem to reflect a kind
of emotional prejudice instead of logic - for instance thinking that
it's fair to set up matches in such a way that the computer's
weaknesses are exaggerated even more. The computer's inability to
learn is already a handicap right from the first move.
If I am anti-anything, it would be against bias in program authors and
testers. I am for intellectual honesty. If you think a program should be
compensated in its ranking for the handicap that it can't learn, you
should give the program a much higher ranking than it would get on a go
server. After all, a 1dan amateur human player may one day qualify as a
professional, or even win a professional title, thanks to his ability to
learn.
For what it's worth, I would want my own program to be tested to these
standards of mine as well, and I would play it many games trying to find
and exploit flaws, and seek other players willing to test it in the same
way, regardless of what impact it would have on anyone's thoughts on its
rank. Not to put down the work I'd put into it, but so that I could
improve it.
Granted, truly awesome play is currently mostly to be seen on 9x9.
But I've seen some great kills on the big board that any top
amateur could be proud of.
And how do you deal with confirmation bias? If you look for
exceptional results, do you also look for spectacular failures?
What about if a program gets an occasional brilliant win, but
still loses most of the games?
There is a system that averages the two, it's called the ELO system,
or if you prefer the Go ranking system. The spectacular failures
will be reflected in the numbers.
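Don's point that the rating system averages brilliancies and failures
alike can be made concrete. A minimal sketch of the standard Elo update
(logistic expected score; the K=32 factor and the ratings are assumed
for illustration):

```python
def expected(r_a, r_b):
    """Expected score of player A against B under the logistic Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(rating, opponent, score, k=32):
    """Standard Elo update after one game (score: 1 = win, 0 = loss)."""
    return rating + k * (score - expected(rating, opponent))

# One brilliant upset against a 2200 player, then three routine losses
# to 1800 peers: the upset is rewarded sharply, but the failures are
# not ignored, and the net rating ends up below where it started.
r = 1800.0
r = update(r, 2200, 1)
for _ in range(3):
    r = update(r, 1800, 0)
print(r)  # ends a little below 1800
```

So a program that collects occasional brilliant wins while losing most
of its games cannot hide that in its rating.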
You comment on this a bit out of context, let me try to get it back on
track; does the ELO system, or any other ranking system, give any
program a rank on the big board that any top amateur could be proud of?
I was merely commenting on the apparent confirmation bias.
You remind me of the man who ties someones hands behind his back and
then fights him. When the handicapped man bites you, you complain
that he is not fighting fair.
If you set up the match right, you can give either player an advantage
but I for one would seriously not trust the results of such
manipulation. I think you need some kind of
reasonable justification for setting up match conditions such that one
player has every possible advantage.
So I have a suggestion. Let's play the match at the rate of game in
1 minute. I hold the human to higher standards and it doesn't seem
fair to me to set time controls in such a way that the human has such
an easy game of it. That's just not fair and it inflates the ELO of
the human player. Your counter argument had better not be that this is
the accepted time control for human players - of course it is.
I am not suggesting these unfair tests because I want the human to win,
at least not directly. I wish you would give me enough credit to look
deeper than that.
What I want is to be convinced that there /isn't/ any bias in the
program's favour. That playing a hundred games /will/ give the same
result as playing five. I want a program to meet this standard of mine.
Either you think that some of today's programs can do this, in which
case my suggestion isn't unfair at all! Or you think all programs do
have flaws that can be learned and exploited by humans, and you merely
have a different standard than me for what counts as a good go player.
This is perfectly OK. You don't have to be offended that we disagree on
our standards. If you still are, you can write it off as me just
suffering from perfectionism (which I admittedly do). Tradition has
worked out many ranking systems that work just fine and are counted as
very fair, for humans, and I do not think it unreasonable that a
computer program should only be measured under those systems. I simply
have my own standard, which takes into account at least one additional
factor that may come into play with programs that don't learn the way
humans do.
I want programs to improve under /my/ standard. I am kind of hoping
other go coders feel the same. Isn't handling the hard problem of
playing go what this is all about, not just getting a high rating
for kudos and commercial gain? Surely tackling what your program is
worst at serves that goal best?
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go