On 24.07.2010 00:24, Don Dailey wrote:


On Fri, Jul 23, 2010 at 3:54 PM, Raymond Wold <[email protected]> wrote:

    On 23.07.2010 02:12, Stefan Kaitschick wrote:

        Why should the worst case be the most interesting?
        In a program of this complexity worst case isn't the "true"
        strength of the program.
        Worst case is basically a bug.


    Given enough time, even an "AI" that chooses entirely randomly
    from the legal moves available will get a win against the best human.


I'm trying to follow this conversation, but I don't get your point on this.

It was a point about ignoring poor performance and only looking at the best results, which can justify pretty much anything as long as there is some variation in the performance.




        What's wrong with looking at average play to judge the program?


    Well, it depends on how you measure the average. The typical
    approach, putting your bot on a go server and letting it play
    self-selected humans, is not very good: surprise at its playing
    style, no knowledge of the program's fundamental flaws, and so on
    (not to mention humans not necessarily taking a game against a bot
    seriously) will bias the results in the program's favour.


Playing strength is a function of everything you know and don't know, all your strengths and weaknesses, and how you put them together. Every player, even the very best, has weaknesses, but those do not determine how strong he is.

These two sentences contradict each other to me. If a program plays like a 9-dan professional as long as there are never any ladders, but has a bug that flips the status of every ladder, then that weakness /does/ determine how strong it is. As incredible as it would be to make such a program, it would not deserve to be classed along with human 9-dan professionals; it would probably be more correct to consider it double-digit kyu, since anyone who knows about the flaw could set up a ladder, or several, and practically seal any game. It's entirely conceivable that such a fault could go undetected for at least a few games, or that the program could get wins against professionals who didn't know about it.


I think your cognitive mental model of how this works is broken.


I didn't mention any cognitive mental model. I am speaking of what I consider fair rank for computer programs.

I used to believe that a chain is as strong as its weakest link - using this analogy to improve my tennis game. But a pro I was friends with taught me a pretty valuable lesson. He told me that my weaknesses do not have to define my game, and he taught me how to be aggressive with my strengths and thus minimize the weaknesses. He gave me several examples of top pros who were far from the best at certain things but played in such a way that it was only a minor issue, and they reached the very top ranks. Then I remembered that a chess master had basically told me the same thing.

Well, it is fairly hard to make a computer program recognize and try to minimize its weaknesses. You, as the programmer, can recognize them and try to mitigate them, and that will probably increase its rank even under my terms (and would please me no end). As long as you truly minimize the effect of the weaknesses, that is, so it's not just a matter of delaying the time it takes for a human playing a long series of games to discover the weaknesses and exploit them.



    I would not object to an average of, say, 100 games against one
    human opponent trying his best to win. With an even result under
    such a series I would certainly consider the program as strong as
    the human.


Within 2 or 3 games the human has learned 80% of what he needs to know. Since most players who will play your program have already played many others that have the same basic characteristics, my conclusion is that computer strength as measured by ELO in KGS games is accurate, because they have already taken the hit for this particular weakness.


With "simple" bugs like not reading ladders, or flaws in life-and-death analysis or patterns, I would agree. I am not, however, convinced that today's go programs do not have more subtle weaknesses in playing style that will take more than a few games to determine.

Your idea that there is some kind of break even point around game 100 is completely ludicrous.


I am not sure what you are referring to here. My point about an even result after a hundred games was in the sense of two players playing a hundred games (or a thousand; the exact number isn't the point) against each other, with each winning around half. That counts as a more trustworthy measure of them being equal in strength when one of the players is a program.
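As a back-of-the-envelope illustration of why a long series is more trustworthy than a few games (my own sketch, not anything either of us said): the uncertainty of an observed win rate shrinks with the square root of the number of games, so an even score over 100 games pins the true strength down far more tightly than an even score over 5.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Approximate 95% confidence interval for the true win rate,
    using the normal approximation to the binomial distribution."""
    p = wins / games
    se = math.sqrt(p * (1 - p) / games)
    return (p - z * se, p + z * se)

# An even score over 5 games versus an even score over 100 games:
lo5, hi5 = win_rate_interval(2.5, 5)       # roughly 0.06 .. 0.94
lo100, hi100 = win_rate_interval(50, 100)  # roughly 0.40 .. 0.60
```

After 5 even games the true win rate could plausibly be almost anything; after 100, it is confined to a narrow band around 50%, which is the sense in which the longer series is the more trustworthy measure.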

Basically your idea of fair is that the first few games shouldn't count - you just said it differently and it's a ridiculous idea.

That is indeed what I am saying, and I don't think it is so ridiculous.

I cannot tell you how often I have played some opponent that I could not beat the first few times, until I learned how to play him (in chess), and the opposite has also happened to me, where I seemed to win easily but it was clear that my opponent was studying and learning. Your comments suggest the first few games are invalid in ANY encounter.

A human can much more easily detect and correct his flaws, especially when they are being exploited. Thus the effect of weaknesses isn't as important for humans.




        And in terms of "interesting" I must say that I find the
        program's best play much more interesting than its worst play.
        With best play I don't mean some book play of course, but a
        fine solution to a tricky problem.


    "Tricky problems" is what a computer does best, a localized search
    for a solution, possibly even brute forced. This isn't very
    impressive to me.


You are clearly anti-computer, and your comments seem to reflect a kind of emotional prejudice instead of logic - for instance, thinking that it's fair to set up matches in such a way that the computer's weaknesses are exaggerated even more. The computer's inability to learn is already a handicap right from the first move.


If I am anti-anything, it would be against bias in program authors and testers. I am for intellectual honesty. If you think a program should be compensated in its ranking for the handicap that it can't learn, you should give the program a much higher ranking than it would get on a go server. After all, a 1dan amateur human player may one day qualify as a professional, or even win a professional title, thanks to his ability to learn.

For what it's worth, I would want my own program to be tested to these standards of mine as well, and I would play it many games trying to find and exploit flaws, and seek other players willing to test it in the same way, regardless of what impact it would have on anyone's thoughts on its rank. Not to put down the work I'd put into it, but so that I could improve it.



        Granted, truly awesome play is currently mostly to be seen on 9x9.
        But I've seen some great kills on the big board that any top
        amateur could be proud of.


    And how do you deal with confirmation bias? If you look for
    exceptional results, do you also look for spectacular failures?
    What about if a program gets an occasional brilliant win, but
    still loses most of the games?


There is a system that averages the two: it's called the ELO system, or, if you prefer, the Go ranking system. The spectacular failures will be reflected in the numbers.
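For reference, the averaging the ELO system does can be sketched in a few lines (a minimal version of the standard update rule; the K-factor of 32 is a common convention I'm assuming here, not something from this thread):

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, score_a, k=32):
    """A's new rating after one game (score_a: 1 win, 0.5 draw, 0 loss)."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))

# A spectacular failure against a much weaker player costs many points,
# so it is indeed reflected in the numbers:
new_rating = update(2000, 1600, 0)  # loss to a player 400 points below
```

Since the expected score against a player 400 points weaker is about 0.91, such a loss costs nearly the full K-factor, which is how the occasional disaster drags the average down.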

You comment on this a bit out of context; let me try to get it back on track. Does the ELO system, or any other ranking system, give any program a rank on the big board that any top amateur could be proud of? I was merely commenting on the apparent confirmation bias.


You remind me of the man who ties someone's hands behind his back and then fights him. When the handicapped man bites you, you complain that he is not fighting fair.

If you set up the match right, you can give either player an advantage but I for one would seriously not trust the results of such manipulation. I think you need some kind of reasonable justification for setting up match conditions such that one player has every possible advantage.

So I have a suggestion. Let's play the match at the rate of game in 1 minute. I hold the human to higher standards, and it doesn't seem fair to me to set time controls in such a way that the human has such an easy game of it. That's just not fair, and it inflates the ELO of the human player. Your counter-argument had better not be that this is the accepted time control for human players - of course it is.

I am not suggesting these unfair tests because I want the human to win, at least not directly. I wish you would give me enough credit to look deeper than that.

What I want is to be convinced that there /isn't/ any bias in the program's favour. That playing a hundred games /will/ give the same result as playing five. I want a program to meet this standard of mine.

Either you think that some of today's programs can do this, in which case my suggestion isn't unfair at all! Or you think all programs do have flaws that can be learned and exploited by humans, and you merely have a different standard than me on what you would consider a good go player. This is perfectly OK. You don't have to be offended that we disagree on our standards. If you still are, you can write it off as me just suffering from perfectionism (which I admittedly do). Tradition has worked out many ranking systems that work just fine and are counted as very fair for humans, and I do not think it unreasonable to think a computer program should only be measured under these systems. I simply have my own standard, which takes into account at least one additional factor that may come into play with programs that don't learn the way humans do.

I want programs to improve under /my/ standard. I am kind of hoping other go coders feel the same. Isn't handling the hard problem of playing go what this is all about, and not just getting a high rating for kudos and commercial gain? Surely tackling what your program is worst at does this the best?
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
