I don't want this to turn into a big debate, but here is what it comes
down to in my opinion. You are trying to isolate the learning process
from some hypothetical state where "learning is complete." That is
impossible to do, and it's not how the human condition works. We learn
constantly, and you cannot separate the two.

There are players who start young and improve during their entire lives,
and when they lose we would consider it pretty strange if they said the
game shouldn't count because they are still "learning." Wouldn't you
consider that intellectually dishonest on their part?

That's why I consider this idea ill-conceived. It's rather like a foot
race where you give the slow starter a running start because you consider
it unfair that he be penalized for being a slow starter. Whatever you
think you are measuring, it's not a fair race.

Tournament play is what it is, and if the conditions are not the same for
every player then it's inherently unfair. I think it's pretty silly to
consider games invalid because you believe you have not yet learned enough
about the opponent to take full advantage of his ignorance. That is what I
consider intellectually dishonest (to borrow your own phrase).

For some reason this makes me think of Las Vegas, another rigged game. I
do not gamble, as I consider it a form of ugly greed, but I know that the
games are rigged so that you cannot win. And if someone actually develops
the skill to win at something, such as by counting cards, they will be
kicked out. It's a bizarre situation where you are allowed to play only
as long as you are not very good at it. This proposal is rather like
that: let the human keep playing the computer until he figures out how to
beat it, then start rating the games. And of course if the computer wins
anyway, the human just stops playing it. That sounds like an honest
rating system to me.


Don






On Sat, Jul 24, 2010 at 5:21 AM, Raymond Wold <[email protected]> wrote:

>  On 24.07.2010 00:24, Don Dailey wrote:
>
>
>
> On Fri, Jul 23, 2010 at 3:54 PM, Raymond Wold
> <[email protected]> wrote:
>
>> On 23.07.2010 02:12, Stefan Kaitschick wrote:
>>
>>> Why should the worst case be the most interesting?
>>> In a program of this complexity worst case isn't the "true" strength of
>>> the program.
>>> Worst case is basically a bug.
>>>
>>
>>  Given enough time, even an "AI" that chooses entirely randomly from the
>> legal moves available will get a win against the best human.
>
>
>  I'm trying to follow this conversation, but I don't get your point on
> this.
>
>
> It was a point about ignoring poor performance and only looking at the best
> results, which can justify pretty much anything as long as there is some
> variation in the performance.
>
>
>
>
>>
>>>  What's wrong with looking at average play to judge the program?
>>>
>>
>>  Well, it depends on how you measure the average. The typical approach,
>> putting your bot on a go server and letting it play self-selected
>> humans, is not very good: surprise at its playing style, no knowledge of
>> the program's fundamental flaws, and so on (not to mention humans not
>> necessarily taking a game against a bot seriously) will bias the results
>> in the program's favour.
>>
>
>  Playing strength is a function of everything you know and don't know,
> all your strengths and weaknesses, and how you put them together. Every
> player, even the very best, has weaknesses, but those do not determine
> how strong he is.
>
>
> These two sentences contradict each other, to me. If a program plays like
> a 9-dan professional as long as there are never any ladders, but has a
> bug that flips the status of every ladder, then that weakness /does/
> determine how strong it is. As incredible as it would be to make such a
> program, it would not deserve to be classed along with the human 9-dan
> professionals; it would probably be more correct to consider it
> double-digit kyu, since anyone who knows about ladders could set up one,
> or several, and practically seal any game once they know of the flaw.
> It's entirely conceivable that such a fault could go undetected for at
> least a few games, or that the program could get wins against
> professionals who didn't know about it.
>
>
>
>  I think your cognitive mental model of how this works is broken.
>
>
> I didn't mention any cognitive mental model. I am speaking of what I
> consider fair rank for computer programs.
>
>
>  I used to believe that a chain is only as strong as its weakest link,
> and I used this analogy to improve my tennis game. But a pro I was
> friends with taught me a pretty valuable lesson. He told me that my
> weaknesses do not have to define my game, and he taught me how to be
> aggressive with my strengths and thus minimize the weaknesses. He gave me
> several examples of top pros who were far from the best at certain
> things, but played in such a way that it was only a minor issue, and they
> reached the very top ranks. Then I remembered that a chess master had
> basically told me the same thing.
>
>
> Well, it is fairly hard to make a computer program recognize and try to
> minimize its weaknesses. You, as the programmer, can recognize them and
> try to mitigate them, and that will probably increase its rank even under
> my terms (and would please me no end) - as long as you truly minimize the
> effect of the weaknesses, so it's not just a matter of delaying the time
> it takes for a human playing a long series of games to discover the
> weaknesses and exploit them.
>
>
>
>
>> I would not object to an average of, say, 100 games against one human
>> opponent trying his best to win. With an even result under such a series I
>> would certainly consider the program as strong as the human.
>
>
>  Within 2 or 3 games the human has learned 80% of what he needs to know.
> Since most players who will play your program have already played many
> others that have the same basic characteristics, my conclusion is that
> computer strength as measured by ELO in KGS games is accurate, because
> they have already taken the hit for this particular weakness.
>
>
> With "simple" bugs like not reading ladders, or flaws in life-and-death
> analysis or patterns, I would agree. I am not, however, convinced that
> today's go programs do not have more subtle weaknesses in playing style
> that will take more than a few games to discover.
>
>
>  Your idea that there is some kind of break even point around game 100 is
> completely ludicrous.
>
>
> I am not sure what you are referring to here. My point about an even
> result after a hundred games was in the sense of two players playing a
> hundred games (or a thousand; the number isn't the point) against each
> other, with each winning around half - that counts as a more trustworthy
> measure of them being equal in strength when one of the players is a
> program.
>
>
>  Basically your idea of fair is that the first few games shouldn't count
> - you just said it differently, and it's a ridiculous idea.
>
>
> That is indeed what I am saying, and I don't think it is so ridiculous.
>
>
>  I cannot tell you how often I have played some opponent I could not
> beat the first few times, until I learned how to play him (in chess), and
> the opposite has also happened to me, where I seemed to win easily but it
> was clear that my opponent was studying and learning. Your comments
> suggest the first few games are invalid in ANY encounter.
>
>
> A human can much more easily detect and correct his flaws, especially
> when they are being exploited. Thus the effect of weaknesses isn't as
> important for humans.
>
>
>
>
>>
>>>  And in terms of "interesting", I must say that I find the program's
>>> best play much more interesting than its worst play. With best play I
>>> don't mean some book play, of course, but a fine solution to a tricky
>>> problem.
>>>
>>
>>  "Tricky problems" is what a computer does best, a localized search for a
>> solution, possibly even brute forced. This isn't very impressive to me.
>
>
>  You are clearly anti-computer, and your comments seem to reflect a kind
> of emotional prejudice instead of logic - for instance, thinking that
> it's fair to set up matches in such a way that the computer's weaknesses
> are exaggerated even more. The computer's inability to learn is already a
> handicap right from the first move.
>
>
> If I am anti-anything, it would be against bias in program authors and
> testers. I am for intellectual honesty. If you think a program should be
> compensated in its ranking for the handicap that it can't learn, you should
> give the program a much higher ranking than it would get on a go server.
> After all, a 1dan amateur human player may one day qualify as a
> professional, or even win a professional title, thanks to his ability to
> learn.
>
> For what it's worth, I would want my own program to be tested to these
> standards of mine as well, and I would play it many games trying to find
> and exploit flaws, and seek other players willing to test it in the same
> way, regardless of what impact it would have on anyone's thoughts on its
> rank. Not to put down the work I'd put into it, but so that I could
> improve it.
>
>
>
>>
>>  Granted, truly awesome play is currently mostly to be seen on 9*9.
>>> But I've seen some great kills on the big board that any top amateur
>>> could be proud of.
>>>
>>
>>  And how do you deal with confirmation bias? If you look for exceptional
>> results, do you also look for spectacular failures? What about if a program
>> gets an occasional brilliant win, but still loses most of the games?
>
>
>  There is a system that averages the two; it's called the ELO system,
> or if you prefer, the Go ranking system. The spectacular failures will
> be reflected in the numbers.
>
>
> You comment on this a bit out of context; let me try to get it back on
> track. Does the ELO system, or any other ranking system, give any program
> a rank on the big board that any top amateur could be proud of? I was
> merely commenting on the apparent confirmation bias.
>
>
>
>  You remind me of the man who ties someone's hands behind his back and
> then fights him. When the handicapped man bites you, you complain that he
> is not fighting fair.
>
>  If you set up the match right, you can give either player an advantage,
> but I for one would seriously not trust the results of such manipulation.
> I think you need some kind of reasonable justification for setting up
> match conditions such that one player has every possible advantage.
>
>  So I have a suggestion. Let's play the match at the rate of game in 1
> minute. I hold the human to higher standards, and it doesn't seem fair to
> me to set time controls in such a way that the human has such an easy
> game of it. That's just not fair, and it inflates the ELO of the human
> player. Your counter-argument had better not be that this is the accepted
> time control for human players - of course it is.
>
>
> I am not suggesting these unfair tests because I want the human to win, at
> least not directly. I wish you would give me enough credit to look deeper
> than that.
>
> What I want is to be convinced that there /isn't/ any bias in the program's
> favour. That playing a hundred games /will/ give the same result as playing
> five. I want a program to meet this standard of mine.
>
> Either you think that some of today's programs can do this, in which case
> my suggestion isn't unfair at all! Or you think all programs do have
> flaws that can be learned and exploited by humans, and you merely have a
> different standard than me on what counts as a good go player. This is
> perfectly OK. You don't have to be offended that we disagree on our
> standards. If you still are, you can write it off as me just suffering
> from perfectionism (which I admittedly do). Tradition has worked out many
> ranking systems that work just fine and are counted as very fair, for
> humans, and I do not think it unreasonable to think a computer program
> should only be measured under those systems. I simply have my own
> standards, which take into account at least one additional factor that
> may come into play with programs that don't learn the way humans do.
>
> I want programs to improve under /my/ standard. I am kind of hoping other
> go coders feel the same. Isn't handling the hard problem of playing go
> what this is all about, and not just getting a high rating for kudos and
> commercial gain? Surely tackling what your program is worst at serves
> this best?
>
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
