---------------------------------------------- Don Said ----------------------------------------------
But first of all, thank you for your generous praise. I probably don't deserve it (but I'll take what I can get :-)

+++++++++++++++++++++ Response : +++++++++++++++++++++
It's hard to argue with that :) Still, you have been a constant presence on this list. As I recall, you may well be the person with the highest posts-per-day score :) That doesn't automatically mean that all is good, but it does give a feeling of security. If you are involved, we know that the project can easily stand the test of time :) (if it's worthy enough, that is). That may be why I so much want you to feel good about it :) Still, I probably do think everything I said.

---------------------------------------------- Don Said ----------------------------------------------
An external tester will test for conformance and it will compare 2 bots, one of which we "trust" as being conforming. But the tester will not be deterministic: it will throw random positions at the bots so that a black box author cannot present it with something that has hard-coded answers. Also, I don't want it to be tuned for any given position.

+++++++++++++++++++++ Response : +++++++++++++++++++++
I definitely agree with all that. I guess that rather than just yes or no, the test will output a probability of conformance :)

---------------------------------------------- Don Said ----------------------------------------------
I envision that you might be able to seed the tester with a random number in order to duplicate the testing conditions, and if this becomes structured enough the "official" test would use a hidden standard seed. This would be required so that different programs are not presented with a different series of tests.

+++++++++++++++++++++ Response : +++++++++++++++++++++
I'm not sure I really get the point. Or are you thinking past the "light playout" contest? What's the problem with programs being presented with different series of tests?

---------------------------------------------- Don Said ----------------------------------------------
GTP is pretty much a necessity and is also very much a standard.
It's necessary for external game playing and testing and for having an external tester. It doesn't make sense to produce a different system. We could make it spit out numbers that you key in to a spreadsheet of some kind or you could do the math manually, but this becomes unwieldy if the test is to be very sophisticated.

+++++++++++++++++++++ Response : +++++++++++++++++++++
I agree that we need the program to communicate with the outside world. I agree that GTP is standard. I don't think we should avoid it. I disagree if this means we shouldn't try to think up a system that would help people take advantage of this standard in a faster and easier way. That does not mean we should actually implement anything, unless it's truly worth it. Still, my opinion is that there is room for discussion there.

---------------------------------------------- Don Said ----------------------------------------------
In fact, this is the whole "raison d'être" (reason for being) of GTP, for communicating with programs.

+++++++++++++++++++++ Response : +++++++++++++++++++++
I'm French, by the way :p

Well, I disagree. That's why GTP is standard: it allows for communicating with programs. It is a protocol for communicating with programs... Still, it was engineered for the needs of gnugo (in particular regression testing, I think).

When I look at my code, I use a sort of home-made communication standard. I use it as a set of function calls, and it's engineered to be a whole lot easier than outputting directly in the GTP format. For example, when I output a move, I just output a number, not a vertex per se. I have the feeling that a lot of people would use a representation like the one I use, where you give a number to each intersection, for example from 1 to 81. I use 0 as the pass message and -1 as the resign message. It allows me to concentrate more on what is meaningful... then I have a layer that translates all that into actual GTP commands.

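To make the idea concrete, here is a minimal sketch of such a translation layer in Python. It is only an illustration under stated assumptions: the post does not say how the numbers 1..81 map onto the board, so the sketch assumes a row-major numbering that starts at A1, and the names `move_to_gtp`, `gtp_to_move` and `gtp_reply` are hypothetical. The only GTP facts it relies on are that column letters skip "I", that "pass" and "resign" are valid move answers, and that a successful reply starts with "=" and ends with a blank line.

```python
# Hypothetical translation layer between a 1..81 move number and GTP vertices.
# Assumption (not from the post): numbering is row-major, 1 = A1, 9 = J1, 81 = J9.

GTP_COLUMNS = "ABCDEFGHJKLMNOPQRST"  # GTP column letters skip "I"
BOARD_SIZE = 9
PASS = 0      # the post uses 0 for pass
RESIGN = -1   # and -1 for resign


def move_to_gtp(move: int, size: int = BOARD_SIZE) -> str:
    """Translate an internal move number into a GTP vertex string."""
    if move == PASS:
        return "pass"
    if move == RESIGN:
        return "resign"
    if not 1 <= move <= size * size:
        raise ValueError(f"move number out of range: {move}")
    row, col = divmod(move - 1, size)
    return f"{GTP_COLUMNS[col]}{row + 1}"


def gtp_to_move(vertex: str, size: int = BOARD_SIZE) -> int:
    """Translate a GTP vertex string back into an internal move number."""
    text = vertex.strip().upper()
    if text == "PASS":
        return PASS
    if text == "RESIGN":
        return RESIGN
    col = GTP_COLUMNS.index(text[0])  # raises ValueError for "I" or non-letters
    row = int(text[1:]) - 1
    if col >= size or not 0 <= row < size:
        raise ValueError(f"vertex outside the board: {vertex}")
    return row * size + col + 1


def gtp_reply(move: int, command_id=None) -> str:
    """Format a successful GTP response carrying the translated move."""
    prefix = "=" if command_id is None else f"={command_id}"
    return f"{prefix} {move_to_gtp(move)}\n\n"


if __name__ == "__main__":
    for number in (1, 41, 81, PASS, RESIGN):
        print(number, "->", move_to_gtp(number))
```

With a layer like this, the engine itself only ever deals with plain integers, and the vertex formatting lives in one small place that can be tested on its own.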
Now it may be (or not) that this 1..81 representation does indeed reduce the thinking and testing time for interacting with GTP. It does for me, so I'd be happy to know if it would for others too. I know that it was not very enjoyable for me to spend so much time on the GTP part.

What I would like is to come up with a "better" way of representing things, one that would be easier and more natural to implement. I have done that very informally on my own systems. Still, it needs a lot more tuning and criticism.

So the plan was to do exactly what I do: propose to the contenders a way of handling messages and responses that is easier to implement, then add a little external module to translate those into well-formed GTP commands. Suppose you have a GTP server, let's say GOGUI. Then you would pass it the translator as the Go program. The translator would take the actual program as an argument and execute it as is.

Then there is the speed problem. We have great tools for testing programs against each other, in particular the gogui test suite (I don't know what CGOS uses). Still, even for instantly generated moves (10,000 per second...) it takes a few dozen seconds to get a game done, with the results. I still use it because it's so handy. But for tests with fast move generation, it's really too slow. (Now I wonder whether my GTP tunnel is the reason it takes so much time, or whether this is due to the server.) Still, there clearly is room for something faster there. I'm not saying that it's really worth re-engineering all the tools, only that it may be worth discussing whether there is an easy way to make this faster :)

===
So to sum up, I have two issues:
- it takes too much time to get a GTP engine right. (The point)
- it's slow to use two_gtp with two fast move generators for statistical regression testing.
(Alternate discussion effort)
===
Therefore I wonder what solution we could come up with, even if it's clear that it's not worth anyone implementing it :)
===

---------------------------------------------- Don Said ----------------------------------------------
We can always publish numbers that people can use for informal checking. But I definitely want some kind of conformance metric that is not just ad-hoc.

+++++++++++++++++++++ Response : +++++++++++++++++++++
I agree with your statistical metric based on randomly generated positions and a comparison of the scoring done there. In fact, it would probably be sufficient as a black box test, with program-vs-program play adding still more confidence :)

---------------------------------------------- Don Said ----------------------------------------------
I really like your idea of massive automated testing to test conformance, but you know this is extremely CPU intensive.

+++++++++++++++++++++ Response : +++++++++++++++++++++
In fact... no, I don't know :) I have worked out some numbers and it seems that indeed it would take a few hours at best (for 2000 games).

---------------------------------------------- Don Said ----------------------------------------------
It would take tens or hundreds of thousands of games to be able to say with high confidence that 2 programs are functionally identical in strength. So I envision a primary test that runs relatively quickly and a more comprehensive test based on game play for the most interesting programs or for anyone willing to take it that far.

+++++++++++++++++++++ Response : +++++++++++++++++++++
I think 2000 games would be more than enough to prove a near 50% win ratio.

The position scoring test can be fast. I suspect that doing only that is also nearly enough. We can design it so that it takes about 10 minutes per test suite. I think it is enough to generate a few positions (how many exactly?) and then ask both the reference bot and the bot to be tested to score each of their legal moves.
(With a number of simulations high enough to get some reproducibility.) Then we can get a confidence bound on how alike they are.

As an added bonus, if the number of simulations is high enough, the server can also time the speed without the network and communication latency having too much impact (which may be of limited value, as the hardware wouldn't be tractable, but well).

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
