From: Petri Pitkanen <[email protected]>
2011/6/17 Jean-loup Gailly <[email protected]>
> I have done precisely this. The reports of scalability death are greatly
> exaggerated, as you can see from the attached graph. To avoid self-play
> benchmarks, which are misleading, I tested Pachi against Fuego 1.1.
>
> Jean-loup

Well, this gives a biased result. Wrong sample, so to speak. Fuego will not create complex semeais and hard-to-read ishi-no-shita nakade shapes, i.e., it is an opponent that puts no pressure on known problems. So you prove that you do scale against an opponent who does not play like a human. But as you advance up the ladder of human players, these small issues tend to pop up more often. Measuring scaling against strong humans is obviously a bit hard; just about the only option is to let machines with different CPUs play on KGS.

Yes, I do believe that Pachi/Fuego will play better given more time. But they would scale better if there were a better algorithm in place and part of that extra CPU were used there. It is just a bit murky exactly what to use it for. So I don't think we will get to 6 dan EGF (8-9 dan KGS?) with current programs just by adding memory and CPU.

Petri

I believe Petri is correct. An automatic tournament among a few similar MCTS programs, which tend to have similar weak points, is not as useful as playing against a strong, adaptive human community. The humans will discover and exploit entire categories of bugs (such as failure to understand nakade, insensitivity to capturing races, failure to understand the value of a big eye in a capturing race, weak borders of central moyos, and poor yose skills), which may be shared by both programs in any given match. Even if the programs don't share the same weakness, it will be rare for one program to exploit such a weakness in the other. Humans, on the other hand, tend to observe and adapt: once a weakness becomes known, it will be exploited.

That said, I can donate four cores for a few weeks or months for a study, however it is organized.
I'd like to suggest looking into the cost of setting up a farm on Rackspace or Amazon, plus a "pay me" button so that people could toss virtual coins into the meter to keep the experiment running.

Another possibility: by now there must be a large database of known situations where strong programs managed to snatch defeat from the jaws of certain victory. (If not, there should be; examples certainly abound.) How about a scalability study that asks "how many playouts must program X use to handle situation Y_n correctly, for a large set of situations Y?" Ideally, such a test should follow through: if a bad move is made, it should be punished; if the first move is correct, the tester should respond with one of several replies and determine whether the program continues to play correctly.
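To make the proposed question concrete, here is a rough Python sketch of such a harness. It is only an illustration under stated assumptions: ChooseMove, TestNode, passes, min_playouts and toy_engine are hypothetical names, not part of Pachi, Fuego or any existing tool; positions and moves are opaque strings; and the playout budget is searched by doubling, so the answer is only resolved to within a factor of two. Since MCTS programs are stochastic, a real study would also repeat each trial several times rather than trust a single run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional

# Hypothetical engine hook (NOT a real Pachi/Fuego API): given an opaque
# position identifier and a playout budget, return the move the program picks.
ChooseMove = Callable[[str, int], str]


@dataclass
class TestNode:
    position: str                  # opaque identifier for a known problem position
    correct_moves: FrozenSet[str]  # moves a strong player accepts as correct here
    # After a correct move, the tester's possible replies, each leading to a
    # further position the program must also handle (empty list = line ends).
    follow_ups: Dict[str, List["TestNode"]] = field(default_factory=dict)


def passes(node: TestNode, choose_move: ChooseMove, playouts: int) -> bool:
    """True if the program plays correctly here and in every follow-up line."""
    move = choose_move(node.position, playouts)
    if move not in node.correct_moves:
        return False  # a bad move fails the test at once
    # Follow through: every listed reply to this move must also be answered correctly.
    return all(passes(child, choose_move, playouts)
               for child in node.follow_ups.get(move, []))


def min_playouts(node: TestNode, choose_move: ChooseMove,
                 start: int = 1_000, limit: int = 10_000_000) -> Optional[int]:
    """Smallest budget in start, 2*start, 4*start, ... (up to limit) that passes."""
    playouts = start
    while playouts <= limit:
        if passes(node, choose_move, playouts):
            return playouts
        playouts *= 2
    return None  # not solved within the budget limit


# Toy stand-in engine for illustration only: it finds the right move in each
# position once the budget reaches a made-up threshold.
def toy_engine(position: str, playouts: int) -> str:
    thresholds = {"semeai-1": 4_000, "semeai-1/reply": 16_000}
    return "good" if playouts >= thresholds[position] else "bad"


reply = TestNode("semeai-1/reply", frozenset({"good"}))
root = TestNode("semeai-1", frozenset({"good"}), {"good": [reply]})
print(min_playouts(root, toy_engine))  # prints 16000
```

In this toy run the first move is found at 4,000 playouts, but the follow-up position needs 16,000, so the whole line only counts as handled at 16,000. That is the point of following through rather than scoring the first move alone.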
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
