Here is some preliminary information on the distributed version of pachi. Petr (pasky) and I will publish all the details later; this is just to give you an idea of what we are doing. Pasky is the main author of pachi and wrote most of the single-machine code. I wrote the distributed code and some other improvements.
All the code, including the distributed code, is licensed under the GPL and available at http://repo.or.cz/w/pachi.git/

The distributed pachi uses plain TCP/IP sockets, not MPI, which makes it portable to many environments. A master process regularly receives stats updates from all the slaves and sends the aggregated updates back to every slave. The master-slave protocol is specific to pachi but rather simple. It is fault tolerant: if a slave dies, the master resends the whole game to the new slave that replaces it. If the master dies during test runs, I discard the current game and start a new one. If the master dies while playing on KGS, I kill the kgsGtp program and start a new one; KGS then resends the partial game and we continue from there.

I measured scalability both on a single machine and in distributed mode. All the details will be published, but here is a summary. In single-machine mode, doubling the number of cores gains roughly 100 Elo, or one stone (I measured one stone to be approximately 100 Elo). This holds up to the largest number of cores I can test: 20 per machine, since the other cores are reserved for the OS and other applications. In distributed mode, doubling the number of machines initially gains approximately 50 Elo (half a stone), up to 8 machines. Beyond that we quickly hit a scalability limit, and the best result so far is with 64 machines; this is the configuration used for the KGS tournament (from round 4 on) and on KGS right now. 128 machines are currently much worse than 64.

Preliminary analysis of the lost games shows that the current code has inherent scalability limits because the playouts are biased. When the playouts misjudge the life-and-death status of a group, the results will be bad no matter how many cores and machines work on it. We are of course working on eliminating these limits.

Pachi has benefited enormously from ideas published on the computer-go mailing list and in many papers.
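To make the master-slave exchange described above more concrete, here is a minimal in-process sketch. It is not pachi's actual protocol: it assumes each slave buffers per-move (playouts, wins) results since the last round, the master sums those buffers and broadcasts the sum, and a replacement slave is handed the full move list. All names (MoveStats, Slave, Master, run_round, replace_slave) are hypothetical, and the TCP/IP transport is replaced by direct method calls.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MoveStats:
    """Hypothetical per-move record: number of playouts and wins."""
    playouts: int = 0
    wins: int = 0

    def add(self, other: "MoveStats") -> None:
        self.playouts += other.playouts
        self.wins += other.wins


@dataclass
class Slave:
    """In-process stand-in for a remote slave (the real ones talk over TCP/IP)."""
    name: str
    game: list = field(default_factory=list)  # moves of the current game
    tree: dict = field(default_factory=lambda: defaultdict(MoveStats))     # shared totals
    pending: dict = field(default_factory=lambda: defaultdict(MoveStats))  # unsent local results

    def record_playout(self, move: str, won: bool) -> None:
        self.pending[move].add(MoveStats(1, int(won)))

    def take_update(self) -> dict:
        # Send only what was learned since the last round, then start a new buffer,
        # so each playout reaches the shared totals exactly once.
        update, self.pending = dict(self.pending), defaultdict(MoveStats)
        return update

    def merge(self, aggregated: dict) -> None:
        # Fold the master's aggregated update into the shared totals.
        for move, stats in aggregated.items():
            self.tree[move].add(stats)

    def replay(self, moves: list) -> None:
        # A replacement slave rebuilds its position from the full move list.
        self.game = list(moves)


class Master:
    def __init__(self, slaves):
        self.slaves = {s.name: s for s in slaves}
        self.game = []

    def play(self, move: str) -> None:
        self.game.append(move)
        for s in self.slaves.values():
            s.game.append(move)

    def run_round(self) -> dict:
        """Collect updates from every slave, sum them, broadcast the sum to all."""
        aggregated = defaultdict(MoveStats)
        for s in self.slaves.values():
            for move, stats in s.take_update().items():
                aggregated[move].add(stats)
        for s in self.slaves.values():
            s.merge(aggregated)
        return dict(aggregated)

    def replace_slave(self, dead: str, new: Slave) -> None:
        # Fault tolerance: the newcomer is sent the whole game again.
        del self.slaves[dead]
        new.replay(self.game)
        self.slaves[new.name] = new


# Usage: two slaves search the same position, then one is replaced.
a, b = Slave("a"), Slave("b")
master = Master([a, b])
master.play("D4")
a.record_playout("Q16", won=True)
b.record_playout("Q16", won=False)
b.record_playout("C3", won=True)
totals = master.run_round()             # totals["Q16"] == MoveStats(2, 1)
master.replace_slave("b", Slave("c"))   # master.slaves["c"].game == ["D4"]
```

Sending deltas rather than full trees keeps each round's message proportional to the work done since the previous round; whether pachi batches its updates this way is an assumption of this sketch.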
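The scaling figures quoted above can be turned into rough back-of-the-envelope numbers. The sketch below is only an illustrative log-linear model built from the approximate per-doubling gains in the text (about 100 Elo per doubling of cores, about 50 Elo per doubling of machines up to 8). It is not a fit to the actual measurements, and it must not be extrapolated past 8 machines, where the text reports a scalability limit. The expected-score function is the standard Elo formula. The function names are hypothetical.

```python
import math


def projected_gain(n: int, base: int, elo_per_doubling: float) -> float:
    """Elo gained going from `base` to `n` units, assuming a fixed gain per doubling."""
    return elo_per_doubling * math.log2(n / base)


def expected_score(elo_diff: float) -> float:
    """Standard Elo expected score for the side that is `elo_diff` points stronger."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400))


cores_gain = projected_gain(16, 1, 100)   # 4 doublings of cores: 400.0 Elo, about 4 stones
machines_gain = projected_gain(8, 1, 50)  # 3 doublings of machines: 150.0 Elo
one_stone = expected_score(100)           # a one-stone (100 Elo) edge wins about 64% of games
```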
By making pachi's source completely open, we hope to encourage further progress in this area.

Petr and Jean-loup
_______________________________________________
Computer-go mailing list
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
