Hi Sébastien,

I've run two network-only tests on 1024 cores (64 nodes, 16 cores each, QDR IB) using rc-4. One test uses "complete" routing (i.e. routing turned off), and the second uses the default debruijn. The reduction in latency is significant:

# AverageForAllRanks: 1081.61
# StandardDeviation: 35.2018

versus with debruijn -route-messages

# AverageForAllRanks: 280.065
# StandardDeviation: 47.1641

I have the same two tests with 2048 cores (128 nodes) queued up, and will report back once they've finished. Note that this was running on a shared cluster that had other jobs, but these two tests were run in quick succession.

Thanks,
Ben

> Hello,
>
> If you are running Ray (or plan to) on a large number of cores, you
> might be interested in a new feature available in the development
> tree of Ray.
>
> This feature is a new option called -route-messages.
>
> In Ray, any core can send a message directly to any other core,
> including itself.
>
> For example, if you run Ray on 512 cores (let's say 64 computers with
> 8 cores each), then each core has 511 connections -- one with each other
> core.
>
> This means that each core has to check for incoming messages in a
> round-robin fashion for all the 512 cores (this includes itself).
>
> In this setting, the communication network is complete with
> 512 cores and 130816 connections (512 * 511 / 2).
>
> One way to avoid such a huge number of connections is to allow each
> core to communicate directly with only a few others. To do so, we can
> take the logarithm in base 2 of the number of cores to get the average
> number of connections of a core:
>
> log2(512)=9.
>
> Considering that we want any core to have 9 connections on average, we
> need to randomly select 512*9 / 2 connections from the 130816
> connections in order to build the random graph.
>
> Such a random graph has 512 cores, an average of 9 connections per
> core, and exactly 2304 edges (512*9/2).
> There are many such graphs but it is easy to pick one.
>
> In this case, each core has to check for incoming messages in a
> round-robin fashion for all the ~9+1 connections (+1 to include itself).
>
> There is also less memory utilised for incoming buffers.
>
> And the length of the shortest route between any pair of cores in this
> random graph is, on average, 3 connections.
>
> This is because there are 9 first neighbors, 81 second neighbors and
> 729 third neighbors (which are redundant).
>
> But the main motivation is that the latency is reduced by 60 %.
>
>
> The latency without this routing with random graphs:
>
> 386 microseconds (standard deviation: 9)
>
>
> The latency with this routing with random graphs:
>
> 158 microseconds (standard deviation: 15)
>
>
>
> If anyone would like to share their experience with Ray on a large
> number of cores, go ahead.
>
>
> More detailed post on the Open-MPI list (more technical):
>
> http://www.open-mpi.org/community/lists/users/2011/11/17737.php
>
>
> Happy assembly.
>
>
> Sébastien Boisvert
> http://boisvert.info

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
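
For anyone who wants to check the arithmetic in the quoted message, here is a minimal Python sketch. It is not Ray's code: the function names (complete_edge_count, random_graph, average_route_length, latency_reduction) and the fixed seed are made up for illustration, and the edge selection is just one simple way to draw 512*9/2 distinct connections at random. It does not guarantee that the graph is connected, so the average below is taken over reachable pairs only.

```python
import math
import random
from collections import deque


def complete_edge_count(n):
    """Number of connections in a complete network of n cores."""
    return n * (n - 1) // 2


def random_graph(n, degree, seed=0):
    """Draw n*degree/2 distinct edges at random (illustrative, not Ray's method)."""
    rng = random.Random(seed)
    target = n * degree // 2
    edges = set()
    while len(edges) < target:
        a, b = rng.sample(range(n), 2)      # two distinct cores
        edges.add((min(a, b), max(a, b)))   # undirected edge, stored once
    adjacency = {v: set() for v in range(n)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def average_route_length(adjacency):
    """Mean shortest-route length (BFS) over reachable pairs of distinct cores."""
    total = 0
    pairs = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def latency_reduction(before, after):
    """Fractional reduction in latency going from 'before' to 'after'."""
    return (before - after) / before


cores = 512
degree = int(math.log2(cores))                 # log2(512) = 9
graph = random_graph(cores, degree)
edge_count = sum(len(nb) for nb in graph.values()) // 2

print(complete_edge_count(cores))              # 130816
print(edge_count)                              # 2304
print(average_route_length(graph))             # depends on the seed
print(f"{latency_reduction(386, 158):.0%}")    # 59%, from the quoted figures
print(f"{latency_reduction(1081.61, 280.065):.0%}")  # 74%, from the 1024-core figures
```

The last two lines only restate the latencies already given in this thread as percentages; they are not new measurements.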