When checking the NetworkTest results, at what average latency do you think it's better not to use routing? 100 µs? 50 µs?
I'm just wondering if there is a rule of thumb.

Thanks

Louis

On 13-03-25 05:44 PM, Sébastien Boisvert wrote:
> On 25/03/13 12:46 PM, Ino de Bruijn wrote:
>> Thanks a lot for the prompt reply!
>>
>> > What is your interconnect?
>> >
>> > Do you have Infiniband?
>>
>> It is a Cray XE6 system that uses the Cray Gemini interconnect technology.
>> These are the full specs:
>> http://www.pdc.kth.se/resources/computers/lindgren/hardware
>
> In that case, you don't need to route your messages!
>
> The Cray XE6 has the best interconnect out there! It's a 3D torus.
>
>>
>> Is the polytope connection type good for this type of interconnect as well?
>>
>
> You should remove the -route-messages option altogether.
>
> -route-messages is useful when using buggy Infiniband fabrics or TCP
> networks. If you use something like:
>
> * Cray XE6
> * IBM Blue Gene/Q
> * Intel PSM (QLogic Infiniband)
> * IBM iDataPlex
>
> you don't need this option because these systems are really good and provide
> low-latency any-to-any message passing.
>
>> Best regards,
>> Ino
>>
>> > Date: Mon, 25 Mar 2013 11:24:21 -0400
>> > From: sebastien.boisver...@ulaval.ca
>> > To: denovoassembler-users@lists.sourceforge.net
>> > Subject: Re: [Denovoassembler-users] Long execution time, seems to be stuck at Rank 0
>> >
>> > On 25/03/13 05:32 AM, Ino de Bruijn wrote:
>> > > Dear Sébastien Boisvert,
>> > >
>> > > I am trying to assemble a paired-end Illumina HiSeq library of about 1 billion reads. I ran Ray with:
>> > >
>> > > mpiexec -n 1024 Ray \
>> > > -k \
>> > > 31 \
>> > > -i \
>> > > metassemble/assemblies/ray/pair.fastq \
>> > > -o \
>> > > metassemble/assemblies/ray/out_31 \
>> > > -read-write-checkpoints \
>> > > metassemble/assemblies/ray/out_31.cp \
>> > > -route-messages
>> >
>> > Without other arguments, -route-messages will use a de Bruijn graph for routing, which is not really good.
>> >
>> > What is your interconnect?
>> >
>> > Do you have Infiniband?
>> >
>> > Use this instead (the polytope is the best routing engine in RayPlatform):
>> >
>> > -route-messages -connection-type polytope -routing-graph-degree 62
>> >
>> > (from https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt )
>> >
>> > >
>> > > For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher values of k (i.e. {41..81..10}), the run
>> > > is killed by the scheduler after a day (max run time is one day). If I look at the stdout log, it seems
>> > > like only Rank 0 is doing something at the end. Here are a couple of lines from the output:
>> > >
>> >
>> > Well, you are using a de Bruijn graph for routing your messages. A de Bruijn graph is theoretically cool for routing messages,
>> > but in practice it's very bad because it's not adaptive and it's just a pit containing so many choke points.
>> >
>> > From the manual https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt :
>> >
>> > -connection-type type
>> > Sets the connection type for routes.
>> > Accepted values are debruijn, hypercube, polytope, group, random, kautz and complete. Default is debruijn.
>> >
>> > > Rank 0 is counting k-mers in sequence reads [51200001/249559758]
>> > > Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
>> > > Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
>> > >
>> > > This keeps going while only Rank 0 is outputting. The final message says there are 30 minutes left for k=41. For 51 and 61 it is around 10-20h left and for k=71 and k=81 it is about an hour again.
>> >
>> > You should definitely use the polytope. It has no choke points, and routes are adaptive (i.e. messages between A and B will use several paths).
>> >
>> > > Does it only use Rank 0 at this step because this step can only be done by one core, or is the graph that Rank 0 contains highly complex or something?
>> >
>> > At 1024 MPI ranks, rank 0 is one of the hubs in a de Bruijn graph.
>> >
>> > >
>> > > If I want to continue running Ray, can I resume the process by running with the same parameters, but using only one core (-n 1)? Or
>> > > should I use more cores?
>> >
>> > You have to re-launch Ray with the same command except the -o parameter.
>> >
>> > Example:
>> >
>> > mpiexec -n 1024 Ray \
>> > -k \
>> > 31 \
>> > -i \
>> > metassemble/assemblies/ray/pair.fastq \
>> > -o \
>> > metassemble/assemblies/ray/out_31 \
>> > -read-write-checkpoints \
>> > metassemble/assemblies/ray/out_31.cp \
>> > -route-messages -connection-type polytope -routing-graph-degree 62
>> >
>> > > When is the checkpointing done?
>> >
>> > At each step.
>> >
>> > To see your checkpoint files:
>> >
>> > ls metassemble/assemblies/ray/out_31.cp | less
>> >
>> > >
>> > > It seems like I have to remove the output dir to resume from a checkpoint. Is that correct?
>> >
>> > No. It's not necessary. You can instead provide a new output directory.
>> >
>> > >
>> > > Best regards,
>> > > Ino de Bruijn
>> >
>> > _______________________________________________
>> > Denovoassembler-users mailing list
>> > Denovoassembler-users@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
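
A side note on where the 62 in "-routing-graph-degree 62" may come from. This is a rough sketch, not taken from Ray's source. It assumes the polytope gives each rank a label of `digits` symbols over an alphabet of `base` symbols, so that base ** digits equals the number of ranks, and that two ranks are neighbours when their labels differ in exactly one position. Under that assumption the degree is (base - 1) * digits. The function name polytope_layouts is made up for illustration and is not part of Ray or RayPlatform.

```python
def polytope_layouts(ranks):
    """List (base, digits, degree) for every way to write ranks == base ** digits.

    Assumption (not confirmed against Ray's code): each rank is a word of
    `digits` symbols in an alphabet of size `base`, and neighbours differ in
    exactly one symbol, which gives degree == (base - 1) * digits.
    """
    layouts = []
    for base in range(2, ranks + 1):
        digits, value = 0, 1
        # Multiply by base until we reach or overshoot the rank count.
        while value < ranks:
            value *= base
            digits += 1
        if value == ranks:
            layouts.append((base, digits, (base - 1) * digits))
    return layouts


if __name__ == "__main__":
    for base, digits, degree in polytope_layouts(1024):
        print(f"base={base} digits={digits} degree={degree}")
```

With 1024 ranks, the layout with base 32 and 2 digits gives degree 62, which matches the value used in the thread. The base-2 layout would be an ordinary hypercube with degree 10. Check Documentation/Routing.txt for the values Ray actually accepts.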