On 08/04/13 09:11 AM, Louis Letourneau wrote:
> When checking the NetworkTest results, at what point, at what avg.
> latency measure, do you think it's better to not use routing?
> 100µs, 50µs?
>
On a Cray XE6 or on an IBM Blue Gene/Q, you never need software message
routing. From experience, you don't need software message routing on an
IBM iDataPlex either. For everything else, it really depends on the setup.
Sometimes routing is needed because the host communication adapter cannot
service that many communication links per node.

> I'm just wondering if there is a rule of thumb.
>
> Thanks
> Louis
>
> On 13-03-25 05:44 PM, Sébastien Boisvert wrote:
>> On 25/03/13 12:46 PM, Ino de Bruijn wrote:
>>> Thanks a lot for the prompt reply!
>>>
>>> > What is your interconnect?
>>> >
>>> > Do you have Infiniband?
>>>
>>> It is a Cray XE6 system that uses the Cray Gemini interconnect technology.
>>> These are the full specs:
>>> http://www.pdc.kth.se/resources/computers/lindgren/hardware
>>
>> In that case, you don't need to route your messages!
>>
>> The Cray XE6 has the best interconnect out there! It's a 3D torus.
>>
>>>
>>> Is the polytope connection type good for this type of interconnect as well?
>>>
>>
>> You should remove the -route-messages option altogether.
>>
>>
>> -route-messages is useful when using buggy Infiniband fabrics or TCP
>> networks. If you use something like:
>>
>> * Cray XE6
>> * IBM Blue Gene/Q
>> * Intel PSM (QLogic Infiniband)
>> * IBM iDataPlex
>>
>>
>> you don't need this option because these systems are really good and provide
>> low-latency any-to-any message passing.
>>
>>
>>> Best regards,
>>> Ino
>>>
>>> > Date: Mon, 25 Mar 2013 11:24:21 -0400
>>> > From: sebastien.boisver...@ulaval.ca
>>> > To: denovoassembler-users@lists.sourceforge.net
>>> > Subject: Re: [Denovoassembler-users] Long execution time, seems to be stuck at Rank 0
>>> >
>>> > On 25/03/13 05:32 AM, Ino de Bruijn wrote:
>>> > > Dear Sébastien Boisvert,
>>> > >
>>> > > I am trying to assemble a paired-end Illumina HiSeq library of about
>>> > > 1 billion reads.
>>> > > I ran Ray with:
>>> > >
>>> > > mpiexec -n 1024 Ray \
>>> > > -k \
>>> > > 31 \
>>> > > -i \
>>> > > metassemble/assemblies/ray/pair.fastq \
>>> > > -o \
>>> > > metassemble/assemblies/ray/out_31 \
>>> > > -read-write-checkpoints \
>>> > > metassemble/assemblies/ray/out_31.cp \
>>> > > -route-messages
>>> >
>>> > Without other arguments, -route-messages will use a de Bruijn graph for
>>> > routing, which is not really good.
>>> >
>>> > What is your interconnect?
>>> >
>>> > Do you have Infiniband?
>>> >
>>> >
>>> > Use this instead (the polytope is the best routing engine in
>>> > RayPlatform):
>>> >
>>> > -route-messages -connection-type polytope -routing-graph-degree 62
>>> >
>>> > (from https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt )
>>> >
>>> > >
>>> > > For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher
>>> > > values of k (i.e. {41..81..10}), the run is exited by the scheduler
>>> > > after a day (max run time is one day). If I look at the log of the
>>> > > stdout it seems like only Rank 0 is doing something at the end. Here
>>> > > are a couple of lines from the output:
>>> > >
>>> >
>>> > Well, you are using a de Bruijn graph for routing your messages. A de
>>> > Bruijn graph is theoretically cool for routing messages, but in practice
>>> > it's very bad because it's not adaptive and it's just a pit containing
>>> > so many choke points.
>>> >
>>> > From the manual https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt :
>>> >
>>> > -connection-type type
>>> > Sets the connection type for routes.
>>> > Accepted values are debruijn, hypercube, polytope, group, random, kautz
>>> > and complete. Default is debruijn.
>>> >
>>> >
>>> >
>>> > > Rank 0 is counting k-mers in sequence reads [51200001/249559758]
>>> > > Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
>>> > > Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
>>> > >
>>> > > This keeps going while only Rank 0 is outputting.
>>> > > The final message says there are 30 minutes left for k=41. For 51 and
>>> > > 61 it is around 10-20h left, and for k=71 and k=81 it is about an hour
>>> > > again.
>>> >
>>> > You should definitely use the polytope. It has no choke points, and
>>> > routes are adaptive (i.e. messages between A and B will use several
>>> > paths).
>>> >
>>> > > Does it only use Rank 0 at this step because this step can only be
>>> > > done by one core, or is the graph that Rank 0 contains highly complex
>>> > > or something?
>>> >
>>> > At 1024 MPI ranks, rank 0 is one of the hubs in a de Bruijn graph.
>>> >
>>> > >
>>> > > If I want to continue running Ray, can I resume the process by
>>> > > running the same parameters, but using only one core (-n 1)? Or
>>> > > should I use more cores?
>>> >
>>> > You have to re-launch Ray with the same command except the -o parameter.
>>> >
>>> > Example:
>>> >
>>> > mpiexec -n 1024 Ray \
>>> > -k \
>>> > 31 \
>>> > -i \
>>> > metassemble/assemblies/ray/pair.fastq \
>>> > -o \
>>> > metassemble/assemblies/ray/out_31 \
>>> > -read-write-checkpoints \
>>> > metassemble/assemblies/ray/out_31.cp \
>>> > -route-messages -connection-type polytope -routing-graph-degree 62
>>> >
>>> >
>>> > > When is the checkpointing done?
>>> >
>>> > At each step.
>>> >
>>> > To see your checkpoint files:
>>> >
>>> > ls metassemble/assemblies/ray/out_31.cp | less
>>> >
>>> >
>>> >
>>> > >
>>> > > It seems like I have to remove the output dir to resume from a
>>> > > checkpoint. Is that correct?
>>> >
>>> > No. It's not necessary. You can instead provide a new output directory.
>>> >
>>> > >
>>> > > Best regards,
>>> > > Ino de Bruijn
>>> >
>>> > _______________________________________________
>>> > Denovoassembler-users mailing list
>>> > Denovoassembler-users@lists.sourceforge.net
>>> > https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
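The -routing-graph-degree 62 recommended in this thread for 1024 ranks can be rationalized with a small sketch. It assumes, without confirmation from the thread, that RayPlatform's polytope behaves like a generalized hypercube (Hamming graph): each rank is labelled by a word of length dim over an alphabet of size base, so ranks = base^dim, and two ranks are linked when their words differ in exactly one position, which gives each rank dim * (base - 1) links and every message at most dim hops. The helper name polytope_shapes is hypothetical and not part of Ray; the script only enumerates the (base, dim) pairs that fit a given rank count.

```bash
#!/bin/sh
# polytope_shapes RANKS (hypothetical helper, not part of Ray)
# Prints every (base, dim) pair with base^dim == RANKS and the per-rank
# degree dim * (base - 1).
# ASSUMPTION: the polytope is modelled as a generalized hypercube
# (Hamming graph); this is not taken from the RayPlatform sources.
polytope_shapes() {
  ranks=$1
  base=2
  while [ "$base" -le "$ranks" ]; do
    # Multiply by base until we reach or pass ranks; dim counts the factors.
    power=1
    dim=0
    while [ "$power" -lt "$ranks" ]; do
      power=$((power * base))
      dim=$((dim + 1))
    done
    # Keep only exact powers: those are the shapes that use every rank.
    if [ "$power" -eq "$ranks" ]; then
      echo "base=$base dim=$dim degree=$((dim * (base - 1)))"
    fi
    base=$((base + 1))
  done
}

polytope_shapes "${1:-1024}"
```

For 1024 ranks it lists four shapes, among them base=32 dim=2 degree=62, which matches the value used above; base=2 dim=10 degree=10 is the plain hypercube. Under the stated assumption, a larger base trades more links per rank for fewer hops per message.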