On 08/04/13 09:58 AM, Louis Letourneau wrote:
> But there is no rule of thumb? If the adapter can't service that many
> connections, would latency show this? So would using only latency to
> guess be ok?

Yes, that's a good indicator. On some machines, you may get MPI_ERR_OTHER.

>
> I'm just wondering if a new tech comes out or if someone tries a special
> inter-node communication, how can you "guess" if you need routing or
> not, besides doing two full assemblies, one with routing and one without?
>

There is an option for that:

mpiexec -n 16384 Ray -test-network-only -o Without-Routing
mpiexec -n 16384 Ray -test-network-only -o With-Routing -route-messages

Ideally, there would be an option called -route-messages-if-it-is-better ;-)

> Thanks
> Louis
>
> On 13-04-08 09:33 AM, Sébastien Boisvert wrote:
>> On 08/04/13 09:11 AM, Louis Letourneau wrote:
>>> When checking the NetworkTest results, at what point (at what average
>>> latency) do you think it's better not to use routing?
>>> 100 µs, 50 µs?
>>>
>>
>> On a Cray XE6 or an IBM Blue Gene/Q, you never need software message
>> routing.
>>
>> From experience, you don't need software message routing on an IBM
>> iDataPlex either.
>>
>> For everything else, it really depends on the setup. Sometimes routing is
>> needed because the host communication adapter cannot service that many
>> communication links per node.
>>
>>> I'm just wondering if there is a rule of thumb.
>>>
>>> Thanks
>>> Louis
>>>
>>> On 13-03-25 05:44 PM, Sébastien Boisvert wrote:
>>>> On 25/03/13 12:46 PM, Ino de Bruijn wrote:
>>>>> Thanks a lot for the prompt reply!
>>>>>
>>>>> > What is your interconnect?
>>>>> >
>>>>> > Do you have Infiniband?
>>>>>
>>>>> It is a Cray XE6 system that uses the Cray Gemini interconnect
>>>>> technology. These are the full specs:
>>>>> http://www.pdc.kth.se/resources/computers/lindgren/hardware
>>>>
>>>> In that case, you don't need to route your messages!
>>>>
>>>> The Cray XE6 has the best interconnect out there! It's a 3D torus.
>>>>
>>>>>
>>>>> Is the polytope connection type good for this type of interconnect as
>>>>> well?
>>>>>
>>>>
>>>> You should remove the -route-messages option altogether.
>>>>
>>>> -route-messages is useful when using buggy Infiniband fabrics or TCP
>>>> networks. If you use something like:
>>>>
>>>> * Cray XE6
>>>> * IBM Blue Gene/Q
>>>> * Intel PSM (QLogic Infiniband)
>>>> * IBM iDataPlex
>>>>
>>>> you don't need this option because these systems are really good and
>>>> provide low-latency any-to-any message passing.
>>>>
>>>>> Best regards,
>>>>> Ino
>>>>>
>>>>> > Date: Mon, 25 Mar 2013 11:24:21 -0400
>>>>> > From: sebastien.boisver...@ulaval.ca
>>>>> > To: denovoassembler-users@lists.sourceforge.net
>>>>> > Subject: Re: [Denovoassembler-users] Long execution time, seems to be stuck at Rank 0
>>>>> >
>>>>> > On 25/03/13 05:32 AM, Ino de Bruijn wrote:
>>>>> > > Dear Sébastien Boisvert,
>>>>> > >
>>>>> > > I am trying to assemble a paired-end Illumina HiSeq library of
>>>>> > > about 1 billion reads. I ran Ray with:
>>>>> > >
>>>>> > > mpiexec -n 1024 Ray \
>>>>> > >   -k 31 \
>>>>> > >   -i metassemble/assemblies/ray/pair.fastq \
>>>>> > >   -o metassemble/assemblies/ray/out_31 \
>>>>> > >   -read-write-checkpoints metassemble/assemblies/ray/out_31.cp \
>>>>> > >   -route-messages
>>>>> >
>>>>> > Without other arguments, -route-messages will use a de Bruijn graph
>>>>> > for routing, which is not really good.
>>>>> >
>>>>> > What is your interconnect?
>>>>> >
>>>>> > Do you have Infiniband?
>>>>> >
>>>>> > Use this instead (the polytope is the best routing engine in
>>>>> > RayPlatform):
>>>>> >
>>>>> > -route-messages -connection-type polytope -routing-graph-degree 62
>>>>> >
>>>>> > (from https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt )
>>>>> >
>>>>> > >
>>>>> > > For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try
>>>>> > > higher values of k (i.e. {41..81..10}), the run
>>>>> > > is killed by the scheduler after a day (max run time is one day).
>>>>> > > If I look at the log of the stdout, it seems
>>>>> > > like only Rank 0 is doing something at the end. Here are a
>>>>> > > couple of lines from the output:
>>>>> > >
>>>>> >
>>>>> > Well, you are using a de Bruijn graph for routing your messages. A
>>>>> > de Bruijn graph is theoretically cool for routing messages,
>>>>> > but in practice it's very bad because it's not adaptive and it's
>>>>> > just a pit containing many choke points.
>>>>> >
>>>>> > From the manual https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt :
>>>>> >
>>>>> > -connection-type type
>>>>> >     Sets the connection type for routes.
>>>>> >     Accepted values are debruijn, hypercube, polytope, group, random,
>>>>> >     kautz and complete. Default is debruijn.
>>>>> >
>>>>> > > Rank 0 is counting k-mers in sequence reads [51200001/249559758]
>>>>> > > Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
>>>>> > > Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
>>>>> > >
>>>>> > > This keeps going while only Rank 0 is outputting. The final
>>>>> > > message says there are 30 minutes left for k=41. For 51 and 61 it is
>>>>> > > around 10-20h left and for k=71 and k=81 it is about an hour again.
>>>>> >
>>>>> > You should definitely use the polytope. It has no choke points, and
>>>>> > routes are adaptive (i.e. messages between A and B will use several
>>>>> > paths).
>>>>> >
>>>>> > > Does it only use Rank 0 at this step because this step can only be
>>>>> > > done by one core, or is the graph that Rank 0 contains highly complex
>>>>> > > or something?
>>>>> >
>>>>> > At 1024 MPI ranks, rank 0 is one of the hubs in a de Bruijn graph.
>>>>> >
>>>>> > >
>>>>> > > If I want to continue running Ray, can I resume the process by
>>>>> > > running the same parameters, but using only one core (-n 1)? Or
>>>>> > > should I use more cores?
>>>>> >
>>>>> > You have to re-launch Ray with the same command except the -o
>>>>> > parameter.
>>>>> >
>>>>> > Example:
>>>>> >
>>>>> > mpiexec -n 1024 Ray \
>>>>> >   -k 31 \
>>>>> >   -i metassemble/assemblies/ray/pair.fastq \
>>>>> >   -o metassemble/assemblies/ray/out_31 \
>>>>> >   -read-write-checkpoints metassemble/assemblies/ray/out_31.cp \
>>>>> >   -route-messages -connection-type polytope -routing-graph-degree 62
>>>>> >
>>>>> > > When is the checkpointing done?
>>>>> >
>>>>> > At each step.
>>>>> >
>>>>> > To see your checkpoint files:
>>>>> >
>>>>> > ls metassemble/assemblies/ray/out_31.cp | less
>>>>> >
>>>>> > >
>>>>> > > It seems like I have to remove the output dir to resume from a
>>>>> > > checkpoint. Is that correct?
>>>>> >
>>>>> > No, it's not necessary. You can instead provide a new output
>>>>> > directory.
>>>>> >
>>>>> > >
>>>>> > > Best regards,
>>>>> > > Ino de Bruijn
>>>>> >
>>>>> > _______________________________________________
>>>>> > Denovoassembler-users mailing list
>>>>> > Denovoassembler-users@lists.sourceforge.net
>>>>> > https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
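The thread suggests -routing-graph-degree 62 for 1024 ranks. One way to see where that number could come from is the sketch below. It assumes that RayPlatform's polytope is a Hamming graph, which the thread does not state, so check Documentation/Routing.txt before relying on it:

* Each rank is written as `dimension` digits in base `base`, with base^dimension equal to the number of ranks.
* Two ranks are linked when they differ in exactly one digit.

Under that assumption, the following should hold:

* Every rank has the same degree, dimension * (base - 1). For 1024 = 32^2 that gives 2 * 31 = 62.
* Any message needs at most `dimension` hops.
* A pair of ranks differing in k digits has k! shortest routes. This is one way to read "messages between A and B will use several paths".

All function names below are hypothetical helpers, not Ray or RayPlatform APIs.

```python
"""Sketch of a polytope routing topology, ASSUMED to be a Hamming graph.

Assumption: ranks are written as `dimension` digits in base `base`
(base ** dimension == number of ranks), and two ranks are linked when
they differ in exactly one digit. Helper names are hypothetical.
"""
from math import factorial


def polytope_shape(ranks: int, degree: int) -> tuple[int, int]:
    """Return (base, dimension) with base ** dimension == ranks and
    dimension * (base - 1) == degree, or raise ValueError."""
    for base in range(2, ranks + 1):
        size, dimension = 1, 0
        while size < ranks:
            size *= base
            dimension += 1
        if size == ranks and dimension * (base - 1) == degree:
            return base, dimension
    raise ValueError(f"no polytope with {ranks} ranks and degree {degree}")


def digits(rank: int, base: int, dimension: int) -> list[int]:
    """Coordinates of a rank, least significant digit first."""
    coords = []
    for _ in range(dimension):
        coords.append(rank % base)
        rank //= base
    return coords


def neighbors(rank: int, base: int, dimension: int) -> list[int]:
    """Ranks that differ from `rank` in exactly one digit."""
    result = []
    place = 1
    for _ in range(dimension):
        digit = (rank // place) % base
        for value in range(base):
            if value != digit:
                # Replace this digit and keep every other digit unchanged.
                result.append(rank + (value - digit) * place)
        place *= base
    return result


def hops(source: int, destination: int, base: int, dimension: int) -> int:
    """Length of a shortest route: the number of differing digits."""
    a = digits(source, base, dimension)
    b = digits(destination, base, dimension)
    return sum(x != y for x, y in zip(a, b))


def shortest_routes(source: int, destination: int, base: int, dimension: int) -> int:
    """Each order in which the differing digits are fixed is a distinct shortest route."""
    return factorial(hops(source, destination, base, dimension))


if __name__ == "__main__":
    base, dimension = polytope_shape(1024, 62)
    print(base, dimension)                                        # 32 2
    print(len(neighbors(0, base, dimension)))                     # 62
    print(max(hops(0, r, base, dimension) for r in range(1024)))  # 2
```

Under the same assumption, the 16384-rank network test from the first message could be laid out as 128^2 (degree 254) or 2^14 (degree 14). Which layout RayPlatform actually picks for a given degree is not covered in this thread.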