On 25/03/13 12:46 PM, Ino de Bruijn wrote:
> Thanks a lot for the prompt reply!
>
> > What is your interconnect?
> >
> > Do you have InfiniBand?
>
> It is a Cray XE6 system that uses the Cray Gemini interconnect technology.
> These are the full specs:
> http://www.pdc.kth.se/resources/computers/lindgren/hardware

In that case, you don't need to route your messages!
The Cray XE6 has the best interconnect out there! It's a 3D torus.

> Is the polytope connection type good for this type of interconnect as well?

You should remove the -route-messages option altogether.

-route-messages is useful when using buggy InfiniBand fabrics or TCP networks.

If you use something like:

* Cray XE6
* IBM Blue Gene/Q
* Intel PSM (QLogic InfiniBand)
* IBM iDataPlex

you don't need this option because these systems are really good and provide low-latency any-to-any message passing.

> Best regards,
> Ino
>
> > Date: Mon, 25 Mar 2013 11:24:21 -0400
> > From: sebastien.boisver...@ulaval.ca
> > To: denovoassembler-users@lists.sourceforge.net
> > Subject: Re: [Denovoassembler-users] Long execution time, seems to be stuck at Rank 0
> >
> > On 25/03/13 05:32 AM, Ino de Bruijn wrote:
> > > Dear Sébastien Boisvert,
> > >
> > > I am trying to assemble a paired-end Illumina HiSeq library of about 1 billion reads. I ran Ray with:
> > >
> > > mpiexec -n 1024 Ray \
> > > -k \
> > > 31 \
> > > -i \
> > > metassemble/assemblies/ray/pair.fastq \
> > > -o \
> > > metassemble/assemblies/ray/out_31 \
> > > -read-write-checkpoints \
> > > metassemble/assemblies/ray/out_31.cp \
> > > -route-messages
> >
> > Without other arguments, -route-messages will use a de Bruijn graph for routing, which is not really good.
> >
> > What is your interconnect?
> >
> > Do you have InfiniBand?
> >
> > Use this instead (the polytope is the best routing engine in RayPlatform):
> >
> > -route-messages -connection-type polytope -routing-graph-degree 62
> >
> > (from https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt)
> >
> > > For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher values of k (i.e. {41..81..10}), the run
> > > is exited by the scheduler after a day (max run time is one day).
> > > If I look at the log of the stdout it seems
> > > like only Rank 0 is doing something at the end. Here are a couple of lines from the output:
> >
> > Well, you are using a de Bruijn graph for routing your messages. A de Bruijn graph is theoretically cool for routing messages,
> > but in practice it's very bad because it's not adaptive and it's just a pit containing so many choke points.
> >
> > From the manual https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt:
> >
> > -connection-type type
> > Sets the connection type for routes.
> > Accepted values are debruijn, hypercube, polytope, group, random, kautz and complete. Default is debruijn.
> >
> > > Rank 0 is counting k-mers in sequence reads [51200001/249559758]
> > > Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
> > > Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
> > >
> > > This keeps going while only Rank 0 is outputting. The final message says there are 30 minutes left for k=41. For 51 and 61 it is around 10-20h left and for k=71 and k=81 it is about an hour again.
> >
> > You should definitely use the polytope. It has no choke points, and routes are adaptive (i.e. messages between A and B will use several paths).
> >
> > > Does it only use Rank 0 at this step because this step can only be done by one core or is the graph that Rank 0 contains highly complex or something?
> >
> > At 1024 MPI ranks, rank 0 is one of the hubs in a de Bruijn graph.
> >
> > > If I want to continue running Ray, can I resume the process by running the same parameters, but using only one core (-n 1)? Or
> > > should I use more cores?
> >
> > You have to re-launch Ray with the same command except the -o parameter.
> >
> > Example:
> >
> > mpiexec -n 1024 Ray \
> > -k \
> > 31 \
> > -i \
> > metassemble/assemblies/ray/pair.fastq \
> > -o \
> > metassemble/assemblies/ray/out_31 \
> > -read-write-checkpoints \
> > metassemble/assemblies/ray/out_31.cp \
> > -route-messages -connection-type polytope -routing-graph-degree 62
> >
> > > When is the checkpointing done?
> >
> > At each step.
> >
> > To see your checkpoint files:
> >
> > ls metassemble/assemblies/ray/out_31.cp | less
> >
> > > It seems like I have to remove the output dir to resume from a checkpoint. Is that correct?
> >
> > No. It's not necessary. You can instead provide a new output directory.
> >
> > > Best regards,
> > > Ino de Bruijn
> >
> > _______________________________________________
> > Denovoassembler-users mailing list
> > Denovoassembler-users@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
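
The advice in this thread (drop -route-messages on the Cray XE6, keep the checkpoint directory, and give a new -o directory when resuming) can be sketched as a small dry-run script. It only prints the commands and does not launch anything. The helper name build_ray_cmd, the out_<k> and out_<k>.cp naming for k other than 31, and the out_41_resumed directory are illustrative assumptions, not names from the thread; the Ray options themselves are the ones quoted above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print Ray commands without -route-messages.
# Assumptions (hypothetical, not from the thread): the helper name,
# the out_<k> / out_<k>.cp naming scheme, and the "_resumed" suffix.

build_ray_cmd() {
    local k="$1"      # k-mer length
    local out="$2"    # name of the output directory for this launch
    local base="metassemble/assemblies/ray"
    # The checkpoint directory depends only on k, so a relaunch with a
    # different output directory still points at the same checkpoints.
    printf 'mpiexec -n 1024 Ray -k %s -i %s/pair.fastq -o %s/%s -read-write-checkpoints %s/out_%s.cp\n' \
        "$k" "$base" "$base" "$out" "$base" "$k"
}

# First launch for each k in the sweep {41..81..10}.
for k in 41 51 61 71 81; do
    build_ray_cmd "$k" "out_${k}"
done

# Relaunch for k=41 after the scheduler stopped the job:
# same checkpoint directory, new output directory.
build_ray_cmd 41 out_41_resumed
```

Keeping the checkpoint path separate from the output path is what lets the relaunch differ from the first launch only in the -o value, as described in the reply above.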