On 25/03/13 05:32 AM, Ino de Bruijn wrote:
> Dear Sébastien Boisvert,
>
> I am trying to assemble a paired-end Illumina Hiseq library of about 1
> billion reads. I ran Ray with:
>
> mpiexec -n 1024 Ray \
> -k \
> 31 \
> -i \
> metassemble/assemblies/ray/pair.fastq \
> -o \
> metassemble/assemblies/ray/out_31 \
> -read-write-checkpoints \
> metassemble/assemblies/ray/out_31.cp \
> -route-messages
Without other arguments, -route-messages will use a de Bruijn graph for
routing, which is not a good choice.

What is your interconnect? Do you have InfiniBand?

Use this instead (the polytope is the best routing engine in RayPlatform):

-route-messages -connection-type polytope -routing-graph-degree 62

(from https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt )

>
> For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher values
> of k (i.e. {41..81..10}), the run
> is exited by the scheduler after a day (max run time is one day). If I look
> at the log of the stdout it seems
> like only Rank 0 is doing something at the end. Here are a couple of
> lines from the output:
>

Well, you are using a de Bruijn graph for routing your messages. A de Bruijn
graph is attractive in theory for routing messages, but in practice it performs
very badly: its routes are not adaptive, and it contains many choke points.

From the manual (https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt):

       -connection-type type
              Sets the connection type for routes. Accepted values are
              debruijn, hypercube, polytope, group, random, kautz and
              complete. Default is debruijn.

> Rank 0 is counting k-mers in sequence reads [51200001/249559758]
> Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
> Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
>
> This keeps going while only Rank 0 is outputting. The final message says
> there are 30 minutes left for k=41. For 51 and 61 it is around 10-20h left
> and for k=71 and k=81 it is about an hour again.

You should definitely use the polytope. It has no choke points, and its routes
are adaptive (i.e. messages between A and B will use several paths).

> Does it only use Rank 0 at this step because this step can only be done by
> one core, or is the graph that Rank 0 contains highly complex or something?

With 1024 MPI ranks, rank 0 is one of the hubs of the de Bruijn routing graph,
that is, one of its choke points.

>
> If I want to continue running Ray.
> Can I resume the process by running the
> same parameters, but using only one core (-n 1)? Or
> should I use more cores?

You have to re-launch Ray with the same command, including -n 1024, but with a
new -o directory (out_31_resumed below is only an illustrative name) and with
the polytope routing options. Example:

mpiexec -n 1024 Ray \
  -k \
  31 \
  -i \
  metassemble/assemblies/ray/pair.fastq \
  -o \
  metassemble/assemblies/ray/out_31_resumed \
  -read-write-checkpoints \
  metassemble/assemblies/ray/out_31.cp \
  -route-messages -connection-type polytope -routing-graph-degree 62

> When is the checkpointing done?

At each step. To see your checkpoint files:

ls metassemble/assemblies/ray/out_31.cp | less

>
> It seems like I have to remove the output dir to resume from a checkpoint. Is
> that correct?

No, it's not necessary. You can provide a new output directory instead.

>
> Best regards,
> Ino de Bruijn

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
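A small bash sketch, under stated assumptions, of where the degree 62 in the polytope options could come from, and of how the k = 41 to 81 runs from the question might be relaunched. ASSUMPTION: the polytope labels each of the N ranks with D digits in base B (so N = B^D) and connects two ranks that differ in exactly one digit, which gives each rank D x (B - 1) neighbours. With N = 1024 = 32^2, D = 2 gives 2 x 31 = 62, which matches the value recommended for 1024 ranks. The function names, the out_K / out_K.cp paths and the _resumed suffix are hypothetical; the script only prints commands and runs nothing.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: the polytope routing graph labels each of the N
# ranks with D digits in base B (so N = B^D) and links two ranks that differ
# in exactly one digit, which gives every rank D * (B - 1) neighbours.

# polytope_degree N D: print the degree for N ranks and diameter D,
# or return 1 if N is not a perfect D-th power.
polytope_degree() {
    local n=$1 d=$2 b p i
    for (( b = 2; b <= n; b++ )); do
        p=1
        for (( i = 0; i < d; i++ )); do
            p=$(( p * b ))
            (( p > n )) && break      # B^D would exceed N for this base
        done
        if (( p == n )); then
            echo $(( d * (b - 1) ))
            return 0
        fi
        (( p > n )) && break          # larger bases only overshoot further
    done
    return 1
}

# resume_command K: print (do not run) a relaunch command for one k value.
# HYPOTHETICAL paths: out_K and out_K.cp copy the out_31 naming from this
# thread, and the "_resumed" output directory is an illustrative name.
resume_command() {
    local k=$1 dir=metassemble/assemblies/ray
    echo "mpiexec -n 1024 Ray -k $k -i $dir/pair.fastq" \
         "-o $dir/out_${k}_resumed" \
         "-read-write-checkpoints $dir/out_${k}.cp" \
         "-route-messages -connection-type polytope" \
         "-routing-graph-degree $(polytope_degree 1024 2)"
}

# Dry run for the k values mentioned in the question.
for k in {41..81..10}; do
    resume_command "$k"
done
```

Under the same assumption, 1024 ranks give 15 for D = 5 (B = 4) and 10 for D = 10 (B = 2, a hypercube), and the function fails for a rank count such as 1000, which is not a perfect square, when D = 2. This thread does not say exactly how RayPlatform derives the digits from -routing-graph-degree, so check Routing.txt before relying on these numbers.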