On 25/03/13 12:46 PM, Ino de Bruijn wrote:
> Thanks a lot for the prompt reply!
>
>  > What is your interconnect ?
>  >
>  > Do you have Infiniband ?
>
> It is a Cray XE6 system that uses the Cray Gemini interconnect technology. 
> These are the full specs: 
> http://www.pdc.kth.se/resources/computers/lindgren/hardware

In that case, you don't need to route your messages!

The Cray XE6 has one of the best interconnects out there! Gemini is a 3D torus.

>
> Is the polytope connection type good for this type of interconnect as well?
>

You should remove the -route-messages option altogether.


-route-messages is useful on buggy InfiniBand fabrics or on TCP networks.
If you use something like:

* Cray XE6
* IBM Blue Gene/Q
* Intel PSM (QLogic InfiniBand)
* IBM iDataPlex

you don't need this option, because these systems provide low-latency
any-to-any message passing.
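
For what it's worth, here is a minimal sketch of your original command with
-route-messages dropped. It assumes the same paths and the same number of ranks
as in your message, and that mpiexec is the right launcher on your system (on a
Cray you may have to go through the site's own launcher, such as aprun, so
check your site documentation). RAY_ARGS is only a helper variable for
illustration, and the script prints the command instead of running it:

```shell
#!/bin/sh
# Ray arguments from the original post, without -route-messages.
# RAY_ARGS is a hypothetical helper variable; Ray itself does not read it.
RAY_ARGS="-k 31 \
-i metassemble/assemblies/ray/pair.fastq \
-o metassemble/assemblies/ray/out_31 \
-read-write-checkpoints metassemble/assemblies/ray/out_31.cp"

# Dry run: print the command line instead of launching the job.
echo "mpiexec -n 1024 Ray $RAY_ARGS"

# To launch for real, run the same line without echo and without the quotes:
# mpiexec -n 1024 Ray $RAY_ARGS
```

If metassemble/assemblies/ray/out_31 already exists from the previous run,
give -o a new directory, as discussed earlier in this thread.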


> Best regards,
> Ino
>
>  > Date: Mon, 25 Mar 2013 11:24:21 -0400
>  > From: sebastien.boisver...@ulaval.ca
>  > To: denovoassembler-users@lists.sourceforge.net
>  > Subject: Re: [Denovoassembler-users] Long execution time, seems to be stuck at Rank 0
>  >
>  > On 25/03/13 05:32 AM, Ino de Bruijn wrote:
>  > > Dear Sébastien Boisvert,
>  > >
>  > > I am trying to assemble a paired-end Illumina HiSeq library of about 1 billion reads. I ran Ray with:
>  > >
>  > > mpiexec -n 1024 Ray \
>  > > -k \
>  > > 31 \
>  > > -i \
>  > > metassemble/assemblies/ray/pair.fastq \
>  > > -o \
>  > > metassemble/assemblies/ray/out_31 \
>  > > -read-write-checkpoints \
>  > > metassemble/assemblies/ray/out_31.cp \
>  > > -route-messages
>  >
>  > Without other arguments, -route-messages will use a de Bruijn graph for routing, which is not really good.
>  >
>  > What is your interconnect ?
>  >
>  > Do you have Infiniband ?
>  >
>  >
>  > Use this instead (the polytope is the best routing engine in RayPlatform):
>  >
>  > -route-messages -connection-type polytope -routing-graph-degree 62
>  >
>  > (from https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt)
>  >
>  > >
>  > > For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher values of k (i.e. {41..81..10}), the run
>  > > is killed by the scheduler after a day (max run time is one day). If I look at the log of the stdout it seems
>  > > like only Rank 0 is doing something at the end. Here are a couple of lines from the output:
>  > >
>  >
>  > Well, you are using a de Bruijn graph for routing your messages. A de Bruijn graph is theoretically cool for routing messages,
>  > but in practice it's very bad because it's not adaptive and it's just a pit containing so many choke points.
>  >
>  > From the manual (https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt):
>  >
>  > -connection-type type
>  > Sets the connection type for routes.
>  > Accepted values are debruijn, hypercube, polytope, group, random, kautz and complete. Default is debruijn.
>  >
>  >
>  >
>  > > Rank 0 is counting k-mers in sequence reads [51200001/249559758]
>  > > Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
>  > > Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
>  > >
>  > > This keeps going while only Rank 0 is outputting. The final message says there are 30 minutes left for k=41.
>  > > For 51 and 61 it is around 10-20h left and for k=71 and k=81 it is about an hour again.
>  >
>  > You should definitely use the polytope. It has no choke points, and routes are adaptive
>  > (i.e. messages between A and B will use several paths).
>  >
>  > > Does it only use Rank 0 at this step because this step can only be done by one core,
>  > > or is the graph that Rank 0 contains highly complex or something?
>  >
>  > At 1024 MPI ranks, rank 0 is one of the hubs in a de Bruijn graph.
>  >
>  > >
>  > > If I want to continue running Ray, can I resume the process by running with the same parameters,
>  > > but using only one core (-n 1)? Or should I use more cores?
>  >
>  > You have to re-launch Ray with the same command except the -o parameter.
>  >
>  > Example:
>  >
>  > mpiexec -n 1024 Ray \
>  > -k \
>  > 31 \
>  > -i \
>  > metassemble/assemblies/ray/pair.fastq \
>  > -o \
>  > metassemble/assemblies/ray/out_31 \
>  > -read-write-checkpoints \
>  > metassemble/assemblies/ray/out_31.cp \
>  > -route-messages -connection-type polytope -routing-graph-degree 62
>  >
>  >
>  > > When is the checkpointing done?
>  >
>  > At each step.
>  >
>  > To see your checkpoint files:
>  >
>  > ls metassemble/assemblies/ray/out_31.cp | less
>  >
>  >
>  >
>  > >
>  > > It seems like I have to remove the output dir to resume from a checkpoint. Is that correct?
>  >
>  > No. It's not necessary. You can instead provide a new output directory.
>  >
>  > >
>  > > Best regards,
>  > > Ino de Bruijn
>  >
>  >
>  >
>  >
>  > _______________________________________________
>  > Denovoassembler-users mailing list
>  > Denovoassembler-users@lists.sourceforge.net
>  > https://lists.sourceforge.net/lists/listinfo/denovoassembler-users

