On 25/03/13 05:32 AM, Ino de Bruijn wrote:
> Dear Sébastien Boisvert,
>
> I am trying to assemble a paired-end Illumina HiSeq library of about 1 
> billion reads. I ran Ray with:
>
> mpiexec -n 1024 Ray \
>   -k \
>   31 \
>   -i \
>   metassemble/assemblies/ray/pair.fastq \
>   -o \
>   metassemble/assemblies/ray/out_31 \
>   -read-write-checkpoints \
>   metassemble/assemblies/ray/out_31.cp \
>   -route-messages

Without other arguments, -route-messages uses a de Bruijn graph for 
routing, which does not work well in practice.

What is your interconnect?

Do you have InfiniBand?


Use this instead (the polytope is the best routing engine in RayPlatform):

-route-messages -connection-type polytope -routing-graph-degree 62

    (from https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt )

>
> For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher values 
> of k (i.e. {41..81..10}), the run
> is exited by the scheduler after a day (max run time is one day). If I look 
> at the log of the stdout it seems
> like only Rank 0 is doing something at the end. Here are a couple of 
> lines from the output:
>

Well, you are using a de Bruijn graph for routing your messages. A de Bruijn 
graph is attractive in theory for routing messages, but in practice it 
performs badly: its routes are not adaptive, and it contains many choke points.

From the manual https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt :

        -connection-type type
               Sets the connection type for routes.
               Accepted values are debruijn, hypercube, polytope, group,
               random, kautz and complete. Default is debruijn.



> Rank 0 is counting k-mers in sequence reads [51200001/249559758]
> Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
> Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
>
> This keeps going while only Rank 0 is outputting. The final message says 
> there are 30 minutes left for k=41. For 51 and 61 it is around 10-20h left 
> and for k=71 and k=81 it is about an hour again.
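
The remaining-time estimate in that log matches a plain rate calculation, (total - processed) / speed, assuming the reported "units" are reads. At about 4,900 reads per second on rank 0 alone, the rest of this single step needs roughly 11 hours, nearly half of the one-day limit. A small sh sketch of the arithmetic, using the numbers from the log lines above:

```sh
# Values copied from the rank 0 log lines above; "units" assumed to be reads.
done_reads=51200001
total_reads=249559758
speed=4909   # units per second, as reported by Ray

# Remaining time = remaining reads / speed, in whole seconds.
remaining=$(( (total_reads - done_reads) / speed ))
h=$(( remaining / 3600 ))
m=$(( remaining % 3600 / 60 ))
s=$(( remaining % 60 ))

# prints: 11 hours, 13 minutes, 27 seconds
printf '%d hours, %d minutes, %d seconds\n' "$h" "$m" "$s"
```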

You should definitely use the polytope. It has no choke points, and its routes 
are adaptive (i.e. messages between A and B will use several paths).

> Does it only use Rank 0 at this step because this step can only be done by 
> one core or is the graph that Rank 0 contains highly complex or something?

With 1024 MPI ranks, rank 0 is one of the hubs of the de Bruijn routing graph.

>
> If I want to continue running Ray, can I resume the process by running the 
> same parameters, but using only one core (-n 1)? Or
> should I use more cores?

You have to re-launch Ray with the same command, except for the -o parameter, 
which must point to a new output directory.

Example:

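  # Same command, with a new output directory (out_31.new is only an
  # example name) and polytope routing; the checkpoint directory is unchanged.
  mpiexec -n 1024 Ray \
    -k 31 \
    -i metassemble/assemblies/ray/pair.fastq \
    -o metassemble/assemblies/ray/out_31.new \
    -read-write-checkpoints metassemble/assemblies/ray/out_31.cp \
    -route-messages -connection-type polytope -routing-graph-degree 62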


> When is the checkpointing done?

At each step.

To see your checkpoint files:

ls metassemble/assemblies/ray/out_31.cp | less



>
> It seems like I have to remove the output dir to resume from a checkpoint. Is 
> that correct?

No, that is not necessary. You can instead provide a new output directory with -o.
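
If you restart several times, choosing each new directory by hand gets tedious. A small POSIX sh sketch of one way to automate it; the fresh_outdir helper and its .1, .2, ... suffix scheme are hypothetical, and only -o changes while -read-write-checkpoints keeps pointing at the existing checkpoint directory:

```sh
# Hypothetical helper: print the first of DIR, DIR.1, DIR.2, ... that does
# not exist yet, so a restarted run never collides with an old output dir.
fresh_outdir() {
  candidate=$1
  n=1
  while [ -e "$candidate" ]; do
    candidate="$1.$n"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}
```

It would be used in the Ray command as -o "$(fresh_outdir metassemble/assemblies/ray/out_31)", with -read-write-checkpoints metassemble/assemblies/ray/out_31.cp left as before.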

>
> Best regards,
> Ino de Bruijn




_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
