On 25/03/13 05:32 AM, Ino de Bruijn wrote:
> Dear Sébastien Boisvert,
>
> I am trying to assemble a paired-end Illumina Hiseq library of about 1
> billion reads. I ran Ray with:
>
> mpiexec -n 1024 Ray \
> -k \
> 31 \
> -i \
> metassemble/assemblies/ray/pair.fastq \
> -o \
> metassemble/assemblies/ray/out_31 \
> -read-write-checkpoints \
> metassemble/assemblies/ray/out_31.cp \
> -route-messages
Without other arguments, -route-messages uses a de Bruijn graph for
routing, which performs poorly.
What is your interconnect?
Do you have InfiniBand?
Use this instead (the polytope is the best routing engine in RayPlatform):
-route-messages -connection-type polytope -routing-graph-degree 62
(from https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt )
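Where 62 comes from: assuming the polytope is a generalized hypercube in which the number of ranks equals base^diameter and each rank has (base - 1) * diameter neighbors, 1024 ranks = 32^2 gives degree 31 * 2 = 62. This is an assumption about RayPlatform's construction, not code taken from it. The bash sketch below lists the candidate degrees for a given rank count; polytope_options is a hypothetical helper, not part of Ray.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list polytope shapes for a given number of MPI ranks.
# ASSUMPTION: ranks = base^diameter and degree = (base - 1) * diameter.
polytope_options() {
  local ranks=$1
  local base diameter power
  for ((base = 2; base <= ranks; base++)); do
    power=1
    diameter=0
    # Multiply by base until the power reaches or exceeds the rank count.
    while ((power < ranks)); do
      power=$(( power * base ))
      diameter=$(( diameter + 1 ))
    done
    # Keep only bases whose power hits the rank count exactly.
    if ((power == ranks)); then
      echo "base=$base diameter=$diameter degree=$(( (base - 1) * diameter ))"
    fi
  done
}

polytope_options 1024
```

For 1024 ranks this prints four shapes: degree 10 (base 2), 15 (base 4), 62 (base 32) and 1023 (base 1024, i.e. a complete graph).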
>
> For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher values
> of k (i.e. {41..81..10}), the run
> is exited by the scheduler after a day (max run time is one day). If I look
> at the log of the stdout it seems
> like only Rank 0 is doing something at the end. Here are a couple of
> lines from the output:
>
Well, you are using a de Bruijn graph to route your messages. A de Bruijn
graph is attractive for routing in theory,
but in practice it performs very badly: its routes are not adaptive, and
it contains many choke points.
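To illustrate what "not adaptive" means, here is a bash sketch of classic shift routing in a binary de Bruijn graph. It assumes 2^bits ranks, which may not match how RayPlatform actually builds its de Bruijn graph; debruijn_route is a hypothetical helper. Each hop shifts in one bit of the destination, so the scheme yields exactly one route (not necessarily the shortest) per source/destination pair, and a congested rank on that route cannot be bypassed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the single shift route from src to dst in a
# binary de Bruijn graph with 2^bits vertices (ASSUMED topology).
debruijn_route() {
  local src=$1 dst=$2 bits=$3
  local n=$(( 1 << bits ))
  local cur=$src route="$src" i bit
  for ((i = bits - 1; i >= 0; i--)); do
    bit=$(( (dst >> i) & 1 ))        # next destination bit, most significant first
    cur=$(( (cur * 2 + bit) % n ))   # shift left, drop the top bit, append the new one
    route="$route -> $cur"
  done
  echo "$route"
}

debruijn_route 5 1000 10
```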
From the manual https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt :
-connection-type type
Sets the connection type for routes.
Accepted values are debruijn, hypercube, polytope, group,
random, kautz and complete. Default is debruijn.
> Rank 0 is counting k-mers in sequence reads [51200001/249559758]
> Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
> Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
>
> This keeps going while only Rank 0 is outputting. The final message says
> there are 30 minutes left for k=41. For 51 and 61 it is around 10-20h left
> and for k=71 and k=81 it is about an hour again.
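The estimate in the log looks like rank 0's remaining work divided by its current speed. Assuming that formula (it is inferred from the numbers, not from Ray's source), this bash sketch reproduces the logged value from the log's own figures; format_eta is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ETA = (total - done) / speed, printed like Ray's log.
format_eta() {
  local done_items=$1 total_items=$2 speed=$3
  local remaining=$(( (total_items - done_items) / speed ))  # whole seconds
  printf '%d hours, %d minutes, %d seconds\n' \
    $(( remaining / 3600 )) $(( (remaining % 3600) / 60 )) $(( remaining % 60 ))
}

# Numbers from the rank 0 log lines quoted above.
format_eta 51200001 249559758 4909   # prints: 11 hours, 13 minutes, 27 seconds
```

At 4909 units/second, rank 0 alone still has about 40,400 seconds of work on this step.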
You should definitely use the polytope. It has no choke points, and its routes
are adaptive (i.e. messages between A and B will use several paths).
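The "several paths" can be made concrete under an assumed model: treat the polytope as a generalized hypercube in which ranks are digit strings in some base, and each hop changes one differing digit. Because the differing digits can be fixed in any order, two ranks that differ in k digits are joined by k! shortest routes. polytope_shortest_routes is a hypothetical helper, not RayPlatform's router.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count differing digits between ranks a and b
# (written in the given base with `diameter` digits) and the number of
# shortest routes, k!, in an ASSUMED generalized-hypercube polytope.
polytope_shortest_routes() {
  local a=$1 b=$2 base=$3 diameter=$4
  local differing=0 routes=1 i
  for ((i = 0; i < diameter; i++)); do
    if (( a % base != b % base )); then
      differing=$(( differing + 1 ))
    fi
    a=$(( a / base ))
    b=$(( b / base ))
  done
  # Each ordering of the digit fixes is a distinct shortest route.
  for ((i = 2; i <= differing; i++)); do
    routes=$(( routes * i ))
  done
  echo "$differing hop(s), $routes shortest route(s)"
}

polytope_shortest_routes 5 1000 32 2
```

With base 32 and diameter 2 (1024 ranks, degree 62), ranks 5 and 1000 differ in both digits, so there are 2 hops and 2 shortest routes, via rank 8 or via rank 997.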
> Does it only use Rank 0 at this step because this step can only be done by
> one core or is the graph that Rank 0 contains highly complex or something?
With 1024 MPI ranks, rank 0 is one of the hubs of the de Bruijn routing
graph, so a lot of message traffic goes through it.
>
> If I want to continue running Ray, can I resume the process by running the
> same parameters, but using only one core (-n 1)? Or
> should I use more cores?
Relaunch Ray with the same command (keep -n 1024 and the same
-read-write-checkpoints directory), add the polytope routing options,
and give -o a new output directory.
Example (out_31_resumed is only an illustrative name):
mpiexec -n 1024 Ray \
-k 31 \
-i metassemble/assemblies/ray/pair.fastq \
-o metassemble/assemblies/ray/out_31_resumed \
-read-write-checkpoints metassemble/assemblies/ray/out_31.cp \
-route-messages -connection-type polytope -routing-graph-degree 62
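If you relaunch often, a small bash wrapper can guard against a missing checkpoint directory and an output directory that already exists. resume_ray and DRY_RUN are hypothetical names; only Ray flags quoted in this thread are used, -n 1024 and degree 62 are hard-coded for this job, and the "output directory must not exist" check reflects the behaviour reported in this thread rather than documented Ray semantics.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: relaunch Ray from existing checkpoints into a
# fresh output directory. Set DRY_RUN=1 to print the command only.
resume_ray() {
  local reads=$1 checkpoints=$2 new_out=$3 k=$4
  if [ ! -d "$checkpoints" ]; then
    echo "checkpoint directory not found: $checkpoints" >&2
    return 1
  fi
  # ASSUMPTION: Ray should be given an output directory that does not exist yet.
  if [ -e "$new_out" ]; then
    echo "output directory already exists: $new_out" >&2
    return 1
  fi
  local cmd=(mpiexec -n 1024 Ray -k "$k" -i "$reads" -o "$new_out"
             -read-write-checkpoints "$checkpoints"
             -route-messages -connection-type polytope -routing-graph-degree 62)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: DRY_RUN=1 resume_ray metassemble/assemblies/ray/pair.fastq metassemble/assemblies/ray/out_31.cp NEW_OUTPUT_DIR 31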
> When is the checkpointing done?
At each step.
To see your checkpoint files:
ls metassemble/assemblies/ray/out_31.cp | less
>
> It seems like I have to remove the output dir to resume from a checkpoint. Is
> that correct?
No, that's not necessary. You can provide a new output directory instead.
>
> Best regards,
> Ino de Bruijn
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users