Dear Sébastien Boisvert,

I am trying to assemble a paired-end Illumina HiSeq library of about 1 billion 
reads. I ran Ray with:

mpiexec -n 1024 Ray \
 -k \
 31 \
 -i \
 metassemble/assemblies/ray/pair.fastq \
 -o \
 metassemble/assemblies/ray/out_31 \
 -read-write-checkpoints \
 metassemble/assemblies/ray/out_31.cp \
 -route-messages

For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher values of
k (41, 51, 61, 71 and 81), the run is killed by the scheduler after a day (the
maximum run time is one day). Looking at the stdout log, it seems that only
Rank 0 is doing something at the end. Here are a couple of lines from the
output:

Rank 0 is counting k-mers in sequence reads [51200001/249559758]
Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
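
As a sanity check on that estimate, here is a minimal shell sketch, assuming
the remaining time is simply the sequences not yet counted divided by the
reported speed (my assumption; I have not checked how Ray computes it). The
function names are my own, and the numbers are copied from the log lines above.

```shell
#!/usr/bin/env bash
# Assumption: remaining time = (total - processed) / speed.
# This is a guess at the arithmetic, not taken from Ray's source code.

# Print the estimated remaining time in whole seconds.
eta_seconds() {
    local processed=$1 total=$2 speed=$3
    echo $(( (total - processed) / speed ))
}

# Print a number of seconds in the same style as the log line.
format_eta() {
    local s=$1
    printf '%d hours, %d minutes, %d seconds\n' \
        $(( s / 3600 )) $(( (s % 3600) / 60 )) $(( s % 60 ))
}

# Values from: "[51200001/249559758]" and "4909 units/second"
format_eta "$(eta_seconds 51200001 249559758 4909)"
# prints: 11 hours, 13 minutes, 27 seconds
```

With these numbers the sketch reproduces the logged estimate, so the estimate
at least looks consistent with the counter and the reported speed.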

This keeps going while only Rank 0 is producing output. The final message
before the job is killed says there are 30 minutes left for k=41. For k=51 and
k=61 it is around 10-20h left, and for k=71 and k=81 it is about an hour again.
Is only Rank 0 active at this step because the step can only be done by one
core, or because the part of the graph that Rank 0 holds is unusually complex?

If I want to continue running Ray, can I resume the process by running with the
same parameters but only one core (-n 1)? Or should I use more cores? And when
is the checkpointing done?

It seems like I have to remove the output dir to resume from a checkpoint. Is 
that correct?

Best regards,
Ino de Bruijn
                                          
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
