Dear Sébastien Boisvert,
I am trying to assemble a paired-end Illumina HiSeq library of about 1 billion
reads. I ran Ray with:
mpiexec -n 1024 Ray \
  -k 31 \
  -i metassemble/assemblies/ray/pair.fastq \
  -o metassemble/assemblies/ray/out_31 \
  -read-write-checkpoints metassemble/assemblies/ray/out_31.cp \
  -route-messages
For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher values of
k ({41..81..10}, i.e. 41, 51, 61, 71 and 81), the run is killed by the scheduler
after a day (the maximum run time is one day). Judging from the stdout log, only
Rank 0 seems to be doing anything at the end. Here are a couple of lines from
the output:
Rank 0 is counting k-mers in sequence reads [51200001/249559758]
Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
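As a sanity check, the estimate in that output can be reproduced from the other
two numbers, assuming the "units" in the speed line are reads for this step. A
minimal sketch (the eta function is only a hypothetical helper, not part of
Ray):

```shell
# Reproduce the remaining-time estimate from the log lines above.
# Assumption: "units/second" means reads per second for this step.
eta() {
    total=$1       # total reads assigned to the rank (249559758 in the log)
    processed=$2   # reads already counted (51200001 in the log)
    speed=$3       # reported speed in units/second (4909 in the log)
    secs=$(( (total - processed) / speed ))
    printf '%d hours, %d minutes, %d seconds\n' \
        $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

eta 249559758 51200001 4909
# prints: 11 hours, 13 minutes, 27 seconds
```

So the reported estimate appears to be a plain linear extrapolation of Rank 0's
current speed over its remaining reads.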
Output like this continues, from Rank 0 only, until the job is killed. The final
message reports about 30 minutes remaining for k=41, around 10-20 hours for k=51
and k=61, and about an hour again for k=71 and k=81.
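For reference, a sweep over those k values could be scripted roughly as below.
This is only a dry-run sketch that prints the commands: it reuses the flags from
the k=31 run, and it assumes the output and checkpoint directories follow the
same out_<k> naming pattern. The ray_cmd helper is hypothetical.

```shell
# Dry run: print one Ray command per k instead of launching it.
# Assumption: out_<k> and out_<k>.cp follow the naming used for k=31.
ray_cmd() {
    k=$1
    base=metassemble/assemblies/ray
    echo "mpiexec -n 1024 Ray -k ${k} -i ${base}/pair.fastq -o ${base}/out_${k} -read-write-checkpoints ${base}/out_${k}.cp -route-messages"
}

# seq FIRST INCREMENT LAST yields 41 51 61 71 81
for k in $(seq 41 10 81); do
    ray_cmd "$k"
done
```

Removing the echo wrapper (or piping the printed lines to the scheduler's
submit command) would turn the dry run into real submissions.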
Does only Rank 0 work at this step because the step can only be done by one
core, or because the part of the graph that Rank 0 holds is highly complex?
If I want to continue running Ray, can I resume by running with the same
parameters but on only one core (-n 1), or should I use more cores? When is the
checkpointing done?
It seems like I have to remove the output directory to resume from a checkpoint.
Is that correct?
Best regards,
Ino de Bruijn
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users