Hello,

I'm trying to do a large co-assembly of 10 metagenomes using Ray (I know
that running them individually would be faster, but for our purposes we
want a co-assembly if possible). I have overlapping paired-end reads
that have been merged and filtered, yielding a total of about 632 million
single-end reads of roughly 150 bp for all 10 samples. I have tried
running Ray on this dataset using n = 80 and k = 31 on a cluster with 40
cores (80 hyperthreads) and 256 GB of memory, and according to the log
at least 35 days remain for the k-mer counting step. The network testing
step reported an average latency of 205.6 with a standard deviation of
63.9002 across all ranks.
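For reference, I'm launching Ray along these lines (a sketch only; the
file and directory names below are placeholders rather than my actual
paths):

    # Hypothetical invocation: 80 MPI ranks, k = 31, with the merged
    # reads given to Ray as single-end libraries via -s (one per sample)
    mpiexec -n 80 Ray -k 31 \
        -s sample01.merged.fastq \
        -s sample02.merged.fastq \
        ...
        -s sample10.merged.fastq \
        -o RayCoassembly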

So my question is: is this kind of time frame normal for a dataset of this
size on this kind of hardware? Running for over 30 days on a dataset of
about 600 million single-end reads seems like a lot. Am I doing something
wrong, or is there anything I can do to reduce the running time? We also
have a general-use cluster with 32 cores and 1024 GB of memory; would
running Ray there work better because of the larger memory?
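
For scale, if my arithmetic is right: at k = 31, each 150 bp read yields
150 - 31 + 1 = 120 k-mers, so the counting step has to process roughly
632 million x 120, or about 76 billion k-mer observations in total.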

Thanks for any advice!
Best,
Rika Anderson


-- 
Rika Anderson, Ph.D.
NASA Postdoctoral Fellow
NASA Astrobiology Institute