Dear colleagues,
I am working on the assembly of a plant chromosome (about 750 Mbp long) with
lots of repeated regions.
I have a dataset of 617 Mi 2x100 bp Illumina paired-end reads (1,234 Mi single
reads) and 1 Mi 454/Roche reads with an average length of 510 bp.
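For reference, here is a rough sketch of the sequencing depth these numbers imply. It assumes that "Mi" means million reads, that every Illumina read is exactly 100 bp, and that the chromosome is exactly 750 Mbp; the function name is only illustrative.

```python
def coverage(n_reads, read_len_bp, genome_bp):
    """Approximate depth of coverage: total sequenced bases / genome length."""
    return n_reads * read_len_bp / genome_bp

GENOME_BP = 750e6  # chromosome length, about 750 Mbp (assumed exact here)

# 1,234 million single Illumina reads of 100 bp each
illumina = coverage(1234e6, 100, GENOME_BP)

# 1 million 454/Roche reads, 510 bp on average
roche454 = coverage(1e6, 510, GENOME_BP)

print(f"Illumina: {illumina:.1f}x, 454/Roche: {roche454:.2f}x")
# prints "Illumina: 164.5x, 454/Roche: 0.68x"
```

So the Illumina data would be roughly 165x deep, while the 454/Roche data would cover the chromosome less than once on average.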
Because this is my first time working with this amount of data and because our
in-house cluster is unavailable at the moment, I will need to rent some Amazon
EC2 resources. I've never tried this before and I'm not sure where to start.
I've read the "Out Damned Spot Instance! Out I say!" post, but I am still lost.
Still, I read something interesting in the post:
"a user can share their machine images with pre-installed software".
Then my questions are:
- Would a shared machine image work for me?
* If the answer is "yes", do you have a machine image that you could share
for running Ray?
* If the answer is "no", do you have or know of any tutorial on how to deploy
the appropriate machine (or cluster?) to run Ray for my project?
- What would you recommend for my project:
a) a complete de novo assembly using both Illumina and 454/Roche reads, or b) a
two-step assembly, first the Illumina reads alone, then the 454/Roche reads
together with the Illumina output? If the answer is b), is this possible using
Ray?
- Do you know how much RAM would be needed to perform this assembly? Or
do you know how I can estimate it?
- Have you ever tried something similar on Amazon EC2? Could you give me a cost
and/or time estimate?
- I would like the processes to finish as fast as possible while spending as
little money as possible: do you have any recommendations on how many cores I
should rent, how much memory the server should have (or how much memory is
needed per process), etc.?
- I read in Ray's user manual that for very large jobs (is this project a very
large job?) routing should be enabled unless a good interconnect is available:
how can I check whether the rented machine has a good interconnect? Moreover,
is there a trade-off between using many processes with routing enabled and
using fewer processes without routing? E.g., would using 56 processes without
routing be as fast as using 64 processes with routing enabled?
I would really appreciate any help.
Sorry for the many questions.
Thank you all very much in advance.
Best regards,
Santiago
_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users