Is there any website or wiki that discusses scaling and hardware requirements 
(RAM, processors, hard drives) in more detail, e.g. for:
- re-sequencing (e.g. align John Doe's FASTQ files to the standard human 
genome, make the SAM and BAM files, and determine the variants)
- metagenomics (e.g. sequence John Doe's stool to obtain a) species by 16S or 
b) bacterial genes)
- de novo bacterial genome assembly (e.g. isolate an unknown germ from the 
stool and assemble its genome; DNA only 90% homologous to any known bacteria, 
let's say the genome has 4 Mbases)
Figures for both an isolated instance (one single lab) and a production 
instance (whole institute or more) would be great.
 
... or could anybody give their own 2 cents?
 
Gerald
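
For a rough sense of scale, the raw data volume behind the cases above can be 
estimated from genome size and fold coverage. The sketch below is only a 
back-of-envelope calculation, not a sizing recommendation. The 4 Mbase genome 
and the 30-fold coverage come from this thread; the human genome size of about 
3.1 Gbases, the 100-fold bacterial coverage, and the figure of 2 bytes per base 
in uncompressed FASTQ are assumptions, and the function names are made up for 
illustration.

```python
def raw_bases(genome_size_bases: float, coverage: float) -> float:
    """Total sequenced bases needed for the given fold coverage."""
    return genome_size_bases * coverage


def fastq_size_gb(total_bases: float, bytes_per_base: float = 2.0) -> float:
    """Rough uncompressed FASTQ size in GB (10**9 bytes).

    bytes_per_base = 2.0 is an assumption: one byte for the base call and
    one for its quality character. Read headers and newlines are ignored,
    so real files will be somewhat larger.
    """
    return total_bases * bytes_per_base / 1e9


HUMAN_GENOME_BASES = 3.1e9      # assumption: approximate human genome size
BACTERIAL_GENOME_BASES = 4e6    # the 4 Mbase genome from the question

human = raw_bases(HUMAN_GENOME_BASES, 30)            # 30-fold, as in the thread
bacterial = raw_bases(BACTERIAL_GENOME_BASES, 100)   # assumption: 100-fold

print(f"human: {human / 1e9:.0f} Gbases, ~{fastq_size_gb(human):.0f} GB FASTQ")
print(f"bacterial: {bacterial / 1e9:.1f} Gbases, "
      f"~{fastq_size_gb(bacterial):.1f} GB FASTQ")
```

Under these assumptions the human re-sequencing case involves roughly 93 Gbases 
of raw reads, against about 0.4 Gbases for the bacterial genome. This says 
nothing about RAM or run time, which depend on the aligner or assembler used.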

 

>________________________________
> From: Jeremy Goecks <jeremy.goe...@emory.edu>
>To: Gerald Bothe <g_bo...@yahoo.com> 
>Cc: Nikos Sidiropoulos <nikos.sid...@gmail.com>; Peter Cock 
><p.j.a.c...@googlemail.com>; "<galaxy-...@bx.psu.edu>" <galaxy-...@bx.psu.edu> 
>Sent: Thursday, September 12, 2013 10:00 AM
>Subject: Re: [galaxy-dev] Scaling and hardware requirements
>  
>
>
>This isn't an easy question to answer. Here's why:
>
>
>* there is significant variation in mammalian genome size; of course, larger 
>genomes require more resources, but the relationship is difficult to quantify;
>* assembly can take anywhere from a day to a week depending on software and 
>resource choices;
>* variant detection can take anywhere from 1 to 4 days depending on the 
>software used;
>* completing assembly and variant detection in 48 hours is challenging for 
>even the most advanced genomics labs.
>
>
>To answer your question, I'd start with 256-512GB of RAM on a machine and 
>36-72 compute cores across a cluster. This is simply a guess of course. Before 
>investing in hardware, you might try your analysis on the cloud ( 
>usegalaxy.org/cloud ) to get a sense of the resources needed.
>
>
>Good luck,
>J.
>
>On Sep 11, 2013, at 8:34 AM, Gerald Bothe wrote:
>
>Can I add a similar question on top of this: How many resources do you need 
>for re-sequencing of a mammalian genome (assembly and variant detection), one 
>job at a time? E.g. how much RAM etc. if I want the re-sequencing SAM file for 
>30-fold coverage to be done in 48 hours?
>> 
>>Gerald
>>
>>Gerald Bothe
>>32 Plum Hill Road
>>East Lyme, CT 06333
>>(860) 451 8776
>>
>>
>>
>>>________________________________
>>> From: Nikos Sidiropoulos <nikos.sid...@gmail.com>
>>>To: Peter Cock <p.j.a.c...@googlemail.com> 
>>>Cc: "<galaxy-...@bx.psu.edu>" <galaxy-...@bx.psu.edu> 
>>>Sent: Wednesday, September 11, 2013 8:19 AM
>>>Subject: Re: [galaxy-dev] Scaling and hardware requirements
>>>  
>>>
>>>
>>>Hi Peter
>>>
>>>
>>>It's going to be one big machine, running both the Galaxy server and the 
>>>jobs, in a multi-process configuration. If that is a terrible idea, please 
>>>let me know so I can pass the feedback on.
>>>
>>>
>>>De novo assembly can also be for the human/mouse genome. 
>>>
>>>
>>>Bests,
>>>Nikos
>>>
>>>
>>>
>>>2013/9/11 Peter Cock <p.j.a.c...@googlemail.com>
>>>
>>>On Wed, Sep 11, 2013 at 1:03 PM, Nikos Sidiropoulos
>>>><nikos.sid...@gmail.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> I have a couple of questions regarding a server setup dedicated to Galaxy.
>>>>>
>>>>> The idea is to buy a 64 core 256GB RAM server. From my experience I
>>>>> believe that Galaxy will be able to scale up to 64 CPUs, but I would
>>>>> like some more feedback on this. Also, is 4GB RAM per CPU core enough
>>>>> for NGS data (including de novo assembly)?
>>>>>
>>>>> Bests,
>>>>> Nikos
>>>>
>>>>Hi Nikos,
>>>>
>>>>Is this going to be one server both for running Galaxy (which
>>>>needs fairly low resources) and running jobs for Galaxy,
>>>>like de novo assemblies (which need high resources)?
>>>>
>>>>i.e. You have one big machine only, no cluster?
>>>>
>>>>For de novo assembly the RAM per core/CPU isn't important,
>>>>it is the total RAM on the machine. How much RAM you
>>>>need depends on which assembler you use, the organism
>>>>(both size and also complexity) and the volume of data.
>>>>
>>>>What you've described should be fine for bacterial assemblies
>>>>and smaller eukaryotes - beyond that you'll need to give
>>>>more details.
>>>>
>>>>Peter
>>>>
>>> 
>>>___________________________________________________________
>>>Please keep all replies on the list by using "reply all"
>>>in your mail client.  To manage your subscriptions to this
>>>and other Galaxy lists, please use the interface at:
>>>  http://lists.bx.psu.edu/
>>>
>>>To search Galaxy mailing lists use the unified search at:
>>>  http://galaxyproject.org/search/mailinglists/
>>>
>
>
>    