Re: [galaxy-user] merging fastq files

Jennifer Jackson Wed, 10 Apr 2013 15:23:36 -0700

Hi Andrew,

We do not currently have any estimates posted on the wiki for example usage on the cloud, but this is a good idea and the team is discussing the best way to add something like this. The difficulty comes with how variable actual job run-times can be, but there are still some ways to break this down.

These examples are based on how *long* an instance would be up and center on two primary costs: the type of instance and the size of the EBS volume. The details at Amazon are on this link: aws.amazon.com/ec2/pricing <http://aws.amazon.com/ec2/pricing>

1 extra large high memory instance capable of RNA-seq w/ 200GB storage: $25/day.

 + worker instances, $10/day each.

1 basic instance capable of general text manipulation w/ 50GB storage: $10/day
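
As a rough sketch of how the figures above combine: the helper name and the assumption that cost scales linearly with days and worker count are illustrative only and not taken from the Amazon pricing table. The daily rates are the approximate ones quoted in this message.

```python
# Approximate daily figures quoted in this message (April 2013).
# Check the Amazon pricing page for current numbers.
HIGH_MEM_PER_DAY = 25.0  # extra large high memory instance + 200GB storage
WORKER_PER_DAY = 10.0    # each additional worker instance
BASIC_PER_DAY = 10.0     # basic instance + 50GB storage


def estimate_cost(days, workers=0, main_per_day=HIGH_MEM_PER_DAY):
    """Approximate USD cost of keeping a cluster up for `days`.

    Hypothetical helper: assumes simple linear scaling, which real
    billing may not follow exactly.
    """
    if days < 0 or workers < 0:
        raise ValueError("days and workers must be non-negative")
    return days * (main_per_day + workers * WORKER_PER_DAY)


if __name__ == "__main__":
    # One high-memory instance plus two workers, kept up for three days.
    print(estimate_cost(days=3, workers=2))  # 135.0
```

Larger EBS volumes are not modelled here; for those, the per-day base rate would need adjusting from the Amazon table.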

I am not sure if you will be using GATK or SAM Tools for your processing, but running any variant analysis pipeline would be somewhat similar to an RNA-seq pipeline since it would involve mapping, large data file manipulations, etc. For your particular case, the data storage would be larger than the estimate above, so using the table at Amazon should help you to calculate a figure that reflects your storage needs. It is difficult to say how long any job will run purely based on the size of the inputs, as content and parameter settings have a significant effect on run time, but after the first job, or first time through a complete workflow, if the data is somewhat homogeneous, you may be able to estimate a total from there for future runs. Although I or almost anyone else can tell you that these sorts of experiments can pop out with surprises now and then!
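
One way to turn a first run into a projection, sketched under the assumption that later runs on similar data take about as long as the first. The function name and the 36-hour figure are made up for illustration; the $25/day rate is the approximate one quoted above.

```python
def project_from_first_run(first_run_hours, runs_remaining, cost_per_day):
    """Project instance hours and approximate USD cost for remaining runs.

    Hypothetical helper: assumes each remaining run takes as long as the
    first one and that runs happen one after another on the same instance.
    """
    if first_run_hours < 0 or runs_remaining < 0 or cost_per_day < 0:
        raise ValueError("inputs must be non-negative")
    hours = first_run_hours * runs_remaining
    cost = (hours / 24.0) * cost_per_day
    return hours, cost


if __name__ == "__main__":
    # Example: the first accession took 36 hours (invented number),
    # one accession is left, at roughly $25/day.
    hours, cost = project_from_first_run(36, 1, 25.0)
    print(hours, cost)  # 36 37.5
```

Treat the result as a ball-park only, since content and parameter settings can shift run times considerably.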

Others on the list using a cloud are welcome to post comments to this thread. Once we get the initial wiki table posted, it will be open to community input, so that this type of actual usage data can be captured. If you or anyone else also wants to send back results in the meantime (post to the thread and/or ticket, with experiment & instance detail), please do; here is the new development -> https://trello.com/c/pMbri7QI

Hopefully this helps a little bit! Apologies for not being able to give more detail; this is a tough question to answer with precision for a complete workflow! A pool of case examples is probably the best way to get a bead on this data, so that's part of the goal now.


Jen
Galaxy team

On 4/10/13 9:20 AM, Thompson, Andrew wrote:

Dear Jen.
Yes, that was my problem, I skipped some steps by relying too much on the 
screencast and ignoring the text.

Now I am reluctant to launch the AMI as I am having trouble estimating my usage 
and costs on AWS - as a new user I have little
idea what to set many of the parameters to in the usage calculator. Are there any 
examples of typical parameters and costs for running Galaxy on the cloud?
My first task is to map about 80 Gbp of total paired-end reads from genomic DNA 
from two accessions to a 900 Mbp reference genome and
then find SNPs and INDELs. A ball-park figure would be reassuring!

regards
   Andrew
________________________________
From: Jennifer Jackson [[email protected]]
Sent: 09 April 2013 15:11
To: Thompson, Andrew
Cc: '[email protected]'
Subject: Re: [galaxy-user] merging fastq files

Hi Andrew,

My first guess is that perhaps the region is not set correctly?
http://wiki.galaxyproject.org/CloudMan/AWS/GettingStarted

See "Step 1: One Time Amazon Setup", subsection 2, where the instruction is "set your AWS 
Region to US East (Virginia)".
The image in the wiki for step 1.2 is slightly outdated; instead it will look 
like this:

[inline image attachment not preserved]

Please give this a try and let us know if you continue to have issues.

Thanks!

Jen
Galaxy team

On 4/8/13 3:56 PM, Thompson, Andrew wrote:

Dear Jen
Thanks. I have merged the files and end up with 4 x 47 G fastq files for read 
mapping to the reference.
It seems this is too much data to analyse on the public main instance if the 
size limit is 250 G?
So I tried to set up the cloud option following the screencast 
(http://screencast.g2.bx.psu.edu/cloud/), but when I search for the current AMI 
name (861460482541/galaxy-cloudman-2011-03-22) it is not found in the list of 
community AMIs under Amazon's EC2 Management Console. Any ideas why this is not 
working?
regards
    Andrew




--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org


___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/
